Host AI agents on Cloud Run

This page highlights use cases for hosting AI agents on Cloud Run.

AI agents are autonomous software entities that use LLM-powered systems to perceive, decide, and act to achieve goals. As more autonomous agents are built, their ability to communicate and collaborate becomes crucial.

For an introduction to AI agents, see What is an AI agent.

Host AI agents on Cloud Run

You can implement AI agents as Cloud Run services to orchestrate a set of asynchronous tasks and provide information through multiple request-response interactions.

A Cloud Run service is a scalable API endpoint for your application's core logic. It efficiently manages multiple concurrent users through automatic, on-demand, and rapid scaling of instances.
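The following sketch shows what such a service can look like: a single HTTP endpoint that wraps the agent's invocation logic. The FastAPI framework and the run_agent helper are illustrative choices rather than requirements; Cloud Run only expects the container to listen on the port given by the PORT environment variable.

```python
# main.py - minimal agent service for Cloud Run (illustrative sketch).
# run_agent is a hypothetical stand-in for your agent framework's entry point.
import os

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()


class Prompt(BaseModel):
    text: str
    session_id: str | None = None


def run_agent(prompt: str, session_id: str | None) -> str:
    # Placeholder: invoke your orchestration framework (ADK, LangGraph, ...) here.
    return f"echo: {prompt}"


@app.post("/invoke")
def invoke(prompt: Prompt) -> dict:
    # Each request is one request-response turn with the agent.
    return {"response": run_agent(prompt.text, prompt.session_id)}


if __name__ == "__main__":
    import uvicorn

    # Cloud Run sets PORT at runtime; default to 8080 for local development.
    uvicorn.run(app, host="0.0.0.0", port=int(os.environ.get("PORT", "8080")))
```

You can deploy a service like this with gcloud run deploy agent-service --source ., which builds the container image and deploys it in one step.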

AI agent on Cloud Run architecture

A typical AI agent architecture deployed on Cloud Run can involve several components, both within Google Cloud and outside of it:

Figure 1. Architecture of an AI agent on Cloud Run.

The diagram shows the following:

  • Hosting platform: Cloud Run is a hosting platform for running agents and offers the following benefits:

    • Supports running any agent framework to build different types of agents and agentic architectures. Examples of agent frameworks include Agent Development Kit (ADK) and LangGraph.
    • Provides built-in features for managing your agent. For example, Cloud Run provides a built-in service identity that you can use as the agent identity for calling Google Cloud APIs with secure and automatic credentials.
    • Supports connecting your agent framework to other services. You can connect your agent to first-party or third-party tools deployed on Cloud Run. For example, to gain visibility into your agent's tasks and executions, you can deploy and use tools like Langfuse and Arize.
  • Agent interactions: Cloud Run supports streaming HTTP responses back to the user and WebSockets for real-time interactions.

  • GenAI models: The orchestration layer calls models for reasoning capabilities. These can be Gemini models that you access through the Vertex AI API, or open models that you host yourself, for example on Cloud Run with GPUs. A sketch of such a call using the service identity appears after this list.

  • Memory: Agents often need memory to retain context and learn from past interactions. You can use the following services (a memory sketch appears after this list):

    • Memorystore for Redis for short-term memory.
    • Firestore for long-term memory, such as storing the conversational history or remembering the user's preferences.
  • Vector database: For Retrieval-Augmented Generation (RAG) or fetching structured data, use a vector database to query specific entity information or perform a vector search over embeddings. Use the pgvector extension with the following services (a query sketch appears after this list):

    • Cloud SQL for PostgreSQL
    • AlloyDB for PostgreSQL
  • Tools: The orchestrator uses tools to perform specific tasks and to interact with external services, APIs, or websites. These tools can include the following (a simple tool sketch appears after this list):

    • Using an MCP server: External or internal tools that are executed through an MCP server.
    • Basic utilities: Precise math calculations, time conversions, or other similar utilities.
    • API calling: Make calls to other internal or third-party APIs (read or write access).
    • Image or chart generation: Quickly and effectively create visual content.
    • Browser and OS automation: Run a headless or a full graphical operating system within container instances to let the agent browse the web, extract information from websites, or perform actions using clicks and keyboard input.
    • Code execution: Execute code in a secure environment with multi-layered sandboxing, with minimal or no IAM permissions.
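The following sketches illustrate some of these components on Cloud Run. For example, the reasoning calls in the GenAI models component can rely entirely on the built-in Cloud Run service identity through Application Default Credentials, so no API keys or key files are needed. This sketch assumes the google-cloud-aiplatform SDK and a Gemini model on Vertex AI; the environment variable and the model name are placeholders.

```python
# Illustrative sketch: call a Gemini model on Vertex AI from a Cloud Run instance.
# Authentication uses Application Default Credentials, which resolve to the
# service's built-in identity on Cloud Run, so no key files are involved.
import os

import vertexai
from vertexai.generative_models import GenerativeModel

# GOOGLE_CLOUD_PROJECT and the region are assumptions; adjust for your project.
vertexai.init(
    project=os.environ["GOOGLE_CLOUD_PROJECT"],
    location="us-central1",
)

model = GenerativeModel("gemini-1.5-flash")  # example model name


def reason(prompt: str) -> str:
    """One reasoning step issued by the orchestration layer."""
    response = model.generate_content(prompt)
    return response.text
```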
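For the memory component, a rough sketch of the short-term and long-term split might look like the following. It assumes the redis and google-cloud-firestore client libraries, a Memorystore for Redis instance reachable at the address in REDIS_HOST, and a Firestore database in the same project.

```python
# Illustrative sketch: short-term memory in Memorystore for Redis,
# long-term memory in Firestore. REDIS_HOST is an assumed environment variable
# pointing at a Memorystore instance reachable from the Cloud Run service.
import os

import redis
from google.cloud import firestore

short_term = redis.Redis(host=os.environ["REDIS_HOST"], port=6379)
long_term = firestore.Client()


def remember_turn(session_id: str, user_msg: str, agent_msg: str) -> None:
    # Short-term: keep the latest turn for one hour for fast context lookups.
    short_term.setex(f"last_turn:{session_id}", 3600, f"{user_msg}\n{agent_msg}")
    # Long-term: append the turn to the session's conversational history.
    long_term.collection("sessions").document(session_id).set(
        {"history": firestore.ArrayUnion([{"user": user_msg, "agent": agent_msg}])},
        merge=True,
    )
```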
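For the vector database component, a similarity query with pgvector might look like the following sketch. It assumes the psycopg driver, a DATABASE_URL connection string for a PostgreSQL instance (for example, Cloud SQL for PostgreSQL or AlloyDB for PostgreSQL) with the pgvector extension enabled, a documents table with an embedding column, and a hypothetical embed helper that produces the query embedding.

```python
# Illustrative sketch: vector search over embeddings with pgvector.
# DATABASE_URL is an assumed connection string; embed() is a hypothetical
# helper that returns the query embedding as a list of floats.
import os

import psycopg


def embed(text: str) -> list[float]:
    # Placeholder: call your embedding model here.
    raise NotImplementedError


def find_similar_docs(query: str, k: int = 5) -> list[str]:
    # pgvector accepts vectors as text literals such as "[0.1,0.2,...]".
    vector_literal = "[" + ",".join(str(x) for x in embed(query)) + "]"
    with psycopg.connect(os.environ["DATABASE_URL"]) as conn:
        rows = conn.execute(
            # <=> is pgvector's cosine distance operator.
            "SELECT content FROM documents ORDER BY embedding <=> %s::vector LIMIT %s",
            (vector_literal, k),
        ).fetchall()
    return [row[0] for row in rows]
```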
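Finally, a tool can be as simple as a plain Python function plus a declaration that the orchestrator passes to the model for function calling. The sketch below is framework-agnostic and only illustrates the shape of such a basic utility; frameworks such as ADK typically let you register the Python function directly.

```python
# Illustrative sketch: a basic utility tool (time conversion) and a
# JSON-schema-style declaration that an orchestrator can expose to the model.
from datetime import datetime, timezone
from zoneinfo import ZoneInfo


def current_time(time_zone: str) -> str:
    """Return the current time in the given IANA time zone, e.g. 'Europe/Paris'."""
    return datetime.now(timezone.utc).astimezone(ZoneInfo(time_zone)).isoformat()


# Declaration the orchestrator passes to the model for function calling.
CURRENT_TIME_TOOL = {
    "name": "current_time",
    "description": "Returns the current time in a given IANA time zone.",
    "parameters": {
        "type": "object",
        "properties": {
            "time_zone": {"type": "string", "description": "IANA time zone name"},
        },
        "required": ["time_zone"],
    },
}
```

When the model responds with a call to current_time, the orchestrator runs the function and returns the result to the model as the tool output.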

What's next