This page highlights use cases for hosting AI agents on Cloud Run.
AI agents are autonomous software entities that use LLM-powered systems to perceive, decide, and act to achieve goals. As more autonomous agents are built, their ability to communicate and collaborate becomes crucial.
For an introduction to AI agents, see What is an AI agent.
Host AI Agents on Cloud Run
You can implement AI agents as Cloud Run services to orchestrate a set of asynchronous tasks and provide information through multiple request-response interactions.
A Cloud Run service is a scalable API endpoint for your application's core logic. It efficiently manages multiple concurrent users through automatic, on-demand, and rapid scaling of instances.
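Concretely, a Cloud Run service only needs to listen on the port that the platform injects through the `PORT` environment variable. The stdlib-only handler below is a minimal sketch of that contract; the echo reply is a placeholder for your agent's orchestration logic.

```python
import os
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def resolve_port(environ=None) -> int:
    # Cloud Run injects the listening port through the PORT env var;
    # 8080 is the conventional default.
    environ = os.environ if environ is None else environ
    return int(environ.get("PORT", "8080"))


class AgentHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        prompt = self.rfile.read(length).decode()
        # Hand the prompt to your agent's orchestration logic here;
        # this sketch just echoes it back.
        body = f"received: {prompt}".encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def main():
    # Container entrypoint: listen on all interfaces at $PORT.
    server = ThreadingHTTPServer(("0.0.0.0", resolve_port()), AgentHandler)
    server.serve_forever()
```

Calling `main()` from your container's entrypoint starts the server; Cloud Run then routes HTTPS requests to it and scales instances with load.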
AI agent on Cloud Run architecture
A typical AI agent architecture deployed on Cloud Run involves the following components, from Google Cloud and elsewhere:
Hosting platform: Cloud Run is a hosting platform for running agents and offers the following benefits:
- Supports running any agent framework to build different types of agents and agentic architectures. Examples of agent frameworks include Agent Development Kit (ADK) and LangGraph.
- Provides built-in features for managing your agent. For example, Cloud Run provides a built-in service identity that you can use as the agent identity for calling Google Cloud APIs with secure and automatic credentials.
- Supports connecting your agent framework to other services. You can connect your agent to first-party or third-party tools deployed on Cloud Run. For example, to gain visibility into your agent's tasks and executions, you can deploy and use tools like Langfuse and Arize.
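For example, the automatic credentials mentioned above come from the metadata server available inside every instance: your code can request a short-lived OAuth access token for the service identity without managing any keys. A minimal stdlib sketch of building that request (the URL path and required header follow the documented metadata-server conventions):

```python
import urllib.request

# Documented metadata-server path for the service identity's access token.
METADATA_TOKEN_URL = (
    "http://metadata.google.internal/computeMetadata/v1/"
    "instance/service-accounts/default/token"
)


def metadata_token_request() -> urllib.request.Request:
    # The Metadata-Flavor header is required; the server rejects
    # requests without it.
    return urllib.request.Request(
        METADATA_TOKEN_URL, headers={"Metadata-Flavor": "Google"}
    )
```

In practice a client library such as google-auth does this for you; the point is that the agent's identity is ambient inside the instance, with no key files to rotate.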
Agent interactions: Cloud Run supports streaming HTTP responses back to the user, and WebSockets for real-time interactions.
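Streaming usually means flushing model output to the client token by token. One common shape is server-sent events (SSE); the helper below formats an iterator of tokens into SSE chunks (the `[DONE]` sentinel is a common convention, not a requirement):

```python
from typing import Iterable, Iterator


def sse_stream(tokens: Iterable[str]) -> Iterator[str]:
    # Format each model token as a server-sent-events chunk; Cloud Run
    # streams these chunks to the client as they are produced.
    for token in tokens:
        yield f"data: {token}\n\n"
    # Signal end of stream to the client.
    yield "data: [DONE]\n\n"
```

Serve these chunks with a `Content-Type: text/event-stream` response and flush after each yield, so the user sees the agent's answer as it is generated.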
GenAI models: The orchestration layer calls models for reasoning capabilities. These can be:
- The Gemini API
- Custom models or other foundation models deployed on Vertex AI endpoints
- Your own fine-tuned models served from a separate GPU-enabled Cloud Run service
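These backends differ mainly in which endpoint the orchestration layer calls. The sketch below maps each option to an illustrative base URL; the project, region, and Cloud Run service hostname are placeholder assumptions, not endpoints you can call as written:

```python
def model_endpoint(backend: str, project: str = "example-project",
                   region: str = "us-central1") -> str:
    """Illustrative base URLs for each model backend option."""
    endpoints = {
        # Hosted Gemini API.
        "gemini_api": "https://generativelanguage.googleapis.com/v1beta",
        # Custom or foundation models deployed on a Vertex AI endpoint.
        "vertex": (f"https://{region}-aiplatform.googleapis.com/v1/"
                   f"projects/{project}/locations/{region}"),
        # A fine-tuned model served from a separate GPU-enabled Cloud Run
        # service (hypothetical service URL).
        "cloud_run_gpu": "https://my-model-service-abc123.a.run.app",
    }
    return endpoints[backend]
```

Keeping this choice behind one function makes it easy to swap backends, for example moving from the hosted Gemini API to your own fine-tuned model, without touching the rest of the orchestration code.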
Memory: Agents often need memory to retain context and learn from past interactions. You can use the following services:
- Memorystore for Redis for short-term memory.
- Firestore for long-term memory, such as storing the conversational history or remembering the user's preferences.
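The short-term side is typically a rolling window of recent turns keyed by session. The class below keeps that window in-process purely for illustration; the comments mark where a Memorystore for Redis version would differ, and the class and method names are this sketch's own:

```python
from collections import deque


class ShortTermMemory:
    """Rolling window of recent conversation turns for one session,
    of the kind you might cache in Memorystore for Redis."""

    def __init__(self, max_turns: int = 10):
        # deque(maxlen=...) silently drops the oldest turn when full.
        self.turns = deque(maxlen=max_turns)

    def add(self, role: str, text: str):
        # A Redis-backed version would LPUSH onto a session key,
        # LTRIM to max_turns, and set a TTL so idle sessions expire.
        self.turns.append({"role": role, "text": text})

    def context(self) -> list:
        # Returned turns are prepended to the model prompt.
        return list(self.turns)
```

Long-term memory in Firestore would complement this: the full history or the user's preferences persist per user, while the rolling window keeps the prompt small.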
Vector database: For Retrieval-Augmented Generation (RAG) or fetching structured data, use a vector database to query specific entity information or perform a vector search over embeddings. Use the pgvector extension with PostgreSQL-based services such as AlloyDB for PostgreSQL and Cloud SQL for PostgreSQL.
Tools: The orchestrator uses tools to perform specific tasks and to interact with external services, APIs, or websites. Tools can include:
- Using an MCP server: External or internal tools that are executed through an MCP server.
- Basic utilities: Precise math calculations, time conversions, or other similar utilities.
- API calling: Make calls to other internal or third-party APIs (read or write access).
- Image or chart generation: Quickly and effectively create visual content.
- Browser and OS automation: Run a headless or full graphical operating system within container instances to let the agent browse the web, extract information from websites, or perform actions using clicks and keyboard input.
- Code execution: Execute code in a secure environment with multi-layered sandboxing, with minimal or no IAM permissions.
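Tools of the basic-utility kind are often just plain, well-documented Python functions that the agent framework wraps as callable tools. A hedged sketch of a precise-math utility (the function name and signature are this sketch's own), which lets the agent delegate arithmetic instead of doing it in the LLM:

```python
from decimal import Decimal, ROUND_HALF_UP


def precise_divide(numerator: str, denominator: str, places: int = 10) -> str:
    """Divide two decimal numbers exactly, rounded to `places` decimal
    places. Takes and returns strings so no float precision is lost in
    the tool-call payload."""
    # Decimal(1).scaleb(-places) builds the quantum, e.g. 1E-5 for 5 places.
    quantum = Decimal(1).scaleb(-places)
    result = (Decimal(numerator) / Decimal(denominator)).quantize(
        quantum, rounding=ROUND_HALF_UP
    )
    return str(result)
```

A framework-registered version of this would carry the same docstring, which the model uses to decide when to call the tool and how to fill its arguments.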
What's next
- Watch Build AI agents on Cloud Run.
- Try the codelab to learn how to build and deploy a LangChain app to Cloud Run.
- Learn how to deploy Agent Development Kit (ADK) to Cloud Run.
- Try the codelab for using an MCP server on Cloud Run with an ADK agent.
- Try the codelab for deploying your ADK agent to Cloud Run with GPU.
- Find ready-to-use agent samples in Agent Development Kit (ADK) samples.
- Host Model Context Protocol (MCP) servers on Cloud Run.