Google Kubernetes Engine (GKE) Agent Sandbox helps you manage isolated, stateful, and single-replica workloads on GKE. It is optimized for use cases like AI agent runtimes, where untrusted, LLM-generated code must be executed in a secure and performant environment.
The GKE Agent Sandbox add-on is based on the open-source Agent Sandbox controller project and follows its release cycles. As a managed GKE add-on, Google manages the full lifecycle of the controller, including automatic upgrades and security patches.
This document provides a conceptual overview of GKE Agent Sandbox.
Why use GKE Agent Sandbox
GKE Agent Sandbox is built for agentic workloads that require scale, extensibility, and security. Key benefits include:
- Kernel-level isolation: Provides strong, kernel-level isolation for untrusted, LLM-generated code using technologies like gVisor.
- Sub-second provisioning: Offers an out-of-the-box mechanism to provision sandboxes in typically less than one second, significantly faster than standard Kubernetes Pod scheduling allows.
- Cloud-native extensibility: Leverages the power of the Kubernetes paradigm and the managed infrastructure of GKE.
By providing a declarative, standardized API, GKE Agent Sandbox offers a single-container experience that provides isolation and persistence characteristics similar to a virtual machine (VM), built entirely on Kubernetes primitives.
Common use cases for Agent Sandbox
Use GKE Agent Sandbox for workloads that require isolation, persistence, and a stable identity. Example use cases include:
- AI agent runtimes: Safely execute untrusted code in an environment isolated by security-focused runtimes like gVisor.
- Development environments: Provide developers with persistent, isolated cloud-based coding environments.
- Notebooks and research tools: Host single-container sessions for interactive tools like Jupyter Notebooks.
- Stateful single-Pod services: Run applications that need a stable identity and storage without the complexity of a StatefulSet.
- Programmatic environment management: Use the provided client library SDKs, such as the Agent Sandbox Python SDK, to request and manage sandboxes directly from your application logic without managing Kubernetes YAML.
How GKE Agent Sandbox works
GKE Agent Sandbox uses a custom controller and several Kubernetes Custom Resource Definitions (CRDs) to manage the lifecycle of sandboxed environments.
Core architecture
- Sandbox CRD: The primary resource that represents a single, stateful Pod. It manages stable hostnames, network identity, and persistent storage.
- Sandbox Router: A component that provides a stable endpoint and tunnels traffic to the appropriate Sandbox Pods, abstracting the underlying networking complexity.
- Integration with Pod snapshots: GKE Agent Sandbox integrates with the GKE Pod snapshots feature to allow pausing and resuming workloads by saving and restoring the full state of a container.
Claim Model
The Claim Model is a key feature that separates the user's request for an
environment from the specific implementation details, such as where and how the
workload is provisioned. Unlike a standard Kubernetes StatefulSet, the Claim
Model lets you request a sandbox without needing to manage the underlying
Pod or storage configurations directly.
The Claim Model is managed using the SandboxClaim and SandboxTemplate CRDs, and works as follows:
- Users or applications request a Sandbox by creating a SandboxClaim that references a SandboxTemplate.
- The controller handles the mapping of the claim to an actual Sandbox instance, offering flexible backend management. This allows the system to reuse existing Sandboxes or allocate from a pool.
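The split between the request and its implementation can be sketched as a manifest built in Python. This is a minimal sketch, not a confirmed schema: the API group and version (`extensions.agents.x-k8s.io/v1alpha1`), the `sandboxTemplateRef` field, and the names `agent-session-1` and `python-runtime` are assumptions for illustration. Check the CRDs installed on your cluster for the actual schema.

```python
import json


def build_sandbox_claim(name: str, template_name: str, namespace: str = "default") -> dict:
    """Return a SandboxClaim manifest that references a SandboxTemplate by name.

    The apiVersion and spec field names are assumptions, not a confirmed schema.
    """
    return {
        "apiVersion": "extensions.agents.x-k8s.io/v1alpha1",  # assumed group/version
        "kind": "SandboxClaim",
        "metadata": {"name": name, "namespace": namespace},
        # The claim names only the template; it carries no Pod or storage details.
        "spec": {"sandboxTemplateRef": {"name": template_name}},  # assumed field
    }


claim = build_sandbox_claim("agent-session-1", "python-runtime")
print(json.dumps(claim, indent=2))
```

The point of the sketch is what the claim leaves out: container images, storage, and runtime settings would live in the referenced SandboxTemplate, so the requesting application does not need to know them.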
Warm Pools
The Warm Pool feature is designed to minimize startup latency, which is critical for interactive AI agent scenarios. This feature allows GKE Agent Sandbox to provide execution environments in less than one second, significantly faster than typical Pod scheduling. The feature is managed using the SandboxWarmPool CRD and works in the following way:
- A SandboxWarmPool maintains a set of pre-warmed Pod instances in a ready state.
- When a SandboxClaim is made, the controller immediately assigns a Pod from the pool instead of waiting for a new Pod to pull images and start from scratch.
- When combined with Pod snapshots, warm pools provide fast, "instant-on" capabilities by restoring Pods from a pre-configured state.
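The assignment behavior described above can be illustrated with a toy model. This is a conceptual sketch only, not the controller's implementation: the `WarmPool` class, its methods, and the Pod names are hypothetical, and the fallback to a cold start when the pool is empty is an assumption of the model.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class WarmPool:
    """Toy model of a warm pool: a queue of Pods that are already running."""

    size: int
    ready: deque = field(default_factory=deque)
    created: int = 0

    def __post_init__(self) -> None:
        self.replenish()

    def _start_pod(self) -> str:
        # Stands in for the slow path: schedule, pull images, start the container.
        self.created += 1
        return f"pod-{self.created}"

    def replenish(self) -> None:
        # Top the pool back up to its target size.
        while len(self.ready) < self.size:
            self.ready.append(self._start_pod())

    def claim(self) -> tuple[str, bool]:
        """Return (pod_name, served_from_pool)."""
        if self.ready:
            return self.ready.popleft(), True  # fast path: Pod is already warm
        return self._start_pod(), False  # pool exhausted: cold start (assumed)


pool = WarmPool(size=2)
results = [pool.claim() for _ in range(3)]
print(results)  # [('pod-1', True), ('pod-2', True), ('pod-3', False)]
```

In this model the first two claims are served from the pool and the third pays the startup cost, which is why the pool size should track the expected burst of concurrent claims.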
Network isolation
GKE Agent Sandbox implements a Default Deny network security posture for all
sandboxed environments. This ensures that untrusted code executed inside a
sandbox cannot access unauthorized internal networks or the GKE
control plane by default. You can define specific network restrictions and
allowed egress or ingress rules within your SandboxTemplate to provide
fine-grained security for agentic workloads.
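The default-deny posture can be modeled as a rule check in which traffic is permitted only when it matches an explicit allow rule. The following is a conceptual sketch of that evaluation logic using Python's standard `ipaddress` module. Actual enforcement happens in the cluster's networking layer, and the `egress_allowed` function and the example CIDR range are hypothetical.

```python
import ipaddress


def egress_allowed(destination: str, allowed_cidrs: list[str]) -> bool:
    """Default deny: permit traffic only if the destination matches an allow rule."""
    address = ipaddress.ip_address(destination)
    return any(address in ipaddress.ip_network(cidr) for cidr in allowed_cidrs)


# Hypothetical allow list: one documentation-range network (RFC 5737).
allowed = ["203.0.113.0/24"]

print(egress_allowed("203.0.113.10", allowed))  # True: matches an allow rule
print(egress_allowed("10.0.0.5", allowed))      # False: no rule matches
print(egress_allowed("10.0.0.5", []))           # False: empty list denies everything
```

An empty allow list denies all traffic, which mirrors the default state of a sandbox before any rules are defined.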
Programmatic access with SDKs
AI engineers can consume GKE Agent Sandbox resources programmatically using the provided client libraries. For example, the Python SDK provides a high-level interface that abstracts the underlying SandboxClaim and SandboxTemplate configurations. This lets you create and interact with isolated environments directly from Python-based agent frameworks such as LangChain or the Vertex AI Agentic SDK.
Limitations and requirements
GKE Agent Sandbox has the following limitations and requirements:
- Cluster version: Requires GKE version 1.30.2-gke.1394000 or later for full feature support (including snapshots).
- Infrastructure requirements: Optimized for specific node configurations (such as N2 machine types) and requires the Agent Sandbox controller to be installed and configured on the cluster.
- Isolation runtimes: While GKE Agent Sandbox supports multiple runtimes, it is primarily intended for use with security-hardened runtimes like gVisor.
- Underlying features availability: Some underlying features, such as GKE Pod snapshots, might be in Preview or have specific regional availability.
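To check a cluster against the minimum version listed above, compare the version components numerically rather than as strings. The following is a minimal sketch that assumes version strings follow the `MAJOR.MINOR.PATCH-gke.BUILD` pattern seen in the requirement. The helper names are hypothetical.

```python
import re

MINIMUM = "1.30.2-gke.1394000"
_PATTERN = re.compile(r"^v?(\d+)\.(\d+)\.(\d+)-gke\.(\d+)$")


def parse_gke_version(version: str) -> tuple:
    """Split a GKE version string into four integers for numeric comparison."""
    match = _PATTERN.match(version)
    if match is None:
        raise ValueError(f"not a GKE version string: {version!r}")
    return tuple(int(part) for part in match.groups())


def meets_minimum(version: str, minimum: str = MINIMUM) -> bool:
    """Return True if `version` is at or above `minimum`."""
    return parse_gke_version(version) >= parse_gke_version(minimum)


print(meets_minimum("1.31.0-gke.1000000"))  # True: newer minor version
print(meets_minimum("1.30.2-gke.999999"))   # False: lower build number
```

The second call shows why numeric parsing matters: a plain string comparison would rank build 999999 above 1394000.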
What's next
- Learn how to enable Agent Sandbox on GKE.
- Learn more about isolating AI code execution with Agent Sandbox.
- To learn how to use Pod snapshots with Agent Sandbox, see Save and restore Agent Sandbox environments with Pod snapshots.
- For the underlying open-source implementation, see the Agent Sandbox GitHub project.
- For example runtimes and YAML configurations for scenarios such as code execution or computer use, see the Agent Sandbox examples.
- To interact with sandboxes programmatically, see the Agent Sandbox Python SDK README on GitHub.