This page describes how Gemini Enterprise Agent Platform manages quotas for Anthropic Claude models.
Overview
Anthropic Claude models on Agent Platform use one of the following quota systems:
- Models launched after May 26, 2026: Shared lineage quotas.
- Models launched before May 26, 2026: Per-model quotas.
Shared lineage quotas
Global and multi-region endpoints for Anthropic Claude models launched after May 26, 2026 use shared model lineage quotas. A single quota limit is shared across all model versions in a model lineage for a given location.
For example, if you call Claude Opus 4.8 through the global endpoint,
then all requests, input tokens, and output tokens count against the same
anthropic-claude-opus-4 quota bucket for the global endpoint, regardless of
the specific Opus version that you call.
Quotas on the global endpoint and each multi-region endpoint are independent buckets. Usage on the global endpoint doesn't consume quota on a multi-region endpoint, and similarly usage on a multi-region endpoint doesn't consume quota on the global endpoint.
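As a rough illustration of the bucket model described above, the following Python sketch keys usage by the pair (model lineage, location). The version labels in the `LINEAGE` lookup are placeholders invented for this example, not real model IDs. Only the `anthropic-claude-opus-4` lineage value comes from this page.

```python
from collections import Counter

# Illustrative lookup only: the version labels are placeholders, not model IDs.
LINEAGE = {
    "claude-opus-4.8": "anthropic-claude-opus-4",
    "claude-opus-4.x": "anthropic-claude-opus-4",  # any other version in the lineage
}


def bucket_key(version: str, location: str) -> tuple[str, str]:
    """A quota bucket is identified by (lineage, location), not by version."""
    return (LINEAGE[version], location)


# Count requests per bucket for a small, made-up traffic sample.
requests = Counter()
for version, location in [
    ("claude-opus-4.8", "global"),
    ("claude-opus-4.x", "global"),
    ("claude-opus-4.8", "us"),
]:
    requests[bucket_key(version, location)] += 1

# Both versions on the global endpoint land in the same bucket;
# the us multi-region endpoint is tracked separately.
print(requests[("anthropic-claude-opus-4", "global")])
print(requests[("anthropic-claude-opus-4", "us")])
```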
The following sections describe how shared lineage quotas work for global endpoints and multi-region endpoints.
Global endpoints
The following table describes metrics for global endpoints:
| Metric | Quota dimension |
|---|---|
| global_online_prediction_requests_per_base_model | base_model: |
| global_online_prediction_input_tokens_per_minute_per_base_model | base_model: |
| global_online_prediction_output_tokens_per_minute_per_base_model | base_model: |
Multi-region endpoints
The following table describes the metrics for multi-region endpoints, such as
us and eu:
| Metric | Quota dimension |
|---|---|
| LOCATION_multi_region_online_prediction_requests_per_base_model | base_model: |
| LOCATION_multi_region_online_prediction_input_tokens_per_minute_per_base_model | base_model: |
| LOCATION_multi_region_online_prediction_output_tokens_per_minute_per_base_model | base_model: |
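The metric names in the two tables above follow a regular pattern, so they can be generated from the location. The following Python sketch is one way to do that. The function name `lineage_metric_names` is hypothetical, and the sketch assumes that `LOCATION` is replaced by a lowercase multi-region name such as `us` or `eu`.

```python
def lineage_metric_names(location: str) -> list[str]:
    """Return the request, input-token, and output-token metric names.

    `location` is "global" or a multi-region name such as "us" or "eu".
    """
    prefix = "global" if location == "global" else f"{location}_multi_region"
    return [
        f"{prefix}_online_prediction_requests_per_base_model",
        f"{prefix}_online_prediction_input_tokens_per_minute_per_base_model",
        f"{prefix}_online_prediction_output_tokens_per_minute_per_base_model",
    ]


for name in lineage_metric_names("eu"):
    print(name)
```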
Implications for your application
Adding a new version of a lineage doesn't require a new quota request. When a new version of a model lineage, such as a future Sonnet release, launches on a public endpoint, it shares the existing model lineage quota bucket. You don't need to file a separate quota increase request to use the new version.
Mixing versions consumes the same quota bucket. Traffic split across multiple versions of the same model lineage draws from one shared quota. Plan capacity at the lineage level, not the version level.
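To make lineage-level planning concrete, the following Python sketch sums per-version traffic before comparing it to a limit. The function name, the version labels, and all numbers are made up for illustration. They are not default quota values.

```python
def lineage_headroom(limit_per_minute: int, usage_by_version: dict[str, int]) -> int:
    """Remaining per-minute capacity in a shared lineage bucket.

    A negative result means the combined traffic exceeds the limit,
    even if each version looks fine on its own.
    """
    return limit_per_minute - sum(usage_by_version.values())


# Hypothetical numbers: each version alone is under the limit of 1000,
# but together they exceed it.
usage = {"version-a": 600, "version-b": 500}
print(lineage_headroom(1000, usage))
```

In this made-up case the result is negative, which is the situation that version-level planning would miss.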
View and manage your quotas
To view your current usage limits or request quota increases, go to the Quotas and system limits page in the Google Cloud console.
To filter quotas, filter on the base_model dimension using the model lineage value, such as
anthropic-claude-opus-4.
Per-model quotas
Anthropic Claude models launched before May 26, 2026 have quotas based on the type of endpoint used: regional, multi-region, or global. The quota is specified in queries per minute (QPM) and tokens per minute (TPM). TPM includes both input and output tokens.
To maintain overall service performance and acceptable use, the maximum quotas might vary by account and, in some cases, access might be restricted. View your project's quotas on the Quotas and system limits page in the Google Cloud console. You must also have the following quotas available:
Queries Per Minute (QPM):
- For regional endpoints: online_prediction_requests_per_base_model
- For the global endpoint: global_online_prediction_requests_per_base_model
- For the US multi-region endpoint: us_multi_region_online_prediction_requests_per_base_model
- For the EU multi-region endpoint: eu_multi_region_online_prediction_requests_per_base_model
Tokens Per Minute (TPM):
- Some models count input and output tokens together:
  - Regional: online_prediction_tokens_per_minute_per_base_model
  - Global: global_online_prediction_tokens_per_minute_per_base_model
- Other models count input and output tokens separately:
  - Input TPM:
    - Regional: online_prediction_input_tokens_per_minute_per_base_model
    - Global: global_online_prediction_input_tokens_per_minute_per_base_model
    - US multi-region: us_multi_region_online_prediction_input_tokens_per_minute_per_base_model
    - EU multi-region: eu_multi_region_online_prediction_input_tokens_per_minute_per_base_model
  - Output TPM:
    - Regional: online_prediction_output_tokens_per_minute_per_base_model
    - Global: global_online_prediction_output_tokens_per_minute_per_base_model
    - US multi-region: us_multi_region_online_prediction_output_tokens_per_minute_per_base_model
    - EU multi-region: eu_multi_region_online_prediction_output_tokens_per_minute_per_base_model
To see which models count input and output tokens separately, see Quotas by model and region.
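The per-model metric names listed above can be summarized in a small lookup. The following Python sketch is one possible way to select the metrics that apply to an endpoint type. The function name and the endpoint keys (`regional`, `global`, `us`, `eu`) are hypothetical. Whether a model counts tokens together or separately is an input that you look up per model.

```python
# Metric name prefixes per endpoint type, taken from the list above.
PREFIXES = {
    "regional": "",
    "global": "global_",
    "us": "us_multi_region_",
    "eu": "eu_multi_region_",
}


def per_model_metric_names(endpoint: str, separate_tokens: bool) -> list[str]:
    """Return the QPM metric plus the TPM metric(s) for an endpoint type."""
    prefix = PREFIXES[endpoint]
    names = [f"{prefix}online_prediction_requests_per_base_model"]
    if separate_tokens:
        names.append(f"{prefix}online_prediction_input_tokens_per_minute_per_base_model")
        names.append(f"{prefix}online_prediction_output_tokens_per_minute_per_base_model")
    else:
        # The combined TPM metric is listed only for regional and global endpoints.
        if endpoint not in ("regional", "global"):
            raise ValueError("no combined TPM metric is listed for this endpoint type")
        names.append(f"{prefix}online_prediction_tokens_per_minute_per_base_model")
    return names


print(per_model_metric_names("regional", separate_tokens=False))
print(per_model_metric_names("us", separate_tokens=True))
```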
Input tokens
The following list defines the input tokens that can count towards your input TPM quota. The input tokens that each model counts can vary. To see which input tokens a model counts, see Quotas by model and region.
- Input tokens includes all input tokens, including cache read and cache write tokens.
- Uncached input tokens includes only the input tokens that weren't read from a cache. Cache read tokens are excluded.
- Cache write tokens includes tokens that were used to create or update a cache.
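The following Python sketch shows how the counted total can differ depending on which token categories a model counts. It assumes that usage is reported as three disjoint counters (uncached input, cache read, and cache write), which this page doesn't state explicitly. The counter names, the function name, and the numbers are hypothetical.

```python
def counted_input_tokens(usage: dict[str, int], counted: set[str]) -> int:
    """Sum only the token categories that count towards input TPM."""
    return sum(usage[category] for category in counted)


# Made-up usage for one request, split into assumed-disjoint counters.
usage = {"uncached_input": 1000, "cache_read": 4000, "cache_write": 500}

# A model that counts all input tokens, including cache reads and writes.
all_input = counted_input_tokens(
    usage, {"uncached_input", "cache_read", "cache_write"}
)

# A model that counts uncached input and cache write tokens only.
without_cache_reads = counted_input_tokens(
    usage, {"uncached_input", "cache_write"}
)

print(all_input, without_cache_reads)
```

With these numbers, a model that excludes cache read tokens consumes far less input TPM quota for the same request, which is why it matters to check what each model counts.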
Quotas by model and region
The following table shows the default quotas and supported context length for each model in each region.
| Model | Region | Quotas | Context length (tokens) |
|---|---|---|---|
| Claude Opus 4.8 | Multi-region | | 1,000,000 |
| Claude Opus 4.8 | Multi-region | | 1,000,000 |
| Claude Opus 4.8 | global endpoint | | 1,000,000 |
| Claude Opus 4.7 | Multi-region | | 1,000,000 |
| Claude Opus 4.7 | Multi-region | | 1,000,000 |
| Claude Opus 4.7 | global endpoint | | 1,000,000 |
| Claude Opus 4.6 | us-east5 | | 1,000,000 |
| Claude Opus 4.6 | europe-west1 | | 1,000,000 |
| Claude Opus 4.6 | asia-southeast1 | | 1,000,000 |
| Claude Opus 4.6 | global endpoint | | 1,000,000 |
| Claude Sonnet 4.6 | us-east5 | | 1,000,000 |
| Claude Sonnet 4.6 | europe-west1 | | 1,000,000 |
| Claude Sonnet 4.6 | asia-southeast1 | | 1,000,000 |
| Claude Sonnet 4.6 | global endpoint | | 1,000,000 |
| Claude Opus 4.5 | us-east5 | | 200,000 |
| Claude Opus 4.5 | europe-west1 | | 200,000 |
| Claude Opus 4.5 | asia-southeast1 | | 200,000 |
| Claude Opus 4.5 | global endpoint | | 200,000 |
| Claude Opus 4.1 | us-east5 | | 200,000 |
| Claude Opus 4.1 | global endpoint | | 200,000 |
| Claude Opus 4 | us-east5 | | 200,000 |
| Claude Opus 4 | global endpoint | | 200,000 |
| Claude Sonnet 4.5 | us-east5 | | 1,000,000 (beta), 200,000 (GA) |
| Claude Sonnet 4.5 | europe-west1 | | 1,000,000 (beta), 200,000 (GA) |
| Claude Sonnet 4.5 | asia-southeast1 | | 1,000,000 (beta), 200,000 (GA) |
| Claude Sonnet 4.5 | global endpoint | | 1,000,000 (beta), 200,000 (GA) |
| Claude Sonnet 4 | us-east5 | | 1,000,000 |
| Claude Sonnet 4 | europe-west1 | | 1,000,000 |
| Claude Sonnet 4 | global endpoint | | 1,000,000 |
| Claude 3.7 Sonnet | us-east5 | | 200,000 |
| Claude 3.7 Sonnet | europe-west1 | | 200,000 |
| Claude 3.7 Sonnet | global endpoint | | 200,000 |
| Claude 3.5 Sonnet v2 | us-east5 | | 200,000 |
| Claude 3.5 Sonnet v2 | europe-west1 | | 200,000 |
| Claude 3.5 Sonnet v2 | global endpoint | | 200,000 |
| Claude Haiku 4.5 | us-east5 | | 200,000 |
| Claude Haiku 4.5 | europe-west1 | | 200,000 |
| Claude Haiku 4.5 | asia-east1 | | 200,000 |
| Claude Haiku 4.5 | global endpoint | | 200,000 |
| Claude 3.5 Haiku | us-east5 | | 200,000 |
| Claude 3.5 Haiku | europe-west1 | | 200,000 |
| Claude 3.5 Sonnet | us-east5 | | 200,000 |
| Claude 3.5 Sonnet | europe-west1 | | 200,000 |
| Claude 3.5 Sonnet | asia-southeast1 | | 200,000 |
| Claude 3 Opus | us-east5 | | 200,000 |
| Claude 3 Haiku | us-east5 | | 200,000 |
| Claude 3 Haiku | europe-west1 | | 200,000 |
| Claude 3 Haiku | asia-southeast1 | | 200,000 |
If you want to increase any of your quotas for Agent Platform, you can use the Google Cloud console to request a quota increase. To learn more about quotas, see the Cloud Quotas overview.