Quotas for Anthropic Claude models

This page describes how Gemini Enterprise Agent Platform manages quotas for Anthropic Claude models.

Overview

Anthropic Claude models on Agent Platform use one of the following quota systems:

Shared lineage quotas

Global and multi-region endpoints for Anthropic Claude models launched after May 26, 2026 use shared model lineage quotas. A single quota limit is shared across all model versions in a model lineage for a given location.

For example, if you call Claude Opus 4.8 through the global endpoint, then all requests, input tokens, and output tokens count against the same anthropic-claude-opus-4 quota bucket for the global endpoint, regardless of the specific Opus version that you call.

Quotas on the global endpoint and on each multi-region endpoint are independent buckets. Usage on the global endpoint doesn't consume quota on a multi-region endpoint, and usage on a multi-region endpoint doesn't consume quota on the global endpoint.
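The bucket rules above can be pictured as a tally keyed by endpoint and lineage. The following is a minimal sketch, not the service's actual accounting: the `usage` counter, the `record_request` helper, and the version labels are hypothetical names used only for illustration.

```python
from collections import Counter

# Hypothetical in-memory tally; the real accounting happens on the service side.
usage = Counter()

def record_request(endpoint: str, base_model: str, version: str) -> tuple[str, str]:
    """Count one request against its (endpoint, lineage) bucket.

    `version` is accepted but deliberately unused: every version in a
    lineage draws from the same bucket for a given endpoint.
    """
    bucket = (endpoint, base_model)
    usage[bucket] += 1
    return bucket

# Two different versions on the global endpoint share one bucket.
record_request("global", "anthropic-claude-opus-4", "version-a")
record_request("global", "anthropic-claude-opus-4", "version-b")
# The same lineage on a multi-region endpoint uses a separate bucket.
record_request("us", "anthropic-claude-opus-4", "version-a")
```

After these three calls, the global bucket for `anthropic-claude-opus-4` holds two requests and the `us` bucket holds one.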

The following sections describe how shared lineage quotas work for global endpoints and multi-region endpoints.

Global endpoints

The following table describes metrics for global endpoints:

Metric Quota dimension
global_online_prediction_requests_per_base_model base_model:
  • anthropic-claude-opus-4
  • anthropic-claude-sonnet
  • anthropic-claude-haiku
  • anthropic-claude-mythos
global_online_prediction_input_tokens_per_minute_per_base_model base_model:
  • anthropic-claude-opus-4
  • anthropic-claude-sonnet
  • anthropic-claude-haiku
  • anthropic-claude-mythos
global_online_prediction_output_tokens_per_minute_per_base_model base_model:
  • anthropic-claude-opus-4
  • anthropic-claude-sonnet
  • anthropic-claude-haiku
  • anthropic-claude-mythos

Multi-region endpoints

The following table describes the metrics for multi-region endpoints, such as us and eu:

Metric Quota dimension
LOCATION_multi_region_online_prediction_requests_per_base_model base_model:
  • anthropic-claude-opus-4
  • anthropic-claude-sonnet
  • anthropic-claude-haiku
  • anthropic-claude-mythos
LOCATION_multi_region_online_prediction_input_tokens_per_minute_per_base_model base_model:
  • anthropic-claude-opus-4
  • anthropic-claude-sonnet
  • anthropic-claude-haiku
  • anthropic-claude-mythos
LOCATION_multi_region_online_prediction_output_tokens_per_minute_per_base_model base_model:
  • anthropic-claude-opus-4
  • anthropic-claude-sonnet
  • anthropic-claude-haiku
  • anthropic-claude-mythos
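The metric names in the two tables follow a regular pattern: a `global_` prefix for the global endpoint, or the location followed by `_multi_region_` for a multi-region endpoint. As a rough sketch, a helper such as the following could build them; `metric_name` and the `kind` keys are hypothetical and exist only for this example.

```python
# Suffixes shared by the global and multi-region metrics listed above.
SUFFIXES = {
    "requests": "online_prediction_requests_per_base_model",
    "input_tokens": "online_prediction_input_tokens_per_minute_per_base_model",
    "output_tokens": "online_prediction_output_tokens_per_minute_per_base_model",
}

def metric_name(endpoint: str, kind: str) -> str:
    """Build a quota metric name for the global or a multi-region endpoint.

    `endpoint` is "global" or a multi-region location such as "us" or "eu".
    """
    suffix = SUFFIXES[kind]
    if endpoint == "global":
        return f"global_{suffix}"
    # Multi-region metrics substitute the location for LOCATION.
    return f"{endpoint}_multi_region_{suffix}"

print(metric_name("us", "requests"))
print(metric_name("global", "output_tokens"))
```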

Implications for your application

  • Adding a new version of a lineage doesn't require a new quota request. When a new version of a model lineage, such as a future Sonnet release, launches on a public endpoint, it shares the existing model lineage quota bucket. You don't need to file a separate quota increase to use the new version.

  • Mixing versions consumes the same quota bucket. Traffic split across multiple versions of the same model lineage draws from one shared quota. Plan capacity at the lineage level, not the version level.
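Planning at the lineage level means summing expected traffic across every version you call before comparing it with the limit. The sketch below illustrates the arithmetic; the `lineage_headroom` helper, the version labels, and all numbers are placeholders, not real defaults.

```python
def lineage_headroom(limit_tpm: int, demand_by_version: dict[str, int]) -> int:
    """Return the tokens per minute left in a lineage bucket.

    A negative result means the combined demand exceeds the limit.
    """
    return limit_tpm - sum(demand_by_version.values())

# Placeholder demand figures for two versions of the same lineage.
demand = {"version-a": 120_000, "version-b": 60_000}

# Both versions draw from one bucket, so their demand is added together.
headroom = lineage_headroom(200_000, demand)
print(headroom)
```

Each version on its own fits comfortably under the placeholder limit of 200,000, but together they leave only 20,000 tokens per minute of headroom.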

View and manage your quotas

To view your current usage limits or request quota increases, go to the Quotas and system limits page in the Google Cloud console.

To filter quotas, use the model lineage value of base_model, such as anthropic-claude-opus-4.

Per-model quotas

Anthropic Claude models launched before May 26, 2026 have quotas based on the type of endpoint used: regional, multi-region, or global. The quota is specified in queries per minute (QPM) and tokens per minute (TPM). TPM includes both input and output tokens.

To maintain overall service performance and acceptable use, the maximum quotas might vary by account and, in some cases, access might be restricted. View your project's quotas on the Quotas and system limits page in the Google Cloud console. You must also have the following quotas available:

  • Queries Per Minute (QPM):

    • For regional endpoints: online_prediction_requests_per_base_model
    • For the global endpoint: global_online_prediction_requests_per_base_model
    • For the US multi-region endpoint: us_multi_region_online_prediction_requests_per_base_model
    • For the EU multi-region endpoint: eu_multi_region_online_prediction_requests_per_base_model
  • Tokens Per Minute (TPM):

    • Some models count input and output tokens together:
      • Regional: online_prediction_tokens_per_minute_per_base_model
      • Global: global_online_prediction_tokens_per_minute_per_base_model
    • Other models count input and output tokens separately:
      • Input TPM:
        • Regional: online_prediction_input_tokens_per_minute_per_base_model
        • Global: global_online_prediction_input_tokens_per_minute_per_base_model
        • US multi-region: us_multi_region_online_prediction_input_tokens_per_minute_per_base_model
        • EU multi-region: eu_multi_region_online_prediction_input_tokens_per_minute_per_base_model
      • Output TPM:
        • Regional: online_prediction_output_tokens_per_minute_per_base_model
        • Global: global_online_prediction_output_tokens_per_minute_per_base_model
        • US multi-region: us_multi_region_online_prediction_output_tokens_per_minute_per_base_model
        • EU multi-region: eu_multi_region_online_prediction_output_tokens_per_minute_per_base_model

    To see which models count input and output tokens separately, see Quotas by model and region.

Input tokens

The following list defines the input tokens that can count towards your input TPM quota. The input tokens that each model counts can vary. To see which input tokens a model counts, see Quotas by model and region.

  • Input tokens: all input tokens, including cache read and cache write tokens.
  • Uncached input tokens: only the input tokens that weren't read from a cache. Cache read tokens are excluded.
  • Cache write tokens: the tokens that were used to create or update a cache.
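To make the three categories concrete, the sketch below derives each count from a per-request breakdown. The `TokenUsage` fields and the `counted_input_tokens` helper are hypothetical and don't correspond to any API. The arithmetic reads the definitions above literally, so it assumes that cache write tokens, which weren't read from a cache, count as uncached input.

```python
from dataclasses import dataclass

@dataclass
class TokenUsage:
    """Hypothetical breakdown of one request's input tokens."""
    fresh: int        # tokens neither read from nor written to a cache
    cache_read: int   # tokens read from a cache
    cache_write: int  # tokens used to create or update a cache

def counted_input_tokens(usage: TokenUsage, category: str) -> int:
    """Return how many tokens count toward input TPM for a category."""
    if category == "input":
        # All input tokens, including cache reads and cache writes.
        return usage.fresh + usage.cache_read + usage.cache_write
    if category == "uncached_input":
        # Assumption: everything except cache reads, so cache writes are included.
        return usage.fresh + usage.cache_write
    if category == "cache_write":
        return usage.cache_write
    raise ValueError(f"unknown category: {category}")

usage = TokenUsage(fresh=100, cache_read=900, cache_write=50)
for category in ("input", "uncached_input", "cache_write"):
    print(category, counted_input_tokens(usage, category))
```

For a request that reads most of its prompt from a cache, the uncached count is much smaller than the full input count, which is why the category a model uses matters for TPM planning.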

Quotas by model and region

The following table shows the default quotas and supported context length for each model in each region.

Model Region Quotas Context length
Claude Opus 4.8 Multi-region 1,000,000
Multi-region 1,000,000
global endpoint 1,000,000
Claude Opus 4.7 Multi-region 1,000,000
Multi-region 1,000,000
global endpoint 1,000,000
Claude Opus 4.6 us-east5 1,000,000
europe-west1 1,000,000
asia-southeast1 1,000,000
global endpoint 1,000,000
Claude Sonnet 4.6 us-east5 1,000,000
europe-west1 1,000,000
asia-southeast1 1,000,000
global endpoint 1,000,000
Claude Opus 4.5 us-east5 200,000
europe-west1 200,000
asia-southeast1 200,000
global endpoint 200,000
Claude Opus 4.1 us-east5 200,000
global endpoint 200,000
Claude Opus 4 us-east5 200,000
global endpoint 200,000
Claude Sonnet 4.5 us-east5 1,000,000 (beta), 200,000 (GA)
europe-west1 1,000,000 (beta), 200,000 (GA)
asia-southeast1 1,000,000 (beta), 200,000 (GA)
global endpoint 1,000,000 (beta), 200,000 (GA)
Claude Sonnet 4 us-east5 1,000,000
europe-west1 1,000,000
global endpoint 1,000,000
Claude 3.7 Sonnet us-east5
  • QPM: 55
  • TPM: 500,000 (uncached input and output)
200,000
europe-west1
  • QPM: 40
  • TPM: 300,000 (uncached input and output)
200,000
global endpoint
  • QPM: 35
  • TPM: 300,000 (uncached input and output)
200,000
Claude 3.5 Sonnet v2 us-east5
  • QPM: 90
  • TPM: 540,000 (input and output)
200,000
europe-west1
  • QPM: 55
  • TPM: 330,000 (input and output)
200,000
global endpoint
  • QPM: 25
  • TPM: 140,000 (input and output)
200,000
Claude Haiku 4.5 us-east5 200,000
europe-west1 200,000
asia-east1 200,000
global endpoint 200,000
Claude 3.5 Haiku us-east5
  • QPM: 80
  • TPM: 350,000 (input and output)
200,000
europe-west1
  • QPM: 90
  • TPM: 400,000 (input and output)
200,000
Claude 3.5 Sonnet us-east5
  • QPM: 80
  • TPM: 350,000 (input and output)
200,000
europe-west1
  • QPM: 130
  • TPM: 600,000 (input and output)
200,000
asia-southeast1
  • QPM: 35
  • TPM: 150,000 (input and output)
200,000
Claude 3 Opus us-east5
  • QPM: 20
  • TPM: 105,000 (input and output)
200,000
Claude 3 Haiku us-east5
  • QPM: 245
  • TPM: 600,000 (input and output)
200,000
europe-west1
  • QPM: 75
  • TPM: 181,000 (input and output)
200,000
asia-southeast1
  • QPM: 70
  • TPM: 174,000 (input and output)
200,000

If you want to increase any of your quotas for Agent Platform, you can use the Google Cloud console to request a quota increase. To learn more about quotas, see the Cloud Quotas overview.