Throughput quota

Gemini Enterprise Agent Platform provides different ways to manage throughput for generative AI models to help you balance cost and performance. This document describes the available options: a flexible pay-as-you-go model and reserved capacity for predictable throughput.

Managed model quotas

Agent Platform offers two ways to manage throughput for its managed generative AI models: you can pay as you go, or you can reserve a dedicated amount of throughput for a fixed price. Which one fits depends on how you weigh cost, flexibility, and performance.

Pay-as-you-go

By default, Agent Platform uses Standard pay-as-you-go (Standard PayGo). With PayGo, you pay only for the resources that you consume, with no upfront financial commitment. Additional PayGo options vary in cost and performance. For more information, see Priority PayGo and Flex PayGo.

Reserved capacity

For critical production applications that require consistent performance and predictable costs, you can use Provisioned Throughput. Provisioned Throughput is a fixed-cost subscription that reserves a specific amount of throughput for your models in a chosen location.
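The cost side of the choice between pay-as-you-go and a fixed-cost subscription can be sketched as a break-even calculation. The following Python snippet is a minimal illustration only: the function names, the per-million-token price, and the fixed monthly price are all hypothetical placeholders, not actual Agent Platform pricing or APIs. It also looks at cost alone and ignores the consistent performance that reserved throughput is meant to provide.

```python
def monthly_paygo_cost(tokens_per_month: int, price_per_million_tokens: float) -> float:
    """Cost of a month of usage when paying only for what is consumed.

    Both arguments are hypothetical inputs; substitute real figures
    from the pricing documentation.
    """
    return tokens_per_month / 1_000_000 * price_per_million_tokens


def break_even_tokens(fixed_monthly_price: float, price_per_million_tokens: float) -> float:
    """Monthly token volume at which pay-as-you-go costs the same as the fixed price."""
    if price_per_million_tokens <= 0:
        raise ValueError("price_per_million_tokens must be positive")
    return fixed_monthly_price / price_per_million_tokens * 1_000_000


def cheaper_option(
    tokens_per_month: int,
    fixed_monthly_price: float,
    price_per_million_tokens: float,
) -> str:
    """Return which option costs less for the given (assumed) usage and prices."""
    paygo = monthly_paygo_cost(tokens_per_month, price_per_million_tokens)
    return "pay-as-you-go" if paygo <= fixed_monthly_price else "reserved"


if __name__ == "__main__":
    # Placeholder numbers chosen for easy arithmetic, not real prices.
    fixed_price = 1000.0      # hypothetical fixed monthly subscription price
    unit_price = 2.0          # hypothetical price per million tokens
    print(break_even_tokens(fixed_price, unit_price))
    print(cheaper_option(250_000_000, fixed_price, unit_price))
    print(cheaper_option(750_000_000, fixed_price, unit_price))
```

Under these assumed prices, usage below the break-even volume favors pay-as-you-go and usage above it favors the fixed price. In practice, predictable throughput for critical workloads may justify reserved capacity even below that point.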

Quotas for generative AI services

Gemini Enterprise Agent Platform offers a suite of generative AI services, such as model tuning, model evaluation, batch prediction, embeddings, and retrieval augmented generation. To learn more about the quotas for these services, see Generative AI on Gemini Enterprise Agent Platform quotas and system limits.

What's next

Learn more about Standard PayGo.
Learn more about Provisioned Throughput.
Learn more about generative AI quotas and system limits.
Learn more about Google Cloud quotas.
