If you send more requests than the capacity allocated to process them, the
service returns error code 429. The following table shows the error message
that each quota framework returns:
| Quota framework | Message |
|---|---|
| Pay-as-you-go | Resource exhausted, please try again later. |
| Provisioned Throughput | Too many requests. Exceeded the Provisioned Throughput. |
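
As a minimal sketch of how a client might use the table above, the following Python function maps a 429 error message to the quota framework that produced it. The function name `quota_framework_for` is hypothetical, and matching on substrings is an assumption: the exact wording of the messages may change, so treat this as an illustration rather than a stable contract.

```python
from typing import Optional


def quota_framework_for(message: str) -> Optional[str]:
    """Guess which quota framework produced a 429, based on its message text.

    The substrings come from the table above; matching on them is an
    assumption for illustration, not a documented API guarantee.
    """
    text = message.lower()
    if "provisioned throughput" in text:
        return "Provisioned Throughput"
    if "resource exhausted" in text:
        return "Pay-as-you-go"
    return None  # Unknown message: let the caller decide how to handle it.
```

A caller could use the result to choose between the pay-as-you-go and Provisioned Throughput remedies described later on this page.
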
With a Provisioned Throughput (PT) subscription, you can reserve an
amount of throughput for specific generative AI models. If you don't have a
PT subscription and resources aren't available to your application, then
error code 429 is returned. Although you don't have reserved capacity, you
can retry the request. These 429 errors aren't counted against the error
rate described in your service level agreement (SLA).
For projects that have purchased PT, Gemini Enterprise Agent Platform measures a project's throughput and reserves the purchased amount of throughput for the project's actual usage.
For standard PT, when you use less than your
purchased amount, errors that might otherwise be 429 are returned as 5XX and
count toward the SLA error rate. For Single Zone PT,
when you use less than your purchased amount, capacity-related 429 errors are
treated as 5XX but don't count toward the SLA error rate. When you exceed your
purchased amount, the additional requests are processed on-demand as pay-as-you-go.
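
The rules in the preceding paragraphs can be summarized as a small decision function. This is only a sketch of the behavior described here, not service code: the names `Outcome` and `capacity_error_outcome` and the `"standard"` / `"single_zone"` labels are hypothetical. It also assumes that overage traffic, because it is processed as pay-as-you-go, gets the same 429 treatment as traffic without a PT subscription, and that usage exactly equal to the purchased amount counts as overage; the text above doesn't state either point.

```python
from typing import NamedTuple, Optional


class Outcome(NamedTuple):
    status: str              # "429" or "5XX"
    counts_toward_sla: bool  # whether the error counts toward the SLA error rate


def capacity_error_outcome(pt_kind: Optional[str],
                           used: float = 0.0,
                           purchased: float = 0.0) -> Outcome:
    """Describe how a capacity-related error is reported.

    pt_kind is None (no PT subscription), "standard", or "single_zone".
    used and purchased are throughput amounts in the same (unspecified) unit.
    """
    if pt_kind not in (None, "standard", "single_zone"):
        raise ValueError(f"unknown PT kind: {pt_kind!r}")
    # No reservation, or usage beyond the reservation (assumed to include
    # the boundary case): handled as pay-as-you-go.
    if pt_kind is None or used >= purchased:
        return Outcome("429", False)
    # Within the purchased amount: reported as 5XX; only standard PT
    # counts the error toward the SLA error rate.
    return Outcome("5XX", pt_kind == "standard")
```
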
Pay-as-you-go
On the pay-as-you-go quota framework, you have the following options for
resolving 429 errors:
- Use the global endpoint instead of a regional endpoint whenever possible.
- Implement a retry strategy by using truncated exponential backoff.
- If your model uses quotas, you can submit a Quota Increase Request (QIR). If your model uses Standard pay-as-you-go, smoothing traffic and reducing large spikes can help.
- Subscribe to PT for a more consistent level of service. For more information, see Provisioned Throughput.
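
The retry strategy in the list above can be sketched in Python as follows. This is a minimal illustration of truncated exponential backoff with jitter, not an official client implementation: `ResourceExhaustedError` is a hypothetical stand-in for whatever exception your client library raises on a 429, `send_request` is any zero-argument callable that you supply, and the default base delay, cap, and retry count are arbitrary assumptions.

```python
import random
import time


class ResourceExhaustedError(Exception):
    """Hypothetical stand-in for the exception your client raises on HTTP 429."""


def backoff_delays(max_retries=5, base=1.0, cap=32.0):
    """Return the wait in seconds before each retry, without jitter.

    The wait doubles on every attempt (base * 2**n) and is truncated at cap.
    """
    return [min(cap, base * 2 ** attempt) for attempt in range(max_retries)]


def call_with_backoff(send_request, max_retries=5, base=1.0, cap=32.0,
                      sleep=time.sleep):
    """Call send_request(), retrying on ResourceExhaustedError."""
    for delay in backoff_delays(max_retries, base, cap):
        try:
            return send_request()
        except ResourceExhaustedError:
            # Add up to one second of random jitter so that many clients
            # don't all retry at the same instant.
            sleep(delay + random.uniform(0, 1))
    # Final attempt: if it still fails, let the error reach the caller.
    return send_request()
```

The `sleep` parameter exists only so that the wait can be replaced in tests. In practice you would wrap your prediction call in a function that raises the stand-in exception (or catch your client's real 429 exception instead) and pass that function to `call_with_backoff`.
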
PT
To correct the 429 error generated by PT, do the following:
- Use the Default behavior example, which doesn't set a header in prediction requests. Any overages are processed on-demand and billed as pay-as-you-go.
- Increase the number of generative AI scale units (GSUs) in your PT subscription.
What's next
- To learn more about Standard pay-as-you-go, see Standard pay-as-you-go.
- To learn more about PT, see Provisioned Throughput.
- To learn about quotas and limits for Agent Platform, see Agent Platform quotas and limits.
- To learn more about Google Cloud quotas and system limits, see the Cloud Quotas documentation.