This page applies to Apigee and Apigee hybrid.
Use the comparison chart below to help you decide which policy to use for your rate-limiting use case:
| | Quota | SpikeArrest | LLMTokenQuota | PromptTokenLimit |
|---|---|---|---|---|
| Use it to: | Limit the number of API proxy calls a developer app or developer can make over a specific period of time. It's best for rate limiting over longer time intervals like days, weeks, or months, especially when accurate counting is a requirement. | Limit the number of API calls that can be made against an API proxy across all consumers over a short period of time, such as seconds or minutes. | Manage and limit the total token consumption for LLM API calls over a specified period (minute, hour, day, week, or month). This allows you to control LLM expenditures and apply granular quota management based on API products. | Protect your API proxy's target backend against token abuse, oversized prompts, and potential denial-of-service attempts by throttling requests based on the number of tokens in the user's prompt message. It is analogous to SpikeArrest, but applied to tokens rather than API traffic. |
| Don't use it to: | Protect your API proxy's target backend against traffic spikes. Use SpikeArrest or PromptTokenLimit for that. | Count and limit the number of connections apps can make to your API proxy's target backend over a specific period of time, especially when accurate counting is required. | Protect your API proxy's target backend against token abuse. Use PromptTokenLimit for that. | Accurately count and limit the total number of tokens consumed for billing or long-term quota management. Use the LLMTokenQuota policy for that. |
| Stores a count? | Yes | No | Yes. It maintains counters that track the number of tokens consumed by LLM responses. | No. It counts tokens to enforce a rate limit but does not store a persistent, long-term count like the LLMTokenQuota policy. |
| Best practices for attaching the policy: | Attach it to the ProxyEndpoint Request PreFlow, generally after the authentication of the user. This enables the policy to check the quota counter at the entry point of your API proxy. | Attach it to the ProxyEndpoint Request PreFlow, generally at the very beginning of the flow. This provides spike protection at the entry point of your API proxy. | Apply the enforcement policy (EnforceOnly) in the request flow and the counting policy (CountOnly) in the response flow. For streaming responses, attach the counting policy to an EventFlow. | Attach it to the ProxyEndpoint Request PreFlow, at the beginning of the flow, to protect your backend from oversized prompts. |
| HTTP status code when limit has been reached: | 429 (Too Many Requests) | 429 (Too Many Requests) | 429 (Too Many Requests) | 429 (Too Many Requests) |
| Good to know: | | | | |
| Get more details: | Quota policy | SpikeArrest policy | LLMTokenQuota policy | PromptTokenLimit policy |
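For orientation, here is a minimal configuration sketch for the two longest-standing policies in the chart, Quota and SpikeArrest, attached to the ProxyEndpoint Request PreFlow as recommended above. The policy names, counts, and rate below are illustrative examples, not prescribed values, and the `client_id` identifier assumes an upstream policy (such as VerifyAPIKey) has populated that variable:

```xml
<!-- Quota: allow each developer app up to 1000 calls per day.
     The count, interval, and policy name are illustrative. -->
<Quota name="Quota-PerApp" continueOnError="false" enabled="true">
  <Allow count="1000"/>
  <Interval>1</Interval>
  <TimeUnit>day</TimeUnit>
  <!-- Count per app; assumes client_id was set earlier in the flow,
       e.g. by a VerifyAPIKey policy. -->
  <Identifier ref="client_id"/>
</Quota>

<!-- SpikeArrest: smooth traffic to roughly 30 requests per second
     to protect the target backend from short-term spikes. -->
<SpikeArrest name="SpikeArrest-Protect" continueOnError="false" enabled="true">
  <Rate>30ps</Rate>
</SpikeArrest>
```

Note the division of labor the chart describes: Quota enforces an accurate long-term allowance per consumer, while SpikeArrest throttles the aggregate short-term request rate; the two are commonly used together in the same proxy.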
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2025-12-23 UTC.