This page applies to Apigee and Apigee hybrid.
Use the comparison chart below to help you decide which policy to use for your rate-limiting use case:
| | Quota | SpikeArrest | LLMTokenQuota | PromptTokenLimit |
|---|---|---|---|---|
| Use it to: | Limit the number of API proxy calls a developer app or developer can make over a specific period of time. It's best for rate limiting over longer time intervals like days, weeks, or months, especially when accurate counting is a requirement. | Limit the number of API calls that can be made against an API proxy across all consumers over a short period of time, such as seconds or minutes. | Manage and limit the total token consumption of LLM API calls over a specified period (minute, hour, day, week, or month). This lets you control LLM expenditures and apply granular quota management based on API products. | Protect your API proxy's target backend against token abuse, oversized prompts, and potential denial-of-service attempts by throttling requests based on the number of tokens in the user's prompt message. It is analogous to SpikeArrest, but for tokens rather than API calls. |
| Don't use it to: | Protect your API proxy's target backend against traffic spikes. Use SpikeArrest or PromptTokenLimit for that. | Count and limit the number of connections apps can make to your API proxy's target backend over a specific period of time, especially when accurate counting is required. | Protect your API proxy's target backend against token abuse. Use PromptTokenLimit for that. | Accurately count and limit the total number of tokens consumed for billing or long-term quota management. Use the LLMTokenQuota policy for that. |
| Stores a count? | Yes | No | Yes. It maintains counters that track the number of tokens consumed by LLM responses. | No. It counts prompt tokens per request to enforce a rate limit, but does not keep a persistent, long-term count like the LLMTokenQuota policy. |
| Best practices for attaching the policy: | Attach it to the ProxyEndpoint Request PreFlow, generally after authentication of the user. This enables the policy to check the quota counter at the entry point of your API proxy. | Attach it to the ProxyEndpoint Request PreFlow, generally at the very beginning of the flow. This provides spike protection at the entry point of your API proxy. If you use both SpikeArrest and Quota policies in the same proxy, SpikeArrest should always be attached before the Quota policy in the ProxyEndpoint Request PreFlow. SpikeArrest acts as a first line of defense against sudden traffic bursts, smoothing traffic before requests are evaluated against longer-term Quota limits. This prevents spikes from prematurely exhausting quota. | Apply the enforcement policy in the ProxyEndpoint Request PreFlow, generally after authentication of the user, and count consumed tokens in the response flow, since the number of tokens consumed is reported in the LLM response. | Attach it to the ProxyEndpoint Request PreFlow, at the beginning of the flow, to protect your backend from oversized prompts. If you use both PromptTokenLimit and LLMTokenQuota policies in the same proxy, PromptTokenLimit should always be attached before the LLMTokenQuota policy in the ProxyEndpoint Request PreFlow. PromptTokenLimit acts as a first line of defense against oversized prompts, rejecting them before requests are evaluated against longer-term LLMTokenQuota limits. This prevents oversized prompts from prematurely exhausting token quota. |
| HTTP status code when limit has been reached: | 429 (Too Many Requests) | 429 (Too Many Requests) | 429 (Too Many Requests) | 429 (Too Many Requests) |
| Good to know: | | | | |
| Get more details: | Quota policy | SpikeArrest policy | LLMTokenQuota policy | PromptTokenLimit policy |
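For reference, minimal configurations of the two general-purpose policies in the table might look like the following. This is a sketch, not a complete reference: the policy names, the `client_id` identifier variable, and the limit values are illustrative placeholders; see the Quota and SpikeArrest policy pages for the full element list.

```xml
<!-- Quota: accurate, distributed counting over a long interval.
     Name, identifier, and counts are illustrative placeholders. -->
<Quota continueOnError="false" enabled="true" name="Q-EnforceQuota" type="calendar">
  <Allow count="10000"/>
  <Interval>1</Interval>
  <TimeUnit>month</TimeUnit>
  <StartTime>2025-01-01 00:00:00</StartTime>
  <Identifier ref="client_id"/>
  <!-- Distributed/Synchronous trade counting accuracy against latency -->
  <Distributed>true</Distributed>
  <Synchronous>true</Synchronous>
</Quota>

<!-- SpikeArrest: smooths short-term bursts; keeps no persistent count. -->
<SpikeArrest continueOnError="false" enabled="true" name="SA-Protect">
  <Rate>100ps</Rate> <!-- 100 requests per second -->
  <Identifier ref="client_id"/>
</SpikeArrest>
```

Both policies raise a fault with HTTP status 429 when the limit is exceeded, as noted in the table above.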
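The attachment ordering described in the best-practices row (SpikeArrest first, then authentication, then Quota) can be sketched as a ProxyEndpoint fragment. The step names, base path, and the VerifyAPIKey policy are assumptions for illustration:

```xml
<ProxyEndpoint name="default">
  <PreFlow name="PreFlow">
    <Request>
      <!-- 1. SpikeArrest first: absorbs sudden bursts before any counting -->
      <Step><Name>SA-Protect</Name></Step>
      <!-- 2. Authenticate the caller so quota can be tracked per app
           (assumed VerifyAPIKey policy, not part of this page) -->
      <Step><Name>VK-VerifyAPIKey</Name></Step>
      <!-- 3. Quota after authentication, so spikes cannot exhaust it -->
      <Step><Name>Q-EnforceQuota</Name></Step>
    </Request>
  </PreFlow>
  <HTTPProxyConnection>
    <BasePath>/v1/example</BasePath>
  </HTTPProxyConnection>
  <RouteRule name="default">
    <TargetEndpoint>default</TargetEndpoint>
  </RouteRule>
</ProxyEndpoint>
```

The same ordering applies to the token policies: attach PromptTokenLimit before LLMTokenQuota in the ProxyEndpoint Request PreFlow.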
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2026-02-21 UTC.