This page describes best practices for retrying failed requests to the Model Armor API.
For requests that are safe to retry, we recommend using truncated exponential backoff with introduced jitter.
Overview of truncated exponential backoff
Each request to the Model Armor API can succeed or fail. If your application retries failed requests without waiting, it might send a large number of retries to Model Armor in a short period of time. As a result, you might exceed quotas and limits that apply to every Model Armor resource in your Google Cloud project.
To avoid triggering this issue, we strongly recommend that you use truncated exponential backoff with introduced jitter, which is a standard error-handling strategy for network applications. In this approach, a client periodically retries a failed request with exponentially increasing delays between retries. A small, random delay, known as jitter, is also added between retries. This random delay helps prevent a synchronized wave of retries from multiple clients, also known as the thundering herd problem.
Exponential backoff algorithm
The following algorithm implements truncated exponential backoff with jitter:
- Send a request to Model Armor.
- If the request fails, wait 1 + random-fraction seconds, then retry the request.
- If the request fails, wait 2 + random-fraction seconds, then retry the request.
- If the request fails, wait 4 + random-fraction seconds, then retry the request.
- Continue this pattern, waiting 2^n + random-fraction seconds after each retry, up to a maximum-backoff time.
- After deadline seconds, stop retrying the request.
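As a rough illustration of the wait schedule in the steps above, the following Python sketch yields one delay per retry. The function name `backoff_delays` and its parameters are hypothetical and are not part of any Model Armor client library. The `rng` argument stands in for random-fraction so that the schedule can be inspected with a fixed value.

```python
import random


def backoff_delays(maximum_backoff=32.0, rng=random.random):
    """Yield wait times in seconds: min(2**n + random_fraction, maximum_backoff).

    Hypothetical helper for illustration; n starts at 0 and grows by 1 per retry.
    """
    n = 0
    while True:
        yield min(2 ** n + rng(), maximum_backoff)
        # Stop growing the exponent once the cap is reached, so 2**n stays small.
        if 2 ** n < maximum_backoff:
            n += 1
```

With a 32-second cap, the delays grow as roughly 1, 2, 4, 8, and 16 seconds (each plus jitter), and then stay at 32 seconds for every later retry.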
Use the following values as you implement the algorithm:
- Before each retry, the wait time is min((2^n + random-fraction), maximum-backoff), with n starting at 0 and incremented by 1 for each retry.
- Replace random-fraction with a random fractional value less than or equal to 1. Use a different value for each retry. Adding this random value prevents clients from becoming synchronized and sending large numbers of retries at the same time.
- Replace maximum-backoff with the maximum amount of time, in seconds, to wait between retries. Typical values are 32 or 64 (2^5 or 2^6) seconds. Choose the value that works best for your use case.
- Replace deadline with the maximum number of seconds to keep sending retries. Choose a value that reflects your use case. For example, in a continuous integration/continuous deployment (CI/CD) pipeline that is not highly time-sensitive, you might set deadline to 300 seconds (5 minutes).
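The values above could be combined into a retry loop along the following lines. This is a minimal sketch, not an official client. `send_request`, `RetryableError`, and `call_with_backoff` are hypothetical names: the sketch assumes your own code wraps the Model Armor call and raises `RetryableError` for failures that are safe to retry. The `sleep` and `clock` parameters exist only to make the loop easy to test.

```python
import random
import time


class RetryableError(Exception):
    """Hypothetical marker for a failure that is safe to retry."""


def call_with_backoff(send_request, maximum_backoff=32.0, deadline=300.0,
                      sleep=time.sleep, clock=time.monotonic):
    """Call send_request(), retrying with truncated exponential backoff and jitter."""
    start = clock()
    n = 0
    while True:
        try:
            return send_request()
        except RetryableError:
            # Wait time: min(2^n + random-fraction, maximum-backoff).
            wait = min(2 ** n + random.random(), maximum_backoff)
            # Design choice in this sketch: give up if the next retry
            # would start after the deadline has passed.
            if clock() - start + wait > deadline:
                raise
            sleep(wait)
            if 2 ** n < maximum_backoff:
                n += 1
```

A caller might use it as `call_with_backoff(lambda: my_client_call(payload), maximum_backoff=64, deadline=300)`, where `my_client_call` is whatever function in your application sends the request.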
Types of errors to retry
Use this retry strategy for all requests to the Model Armor API that
return the error codes 500, 502, 503, or 504.
Optionally, you can use this retry strategy for requests to the
Model Armor API that return the error code 429.
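One way to encode this rule is a small predicate that the retry logic consults before it waits. The sketch below assumes that the error codes listed above are surfaced to your application as integer HTTP status codes. The names `should_retry` and `retry_on_429` are hypothetical.

```python
# Codes that this page recommends retrying in all cases.
ALWAYS_RETRY = frozenset({500, 502, 503, 504})
# Code that this page describes as optional to retry.
OPTIONAL_RETRY = frozenset({429})


def should_retry(status_code, retry_on_429=False):
    """Return True if a response with this status code should be retried."""
    if status_code in ALWAYS_RETRY:
        return True
    return retry_on_429 and status_code in OPTIONAL_RETRY
```

Any other status code, such as 400 or 404, is treated as not retryable in this sketch.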