This document lists the quotas and system limits that apply to Gemini Enterprise.
- Quotas have default values, but you can typically request adjustments.
- System limits are fixed values that can't be changed.
Google Cloud uses quotas to help ensure fairness and reduce spikes in resource use and availability. A quota restricts how much of a Google Cloud resource your Google Cloud project can use. Quotas apply to a range of resource types, including hardware, software, and network components. For example, quotas can restrict the number of API calls to a service, the number of load balancers used concurrently by your project, or the number of projects that you can create. Quotas protect the community of Google Cloud users by preventing the overloading of services. Quotas also help you to manage your own Google Cloud resources.
The Cloud Quotas system does the following:
- Monitors your consumption of Google Cloud products and services
- Restricts your consumption of those resources
- Provides a way to request changes to the quota value and automate quota adjustments
In most cases, when you attempt to consume more of a resource than its quota allows, the system blocks access to the resource, and the task that you're trying to perform fails.
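For rate quotas, a blocked request can often succeed if you retry it later, so clients commonly wrap calls in exponential backoff. The following is a minimal sketch of that pattern, not part of any Google Cloud client library: `QuotaExceededError` and `call_with_backoff` are hypothetical names, and the example assumes your client surfaces a quota failure (typically an HTTP 429 response) as an exception. Retrying doesn't help with allocation quotas, which are freed only when you release a resource.

```python
import random
import time


class QuotaExceededError(Exception):
    """Hypothetical stand-in for a quota error, such as an HTTP 429 response."""


def call_with_backoff(request, max_attempts=5, base_delay=1.0, max_delay=32.0,
                      sleep=time.sleep):
    """Call request() and retry with exponential backoff on quota errors."""
    for attempt in range(max_attempts):
        try:
            return request()
        except QuotaExceededError:
            if attempt == max_attempts - 1:
                raise  # Out of attempts: surface the failure to the caller.
            # Double the delay on each attempt, cap it, and add jitter so
            # that many clients don't retry at the same instant.
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay + random.uniform(0, delay / 2))
```

The `sleep` parameter is injectable only so that the function can be exercised in tests without waiting.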
Quotas generally apply at the Google Cloud project level. Your use of a resource in one project doesn't affect your available quota in another project. Within a Google Cloud project, quotas are shared across all applications and IP addresses.
For more information, see the Cloud Quotas overview.
There are also system limits on Gemini Enterprise resources. System limits can't be changed.
For information about overage pricing for Gemini Enterprise, see Quotas and overages.
Allocation quotas
The following table lists the allocation quotas for the Discovery Engine API. These quotas don't reset over time. Instead, quota is freed when you release the resource. If the default quota isn't enough, you can request a quota increase.
| Quota | Value |
|---|---|
| Total number of data stores per project | 100* |
| Total number of engines per project | 150† |
| Regional number of data stores per project per location (Global, US, EU) | 100 |
| Regional number of documents per project per location (Global, US, EU) | 10,000,000 |
| Regional number of engines per project per location (Global, US, EU) | 150 |
* Due to a technical limitation, the maximum quota for data stores is 500 per project. If you need more data stores, create additional projects.
† Due to a technical limitation, the maximum quota for engines is 500 per project. If you need more engines, create additional projects.
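Because of the 500-per-project maximums, a large deployment has to be spread across projects. As a rough planning aid, the following sketch estimates the minimum number of projects. The function name `projects_needed` is hypothetical, and the calculation assumes that each project's quota has already been raised to the maximum. It ignores the per-location quotas, including the document quota.

```python
import math

# Maximum quotas per project, taken from the footnotes above.
MAX_DATA_STORES_PER_PROJECT = 500
MAX_ENGINES_PER_PROJECT = 500


def projects_needed(data_stores, engines):
    """Return the minimum number of projects for the given resource counts."""
    return max(
        1,  # At least one project is always required.
        math.ceil(data_stores / MAX_DATA_STORES_PER_PROJECT),
        math.ceil(engines / MAX_ENGINES_PER_PROJECT),
    )
```

For example, `projects_needed(1200, 300)` returns 3, because 1,200 data stores don't fit in two projects of 500 each.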
Rate quotas
The following quotas apply to Discovery Engine API requests. If the default quota isn't enough, you can request a quota increase.
| Quota | Value |
|---|---|
| Complete query requests per minute per project | 300 |
| Regional search requests per minute per project per location (Global, US, EU) | 300 |
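To stay under a per-minute quota, a client can throttle itself before sending requests. The following sketch shows one possible approach, a sliding-window limiter that defaults to the value of 300 from the table. `SlidingWindowLimiter` is a hypothetical class, not a Google Cloud API. Because the quota is shared across all applications in a project and the service may count requests differently, treat a per-process limiter like this as an approximation.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `limit` requests in any `window`-second interval."""

    def __init__(self, limit=300, window=60.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock
        self.sent = deque()  # Timestamps of requests still inside the window.

    def try_acquire(self):
        """Return True and record the request if it fits under the limit."""
        now = self.clock()
        # Drop timestamps that have aged out of the window.
        while self.sent and now - self.sent[0] >= self.window:
            self.sent.popleft()
        if len(self.sent) < self.limit:
            self.sent.append(now)
            return True
        return False
```

Call `try_acquire()` before each request. When it returns `False`, delay or queue the request instead of sending it.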
Request a quota increase
To adjust most quotas, use the Google Cloud console. For more information, see Request a quota adjustment.