Gemini 2.5 Flash-Lite is our most balanced Gemini model, optimized for low-latency use cases. It comes with the same capabilities that make other Gemini 2.5 models helpful, such as the ability to turn thinking on at different budgets, connections to tools like Grounding with Google Search and code execution, multimodal input, and a 1 million-token context length.
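As a concrete illustration of those capabilities, the following is a minimal sketch that calls the model through Vertex AI with the Google Gen AI SDK for Python, setting a thinking budget and enabling Grounding with Google Search. The project ID, location, prompt, and budget value are placeholders, not recommendations.

```python
from google import genai
from google.genai import types

# Placeholder project and location; adjust for your environment.
client = genai.Client(vertexai=True, project="your-project-id", location="global")

response = client.models.generate_content(
    model="gemini-2.5-flash-lite",
    contents="What changed in the most recent Kubernetes release?",
    config=types.GenerateContentConfig(
        # Thinking budget in tokens; a budget of 0 turns thinking off for this model.
        thinking_config=types.ThinkingConfig(thinking_budget=1024),
        # Ground the answer with Google Search results.
        tools=[types.Tool(google_search=types.GoogleSearch())],
    ),
)
print(response.text)
```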
2.5 Flash-Lite
| Model ID | gemini-2.5-flash-lite |
|---|---|
| Supported inputs & outputs | Text, code, image, audio, and video input; text output |
| Token limits | 1 million-token context length |
| Capabilities | Thinking with configurable budgets, Grounding with Google Search, and code execution |
| Consumption options | See Consumption options for more information. |
| Input size limit | 500 MB |
| Technical specifications | Images, documents, video, audio, and parameter defaults |
| Supported regions | Model availability: see Deployments and endpoints for more information. |
| Knowledge cutoff date | January 2025 |
| Versions | |
| Security controls | Online prediction, batch prediction, tuning, context caching, RAG Engine, Grounding with Google Search, and Grounding with Google Maps; see Security controls for more information. |
| Supported languages | See Supported languages. |
| Pricing | See Pricing. |
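The specifications above list image, document, video, and audio input. As a minimal sketch of the document path, assuming the client created in the earlier example and a placeholder Cloud Storage URI, you can pass a PDF alongside a text prompt:

```python
from google.genai import types

# Placeholder bucket and file; any supported document under the 500 MB
# input size limit can be referenced the same way.
pdf_part = types.Part.from_uri(
    file_uri="gs://your-bucket/your-document.pdf",
    mime_type="application/pdf",
)

response = client.models.generate_content(
    model="gemini-2.5-flash-lite",
    contents=[pdf_part, "Summarize this document in three bullet points."],
)
print(response.text)
```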
2.5 Flash-Lite (preview)
| Model ID | gemini-2.5-flash-lite-preview-09-2025 |
|---|---|
| Supported inputs & outputs | Text, code, image, audio, and video input; text output |
| Token limits | 1 million-token context length |
| Capabilities | Thinking with configurable budgets, Grounding with Google Search, and code execution |
| Consumption options | See Consumption options for more information. |
| Technical specifications | Images, documents, video, audio, and parameter defaults |
| Supported regions | Model availability: see Deployments and endpoints for more information. |
| Knowledge cutoff date | January 2025 |
| Versions | |
| Supported languages | See Supported languages. |
| Pricing | See Pricing. |
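Calling the preview snapshot uses the same request shape as the stable model; only the model ID changes. A minimal sketch, reusing the client from the first example:

```python
response = client.models.generate_content(
    # Preview snapshot listed in the table above.
    model="gemini-2.5-flash-lite-preview-09-2025",
    contents="In one paragraph, compare latency and throughput.",
)
print(response.text)
```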