Gemini 3.1 Flash-Lite is our most cost-efficient Gemini model, optimized for low latency use cases for high-volume, cost-sensitive LLM traffic. It provides a significant quality increase over Gemini 2.0 Flash-Lite and Gemini 2.5 Flash-Lite models, matching Gemini 2.5 Flash performance across key capability areas:
- Improved response quality: Aims to match 2.5 Flash performance.
- Improved instruction following: Targeted improvements to serve as a reliable migration path for complex chatbot and instruction-heavy workflows.
- Improved audio input: Improved audio-input quality for tasks like Automated Speech Recognition (ASR).
- Expanded thinking support: You can control how much reasoning the model performs by choosing from minimal, low, medium, or high thinking levels. This feature lets you balance response quality and speed for your specific use case.
Try in Agent Studio Deploy example app
| Model ID | gemini-3.1-flash-lite |
|
|---|---|---|
| Modalities |
|
|
| Token limits | Context window | 1,048,576 |
| Maximum output tokens | 65,535 (default) | |
| Capabilities |
|
|
| Consumption options |
|
|
| See Consumption options for more information. | ||
| Technical specifications | Image |
|
| Text |
|
|
| Video |
|
|
| Audio |
|
|
| Parameter defaults |
|
|
| Supported regions |
Model availability |
|
|
ML processing |
|
|
| See Deployments and endpoints for more information. | ||
| Knowledge cutoff date | January 2025 | |
| Versions |
|
|
| Security controls | Online prediction |
|
| Batch inference |
|
|
| Context caching |
|
|
| See Security controls for more information. | ||
| Pricing | See Pricing. | |