Gemini 3.1 Flash-Lite

Important: The preview version of Gemini 3.1 Flash-Lite (gemini-3.1-flash-lite-preview) will be discontinued on July 9, 2026. Update your application to use gemini-3.1-flash-lite instead.

Gemini 3.1 Flash-Lite is our most cost-efficient Gemini model, optimized for low latency use cases for high-volume, cost-sensitive LLM traffic. It provides a significant quality increase over Gemini 2.0 Flash-Lite and Gemini 2.5 Flash-Lite models, matching Gemini 2.5 Flash performance across key capability areas:

Improved response quality: Aims to match 2.5 Flash performance.
Improved instruction following: Targeted improvements to serve as a reliable migration path for complex chatbot and instruction-heavy workflows.
Improved audio input: Improved audio-input quality for tasks like Automated Speech Recognition (ASR).
Expanded thinking support: You can control how much reasoning the model performs by choosing from minimal, low, medium, or high thinking levels. This feature lets you balance response quality and speed for your specific use case.

Try in Agent Studio Deploy example app

Note: "Deploy example app" requires a Google Cloud project with billing and Agent Platform API enabled.

Model ID	`gemini-3.1-flash-lite`
Modalities	Text Input and output Image Input only Audio Input only Video Input only
Token limits	Context window	1,048,576
Token limits	Maximum output tokens	65,535 (default)
Capabilities	Thinking Supported System instructions Supported Gemini Live API Not supported Structured output Supported Context caching Implicit context caching, explicit context caching Supported Count Tokens Supported RAG Engine Not supported Chat completions Supported Tuning Supervised fine-tuning, continuous tuning, tuning checkpoints Supported URL context Supported
Tools	Grounding Google Search, Parallel Web Search, Exa Web Search Supported Code execution Supported Function calling Supported Computer Use Preview feature Not supported
Consumption options	Provisioned Throughput Supported Batch inference Supported Pay-as-you-go Flex PayGo, Priority PayGo Supported Fixed quota Not supported
Consumption options	See Consumption options for more information.
Technical specifications	Image	Maximum images per prompt: 3,000 Maximum file size per file for inline data or direct uploads through the console: 7 MB Maximum file size per file from Google Cloud Storage: 30 MB Supported MIME types: `image/png`, `image/jpeg`, `image/webp`, `image/heic`, `image/heif`
	Text	Maximum number of files per prompt: 3,000 Maximum number of pages per file: 3,000 Maximum file size per file for the API or Cloud Storage imports: 50 MB(application/pdf) or 7 MB(text/plain) Maximum file size per file for direct uploads through the console: 7 MB Supported MIME types: `application/pdf`, `text/plain`
	Video	Maximum video length (with audio): Approximately 45 minutes Maximum video length (without audio): Approximately 1 hour Maximum number of videos per prompt: 10 Supported MIME types: `video/x-flv`, `video/quicktime`, `video/mpeg`, `video/mpegs`, `video/mpg`, `video/mp4`, `video/webm`, `video/wmv`, `video/3gpp`
	Audio	Maximum audio length per prompt: Approximately 8.4 hours, or up to 1 million tokens Maximum number of audio files per prompt: 1 Supported MIME types: `audio/x-aac`, `audio/flac`, `audio/mp3`, `audio/m4a`, `audio/mpeg`, `audio/mpga`, `audio/mp4`, `audio/ogg`, `audio/pcm`, `audio/wav`, `audio/webm`
	Parameter defaults	Temperature: 0.0-2.0 (default 1.0) topP: 0.0-1.0 (default 0.95) topK: 64 (fixed) candidateCount: 1–8 (default 1)
Supported regions	Model availability	Global global Multi-region (See connection guide) us eu
	ML processing	United States Multi-region Europe Multi-region
	See Deployments and endpoints for more information.
Knowledge cutoff date	January 2025
Versions	`gemini-3.1-flash-lite` Launch stage: GA Release date: May 7, 2026 Retirement date: May 7, 2027 or later `gemini-3.1-flash-lite-preview` Launch stage: Public preview Release date: March 3, 2026 Retirement date: July 9, 2026
Security controls	Online prediction	Data residency CMEK VPC-SC AXT
	Batch inference	Data residency CMEK VPC-SC AXT
	Context caching	Data residency CMEK VPC-SC AXT
	See Security controls for more information.
Pricing	See Pricing.

Gemini 3.1 Flash-Lite Stay organized with collections Save and categorize content based on your preferences.

Gemini 3.1 Flash-Lite