Gemini 3.1 Flash-Lite

Gemini 3.1 Flash-Lite is our most cost-efficient Gemini model, optimized for low latency use cases for high-volume, cost-sensitive LLM traffic. It provides a significant quality increase over Gemini 2.0 Flash-Lite and Gemini 2.5 Flash-Lite models, matching Gemini 2.5 Flash performance across key capability areas:

  • Improved response quality: Aims to match 2.5 Flash performance.
  • Improved instruction following: Targeted improvements to serve as a reliable migration path for complex chatbot and instruction-heavy workflows.
  • Improved audio input: Improved audio-input quality for tasks like Automated Speech Recognition (ASR).
  • Expanded thinking support: You can control how much reasoning the model performs by choosing from minimal, low, medium, or high thinking levels. This feature lets you balance response quality and speed for your specific use case.

Try in Agent Studio Deploy example app

Note: "Deploy example app" requires a Google Cloud project with billing and Agent Platform API enabled.
Model ID gemini-3.1-flash-lite
Modalities
Text Input and output
Image Input only
Audio Input only
Video Input only
Token limits Context window 1,048,576
Maximum output tokens 65,535 (default)
Capabilities
Consumption options
See Consumption options for more information.
Technical specifications Image
  • Maximum images per prompt: 3,000
  • Maximum file size per file for inline data or direct uploads through the console: 7 MB
  • Maximum file size per file from Google Cloud Storage: 30 MB
  • Supported MIME types:
    image/png, image/jpeg, image/webp, image/heic, image/heif
Text
  • Maximum number of files per prompt: 3,000
  • Maximum number of pages per file: 3,000
  • Maximum file size per file for the API or Cloud Storage imports: 50 MB(application/pdf) or 7 MB(text/plain)
  • Maximum file size per file for direct uploads through the console: 7 MB
  • Supported MIME types:
    application/pdf, text/plain
Video
  • Maximum video length (with audio): Approximately 45 minutes
  • Maximum video length (without audio): Approximately 1 hour
  • Maximum number of videos per prompt: 10
  • Supported MIME types:
    video/x-flv, video/quicktime, video/mpeg, video/mpegs, video/mpg, video/mp4, video/webm, video/wmv, video/3gpp
Audio
  • Maximum audio length per prompt: Approximately 8.4 hours, or up to 1 million tokens
  • Maximum number of audio files per prompt: 1
  • Supported MIME types:
    audio/x-aac, audio/flac, audio/mp3, audio/m4a, audio/mpeg, audio/mpga, audio/mp4, audio/ogg, audio/pcm, audio/wav, audio/webm
Parameter defaults
  • Temperature: 0.0-2.0 (default 1.0)
  • topP: 0.0-1.0 (default 0.95)
  • topK: 64 (fixed)
  • candidateCount: 1–8 (default 1)
Supported regions

Model availability

ML processing

  • United States
    • Multi-region
  • Europe
    • Multi-region
See Deployments and endpoints for more information.
Knowledge cutoff date January 2025
Versions
  • gemini-3.1-flash-lite
    • Launch stage: GA
    • Release date: May 7, 2026
    • Retirement date: Not before May 7, 2027
  • gemini-3.1-flash-lite-preview
    • Launch stage: Public preview
    • Release date: March 3, 2026
    • Retirement date: July 9, 2026
Security controls Online prediction
  • Data residency
  • CMEK
  • VPC-SC
  • AXT
Batch inference
  • Data residency
  • CMEK
  • VPC-SC
  • AXT
Context caching
  • Data residency
  • CMEK
  • VPC-SC
  • AXT
See Security controls for more information.
Pricing See Pricing.