Gemini Live API overview

The Gemini Live API enables low-latency, real-time voice and video interactions with Gemini. It processes continuous streams of audio, video, or text to deliver immediate, human-like spoken responses. This creates a natural conversational experience for your users.

Try the Gemini Live API in the Google Cloud console

Key features

The Gemini Live API offers a comprehensive set of features for building robust voice and video agents:

  • High audio quality: The Gemini Live API provides natural, realistic-sounding speech across multiple languages.
  • Multilingual support: Converse in 24 supported languages.
  • Barge-in: Users can interrupt the model at any time for responsive interactions.
  • Affective dialog: Adapts response style and tone to match the user's input expression.
  • Proactive audio: The model can decide when a response is appropriate, staying silent for input that isn't directed at it.
  • Tool use: Integrates tools like function calling and Google Search for dynamic interactions.
  • Audio transcriptions: Provides text transcripts of both user input and model output.
  • Speech-to-speech translation: (Experimental) Optimized for low-latency translation between languages.
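Several of these features are toggled per session through the connection config. The sketch below shows what such a config might look like in Python; the field names (`enable_affective_dialog`, `proactivity`, the transcription keys) are assumptions based on the preview API and may differ in your SDK version:

```python
# Hypothetical Live API session config illustrating the feature toggles above.
# Exact field names are assumptions; check your SDK's reference before use.
live_config = {
    "response_modalities": ["AUDIO"],              # speak the responses
    "enable_affective_dialog": True,               # adapt tone to the user's expression
    "proactivity": {"proactive_audio": True},      # let the model decide when to respond
    "input_audio_transcription": {},               # transcribe user speech
    "output_audio_transcription": {},              # transcribe model speech
}
```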

Technical specifications

The Gemini Live API has the following technical specifications:

  • Input modalities: audio (raw 16-bit PCM, 16 kHz, little-endian), images/video (JPEG at 1 FPS), text
  • Output modalities: audio (raw 16-bit PCM, 24 kHz, little-endian), text
  • Protocol: stateful WebSocket connection (WSS)
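The input audio format above (raw 16-bit little-endian PCM at 16 kHz) can be produced from normalized float samples with just the standard library. A minimal sketch, assuming your capture pipeline yields floats in [-1.0, 1.0] already resampled to 16 kHz:

```python
import struct

def floats_to_pcm16le(samples):
    """Convert float samples in [-1.0, 1.0] to raw 16-bit little-endian PCM bytes,
    the input format the Live API expects (at a 16 kHz sample rate)."""
    ints = []
    for s in samples:
        s = max(-1.0, min(1.0, s))   # clamp to the valid range
        ints.append(int(s * 32767))  # scale to signed 16-bit
    # "<" = little-endian, "h" = signed 16-bit integer
    return struct.pack("<%dh" % len(ints), *ints)

# One full-scale sample followed by one silent sample
chunk = floats_to_pcm16le([1.0, 0.0])
# chunk == b"\xff\x7f\x00\x00"
```

Output audio arrives in the same raw PCM layout but at 24 kHz, so playback pipelines must not assume the input sample rate.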

Supported models

The following models support the Gemini Live API. Select the appropriate model based on your interaction requirements.

gemini-live-2.5-flash-preview-native-audio-09-2025

  • Availability: Public preview
  • Use case: Cost-efficient real-time voice agents
  • Key features: native audio, audio transcriptions, voice activity detection, affective dialog, proactive audio, tool use

gemini-2.5-flash-s2st-exp-11-2025

  • Availability: Private experimental
  • Use case: Speech-to-speech translation (experimental); optimized for translation tasks
  • Key features: native audio, audio transcriptions, tool use, speech-to-speech translation

Get started

Select the guide that matches your development environment:

Recommended for ease of use

Connect to the Gemini Live API using the Gen AI SDK to build a real-time multimodal application with a Python backend.
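A minimal sketch of this path using the Google Gen AI SDK for Python (`pip install google-genai`). Method names reflect a recent SDK version and may differ in yours; the text-only response modality is chosen here just to keep the sketch short:

```python
import asyncio

MODEL = "gemini-live-2.5-flash-preview-native-audio-09-2025"
CONFIG = {"response_modalities": ["TEXT"]}  # text back keeps this sketch simple

async def run_session(prompt: str) -> None:
    # Deferred import so the sketch reads top to bottom; requires google-genai
    from google import genai

    client = genai.Client()  # credentials are read from the environment
    # Open a stateful Live session over the SDK's WebSocket transport
    async with client.aio.live.connect(model=MODEL, config=CONFIG) as session:
        # Send one complete user turn
        await session.send_client_content(
            turns={"role": "user", "parts": [{"text": prompt}]}
        )
        # Stream the model's reply as it arrives
        async for message in session.receive():
            if message.text:
                print(message.text, end="")

# asyncio.run(run_session("Hello!"))  # uncomment with credentials configured
```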

Raw protocol control

Connect to the Gemini Live API using WebSockets to build a real-time multimodal application with a JavaScript frontend and a Python backend.
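At the protocol level, the first frame sent after the WSS handshake is a JSON `setup` message that names the model and session options. A hedged sketch of building that frame; the field names follow our reading of the bidirectional streaming API and may differ, so consult the WebSocket reference for the exact endpoint, payload schema, and auth scheme:

```python
import json

# Hypothetical setup frame for illustration; verify field names against
# the WebSocket reference for your environment before use.
SETUP_MESSAGE = {
    "setup": {
        "model": "gemini-live-2.5-flash-preview-native-audio-09-2025",
        "generation_config": {"response_modalities": ["AUDIO"]},
    }
}

# First frame sent over the open WebSocket; later frames carry audio/text chunks
payload = json.dumps(SETUP_MESSAGE)
```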

Agent development kit

Create an agent and use the Agent Development Kit (ADK) Streaming to enable voice and video communication.

Partner integrations

If you prefer a simpler development process, you can use one of our partner platforms. These platforms have already integrated the Gemini Live API over the WebRTC protocol to streamline the development of real-time audio and video applications.