Start and manage live sessions

The Live API enables low-latency voice and text interactions by processing continuous streams of audio or text within a session, delivering immediate, human-like spoken responses. Session lifecycle management, from the initial handshake to graceful termination, is controlled by the developer.

This page shows you how to start a conversation session with Gemini models using the Live API. You can start a session using Vertex AI Studio, the Gen AI SDK, or WebSockets.

This page also shows you how to do the following:

  • Extend a session beyond the default time limit
  • Resume a previous session
  • Update system instructions during a session
  • Configure the context window of a session
  • Enable transcription for a session

Session lifetime

Without compression, audio-only sessions are limited to 15 minutes and audio-video sessions are limited to 2 minutes. Exceeding these limits terminates the session, but you can use context window compression to extend a session indefinitely.

The lifetime of a connection is limited to around 10 minutes, and when the connection terminates, the session terminates as well. To keep a single session active across multiple connections, configure session resumption. You'll also receive a GoAway message before the connection ends, giving you time to take action.
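
Used together, these two mechanisms keep a single conversation going for as long as you need: compression lifts the session duration cap, and resumption carries the session across connection drops. The following is a minimal sketch of a Gen AI SDK configuration that enables both; the default compression thresholds are assumed, and the values shown are placeholders rather than recommendations.

config = {
    "response_modalities": ["audio"],
    # Truncate the oldest turns once the context grows too large,
    # removing the fixed session duration limit.
    "context_window_compression": {"sliding_window": {}},
    # Ask the server to send resumption handles so a new connection
    # can continue this session after a disconnect.
    "session_resumption": {},
}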

Maximum concurrent sessions

You can have up to 1,000 concurrent sessions per project on a pay-as-you-go (PayGo) plan. This limit does not apply to customers using Provisioned Throughput.

Start a session

The following tabs show how to start a live conversation session using Vertex AI Studio, the Gen AI SDK, or WebSockets:

Console

  1. Open Vertex AI Studio > Stream realtime.
  2. Click Start session to initiate the conversation.

To end the session, click Stop session.

Python

Before you begin, you must authenticate to Vertex AI using an API key or application default credentials (ADC):

gcloud auth application-default login
      

For more information on setting up authentication, see our quickstart.

import asyncio
from google import genai

# Replace the PROJECT_ID and LOCATION with your Project ID and location.
client = genai.Client(vertexai=True, project="PROJECT_ID", location="LOCATION")

# Configuration
MODEL = "gemini-live-2.5-flash-preview-native-audio-09-2025"
config = {
    "response_modalities": ["audio"],
}

async def main():
    # Establish the WebSocket session
    async with client.aio.live.connect(model=MODEL, config=config) as session:
        print("Session established.")

if __name__ == "__main__":
    asyncio.run(main())
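
Once the session is established, you can exchange content over it. The following sketch builds on the example above: send_and_receive is a hypothetical helper you could call from inside the async with block. It sends a single text turn with send_client_content and reads streamed messages until the model finishes its turn; a real application would instead stream microphone audio and play back the returned audio bytes.

from google.genai import types

async def send_and_receive(session):
    # Send one user text turn into the live session.
    await session.send_client_content(
        turns=types.Content(role="user", parts=[types.Part(text="Hello!")]),
        turn_complete=True,
    )
    # Read streamed server messages until the model's turn is complete.
    async for message in session.receive():
        content = message.server_content
        if content and content.model_turn:
            for part in content.model_turn.parts:
                if part.inline_data:
                    pass  # Process the audio bytes in part.inline_data.data
        if content and content.turn_complete:
            break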
      

WebSockets

When using WebSockets, the connection is established with a standard WebSocket handshake. The endpoint is regional and uses OAuth 2.0 bearer tokens for authentication. In this scenario, the authentication token is typically passed in the WebSocket headers (such as Authorization: Bearer [TOKEN]).

import asyncio
import json
import subprocess
import websockets

# Replace the PROJECT_ID and LOCATION with your Project ID and location.
PROJECT_ID = "PROJECT_ID"
LOCATION = "LOCATION"

# Authentication: fetch an access token from the gcloud CLI.
ACCESS_TOKEN = subprocess.check_output(
    ["gcloud", "auth", "application-default", "print-access-token"], text=True
).strip()

# Configuration
MODEL_ID = "gemini-live-2.5-flash-preview-native-audio-09-2025"
MODEL = f"projects/{PROJECT_ID}/locations/{LOCATION}/publishers/google/models/{MODEL_ID}"
config = {
    "response_modalities": ["audio"],
}

# Construct the WSS URL
HOST = f"{LOCATION}-aiplatform.googleapis.com"
URI = f"wss://{HOST}/ws/google.cloud.aiplatform.v1.LlmBidiService/BidiGenerateContent"

async def main():
    headers = {"Authorization": f"Bearer {ACCESS_TOKEN}"}

    async with websockets.connect(URI, additional_headers=headers) as ws:
        print("Session established.")

        # Send Setup (Handshake)
        await ws.send(json.dumps({
            "setup": {
                "model": MODEL,
                "generation_config": config
            }
        }))
        # Send audio/video ...

if __name__ == "__main__":
    asyncio.run(main())
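
With the raw WebSocket, content is exchanged as JSON messages rather than SDK calls. The following sketch shows one way to send a user text turn as a clientContent message after the setup handshake and read server messages until the turn completes. The field names follow the BidiGenerateContent message format, and send_text_turn is a hypothetical helper you would call with the connected ws from inside the async with block above.

async def send_text_turn(ws):
    # Send one user text turn as a clientContent message.
    await ws.send(json.dumps({
        "client_content": {
            "turns": [{"role": "user", "parts": [{"text": "Hello!"}]}],
            "turn_complete": True,
        }
    }))

    # Read streamed server messages until the model's turn is complete.
    async for raw in ws:
        message = json.loads(raw)
        server_content = message.get("serverContent", {})
        if server_content.get("turnComplete"):
            break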
      

Extend a session

The default maximum length of a conversation session is 10 minutes. A goAway notification (BidiGenerateContentServerMessage.goAway) is sent to the client 60 seconds before the session ends.

You can extend the session length in 10-minute increments using the Gen AI SDK. There's no limit to the number of times you can extend a session. For an example, see Resume a previous session.

To extend a session:

Python

async for response in session.receive():
    if response.go_away is not None:
        # The connection will soon be terminated
        print(response.go_away.time_left)
      

Resume a previous session

The Live API supports session resumption so that the user doesn't lose conversation context during a brief disconnect (for example, when switching from Wi-Fi to 5G). You can resume a previous session within 24 hours. Session resumption works by storing cached data, including text, video, and audio prompts and model outputs. Project-level privacy is enforced for this cached data.

By default, session resumption is disabled. To enable session resumption, set the sessionResumption field of the BidiGenerateContentSetup message. If enabled, the server periodically sends SessionResumptionUpdate messages containing a session_id and a resumption token. If the WebSocket disconnects, the client can reconnect and include these credentials in the new setup message. The server then restores the previous context, allowing the conversation to continue seamlessly.

The resumption window is finite (typically around 10 minutes). If the client does not reconnect within this timeframe, the session state is discarded to free up server resources.

The following is an example of enabling session resumption and retrieving the handle ID:

Python

import asyncio
from google import genai
from google.genai import types


# Replace the PROJECT_ID and LOCATION with your Project ID and location. 
client = genai.Client(vertexai=True, project="PROJECT_ID", location="LOCATION")

# Configuration
MODEL = "gemini-live-2.5-flash-preview-native-audio-09-2025"

async def main(previous_session_handle=None):
    # Pass a handle from a previous SessionResumptionUpdate to resume that session.
    print(f"Connecting to the service with handle {previous_session_handle}...")
    async with client.aio.live.connect(
        model=MODEL,
        config=types.LiveConnectConfig(
            response_modalities=["audio"],
            session_resumption=types.SessionResumptionConfig(
                # The handle of the session to resume is passed here,
                # or else None to start a new session.
                handle=previous_session_handle
            ),
        ),
    ) as session:
        while True:
            await session.send_client_content(
                turns=types.Content(
                    role="user", parts=[types.Part(text="Hello world!")]
                )
            )
            async for message in session.receive():
                # Periodically, the server will send update messages that may
                # contain a handle for the current state of the session.
                if message.session_resumption_update:
                    update = message.session_resumption_update
                    if update.resumable and update.new_handle:
                        # The handle should be retained and linked to the session.
                        return update.new_handle

                # For the purposes of this example, placeholder input is continually fed
                # to the model. In non-sample code, the model inputs would come from
                # the user.
                if message.server_content and message.server_content.turn_complete:
                    break

if __name__ == "__main__":
    asyncio.run(main())
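
For example, a caller could store the handle returned above and pass it back in after a disconnect. The following is a sketch that assumes the main(previous_session_handle=None) signature shown in this example:

# First connection: starts a new session and returns a resumption handle.
new_handle = asyncio.run(main())

# ...the connection drops, for example while switching networks...

# Reconnect within the resumption window, reusing the stored handle.
asyncio.run(main(new_handle))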
      

Enable seamless session resumption with transparent mode

When you enable session resumption, you can also enable transparent mode to help make the resumption process more seamless for the user. When transparent mode is enabled, the server explicitly returns the index of the last client message that is included in the stored session state. This identifies which client messages you need to send again when you resume the session from the resumption handle.

To enable transparent mode:

Python

config = {
    "response_modalities": ["audio"],
    "session_resumption": {
        "transparent": True,
    }
}
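
With transparent mode enabled, the periodic SessionResumptionUpdate messages can also carry the index of the last client message captured in the stored state. The following is a sketch of reading it, assuming the last_consumed_client_message_index field exposed by the Gen AI SDK:

async for message in session.receive():
    update = message.session_resumption_update
    if update:
        if update.new_handle:
            print(f"Resumption handle: {update.new_handle}")
        # In transparent mode, client messages sent after this index are not
        # part of the stored state and should be re-sent after resuming.
        if update.last_consumed_client_message_index is not None:
            print(f"Last consumed message index: {update.last_consumed_client_message_index}")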
      

Update system instructions during a session

The Live API lets you update the system instructions during an active session. Use this to adapt the model's responses, such as changing the response language or modifying the tone.

To update the system instructions mid-session, send text content with the system role. The updated system instruction remains in effect for the remainder of the session.

Python

await session.send_client_content(
    turns=types.Content(
        role="system", parts=[types.Part(text="new system instruction")]
    ),
    turn_complete=False
)
      

Configure the context window of the session

The Live API context window is used to store real-time streamed data (25 tokens per second (TPS) for audio and 258 TPS for video) and other content, including text inputs and model outputs. A session has a context window limit of:

  • 128k tokens for native audio models
  • 32k tokens for other Live API models

In long-running sessions, the history of audio and text tokens accumulates as the conversation progresses. If this history exceeds the model's limit, the model may hallucinate or slow down, or the session may be forcibly terminated. To enable longer sessions, enable context window compression by setting the contextWindowCompression field as part of the session configuration.

When enabled, context window compression uses a server-side sliding window to truncate the oldest turns. When the accumulated tokens exceed a defined maximum length (set using the Max context size slider in Vertex AI Studio, or trigger_tokens in the API), the server automatically prunes or summarizes the oldest turns to keep the context within the limit. In the ContextWindowCompressionConfig, you configure the sliding-window mechanism, whose target_tokens parameter sets the number of tokens the context is reduced to after compression is triggered.

This allows for theoretically unlimited session durations from the user's perspective, because the session's "memory" is constantly managed. Without compression, audio-only sessions are limited to 15 minutes before hitting hard limits.

The minimum and maximum values for the maximum context length and the target context size are:

  • Maximum context length (trigger_tokens): 5,000 to 128,000 tokens
  • Target context size (target_tokens): 0 to 128,000 tokens

To set the context window:

Console

  1. Open Vertex AI Studio > Stream realtime.
  2. Click to open the Advanced menu.
  3. In the Session Context section, use the Max context size slider to set the context size to a value between 5,000 and 128,000.
  4. (Optional) In the same section, use the Target context size slider to set the target size to a value between 0 and 128,000.

Python

Set the context_window_compression.trigger_tokens and context_window_compression.sliding_window.target_tokens fields in the setup message:

config = {
    "response_modalities": ["audio"],
    # Configures compression
    "context_window_compression": {
        "trigger_tokens": 10000,
        "sliding_window": {"target_tokens": 512},
    },
}
      

Enable audio transcription for the session

You can enable transcriptions for both the input and output audio.

To receive transcriptions, update your session configuration to add the input_audio_transcription and output_audio_transcription objects.

config = {
    "response_modalities": ["audio"],
    "input_audio_transcription": {},
    "output_audio_transcription": {},
}

Processing the response

The following code sample demonstrates how to connect using the configured session and extract the input and output transcriptions alongside the audio data.

# Receive Output Loop
async for message in session.receive():
    server_content = message.server_content
    if server_content:
        # Handle transcription of the user's input audio
        if server_content.input_transcription and server_content.input_transcription.text:
            print(f"Input transcription: {server_content.input_transcription.text}")

        # Handle transcription of the model's output audio
        if server_content.output_transcription and server_content.output_transcription.text:
            print(f"Output transcription: {server_content.output_transcription.text}")

        # Handle Model Turns (Audio)
        model_turn = server_content.model_turn
        if model_turn and model_turn.parts:
            for part in model_turn.parts:
                if part.inline_data:
                    audio_data = part.inline_data.data
                    # Process audio bytes...
                    pass

        # Check for turn completion
        if server_content.turn_complete:
            print("Turn complete.")

What's next