This tutorial shows you how to connect to the Live API by using the Python version of the Google Gen AI SDK. You'll set up a Google Cloud project to use the Live API with the Gen AI SDK, send an audio file to the model, and receive audio in response.
Before you begin
Before you can send requests, you need to set up authentication with Vertex AI. You can set up authentication either with an API key or with application default credentials (ADC).
For this tutorial, the quickest way to get started is by using an API key:
If you're new to Google Cloud, get an express mode API key.
If you already have a Google Cloud project, get a Google Cloud API key that's bound to a service account. Binding an API key to a service account is possible only if it's enabled in the organization policy settings. If you can't enable this setting, use application default credentials instead.
For instructions on setting up authentication using ADC instead, see our quickstart.
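If you go the API key route, the Gen AI SDK can read the key from an environment variable when you create a client. As a minimal sketch, assuming your key is stored in GOOGLE_API_KEY (a variable the google-genai library checks by default):
export GOOGLE_API_KEY=YOUR_API_KEY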
Install the Gen AI SDK
Run the following to install the google-genai library:
pip install --upgrade google-genai
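To confirm the installation, you can print the installed package version. This sketch assumes the library exposes a standard __version__ attribute:
python -c "from google import genai; print(genai.__version__)"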
Set up environment variables
Set environment variables for your project ID and location. Replace PROJECT_ID with your Google Cloud
project ID.
export GOOGLE_CLOUD_PROJECT=PROJECT_ID
export GOOGLE_CLOUD_LOCATION=global
export GOOGLE_GENAI_USE_VERTEXAI=True
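With these variables set, genai.Client() picks up the project, location, and Vertex AI mode automatically. As an equivalent sketch, you can also pass the same values explicitly when you construct the client:
from google import genai

# Equivalent to relying on the environment variables above.
client = genai.Client(
    vertexai=True,
    project="PROJECT_ID",  # Your Google Cloud project ID
    location="global",
)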
Start an audio session
This example establishes a session, streams audio from a file, and prints the size of the audio chunks in the response.
import asyncio
import urllib.request
import wave
from pathlib import Path

from google import genai
from google.genai import types

# Configuration
MODEL = "gemini-live-2.5-flash-preview-native-audio-09-2025"
config = {
    "response_modalities": ["AUDIO"],
}

client = genai.Client()


async def main():
    # Download the sample audio file if it's missing
    if not Path("input.wav").exists():
        urllib.request.urlretrieve(
            "https://storage.googleapis.com/cloud-samples-data/generative-ai/audio/where_the_nearest_train_station_is.wav",
            "input.wav",
        )

    # Establish WebSocket session
    async with client.aio.live.connect(model=MODEL, config=config) as session:
        print("Session established. Sending audio...")

        # Send input (simulated from a file).
        # In production, this would be a microphone stream.
        # Format: PCM, 16 kHz, 16-bit, mono, little-endian.
        # wave.readframes() returns raw PCM, so the WAV header isn't sent.
        with wave.open("input.wav", "rb") as wf:
            while chunk := wf.readframes(512):  # 512 frames = 1,024 bytes (16-bit mono)
                await session.send_realtime_input(
                    audio=types.Blob(data=chunk, mime_type="audio/pcm;rate=16000")
                )
                await asyncio.sleep(0.01)  # Simulate a real-time stream

        # Receive output
        async for message in session.receive():
            if message.server_content:
                # Check for interruptions (user barge-in)
                if message.server_content.interrupted:
                    print("[Interrupted] Clear client audio buffer immediately.")
                    continue

                # Process audio chunks
                model_turn = message.server_content.model_turn
                if model_turn and model_turn.parts:
                    for part in model_turn.parts:
                        if part.inline_data:
                            # Output is PCM, 24 kHz, 16-bit, mono
                            audio_data = part.inline_data.data
                            print(f"Received audio chunk: {len(audio_data)} bytes")

                if message.server_content.turn_complete:
                    print("Turn complete.")
                    break  # End after the model finishes its turn


# In a notebook, run `await main()` in a cell instead; notebook kernels
# already run an event loop and support top-level await.
if __name__ == "__main__":
    asyncio.run(main())
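The chunks printed above are raw PCM bytes, which most audio players can't open directly. To listen to a response, one option is to collect the chunk bytes in a list and wrap them in a WAV container with the standard-library wave module. A minimal sketch, assuming the 24 kHz, 16-bit, mono output format noted in the code (save_pcm_as_wav is a hypothetical helper name):
import wave

def save_pcm_as_wav(pcm_chunks, path="output.wav", rate=24000):
    # Wrap raw 16-bit mono PCM in a WAV container so any player can open it.
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)     # Mono
        wf.setsampwidth(2)     # 16-bit samples
        wf.setframerate(rate)  # 24 kHz model output
        wf.writeframes(b"".join(pcm_chunks))
For example, append audio_data to a list inside the receive loop, then call save_pcm_as_wav(chunks) after the turn completes.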