Send audio and video streams

This document describes how to send audio and video streams to the Live API for real-time, bidirectional communication with Gemini models. Learn how to configure and transmit audio and video data to build dynamic and interactive applications.

Send audio streams

Implementing real-time audio requires strict adherence to sample rate specifications and careful buffer management to ensure low latency and natural interruptibility.

The Live API supports the following audio formats:

Input audio: Raw 16-bit PCM audio at 16 kHz, little-endian
Output audio: Raw 16-bit PCM audio at 24 kHz, little-endian
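
Audio from microphones or DSP libraries is often floating point rather than 16-bit integers. As a minimal sketch, assuming mono audio with float samples in the range [-1.0, 1.0], the conversion to the input format above can be done with the standard library alone. The helper names `float_to_pcm16le` and `pcm_duration_seconds` are illustrative and are not part of the Live API or its SDK.

```python
import struct

BYTES_PER_SAMPLE = 2  # 16-bit PCM


def float_to_pcm16le(samples):
    """Convert floats in [-1.0, 1.0] to raw 16-bit little-endian PCM bytes."""
    ints = []
    for s in samples:
        s = max(-1.0, min(1.0, s))  # clip out-of-range values
        ints.append(int(round(s * 32767)))
    # "<" forces little-endian byte order; "h" is a signed 16-bit integer.
    return struct.pack("<%dh" % len(ints), *ints)


def pcm_duration_seconds(num_bytes, sample_rate):
    """Duration of a mono 16-bit PCM buffer (mono is an assumption here)."""
    return num_bytes / (BYTES_PER_SAMPLE * sample_rate)
```

Under the mono assumption, one second of input audio at 16 kHz occupies 32,000 bytes and one second of output audio at 24 kHz occupies 48,000 bytes, which is a useful sanity check when sizing buffers.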

The following code sample shows you how to send streaming audio data:

from google.genai import types

# Assumes session is an active Live API session
# and chunk_data contains bytes of raw 16-bit PCM audio at 16 kHz.

# Send one chunk of audio input data
await session.send_realtime_input(
    audio=types.Blob(data=chunk_data, mime_type="audio/pcm;rate=16000")
)
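
The sample above sends a single chunk. The following sketch shows one way a longer PCM buffer might be split into fixed-duration chunks before sending. The 100 ms chunk duration, the mono assumption, and the name `iter_pcm_chunks` are illustrative choices, not values specified by the Live API.

```python
def iter_pcm_chunks(pcm_bytes, sample_rate=16000, chunk_ms=100):
    """Yield successive chunks of mono 16-bit PCM, each up to chunk_ms long."""
    bytes_per_sample = 2
    # Whole samples per chunk, then bytes; the result is always even,
    # so a 16-bit sample is never split across two chunks.
    chunk_size = sample_rate * chunk_ms // 1000 * bytes_per_sample
    if chunk_size <= 0:
        raise ValueError("chunk_ms is too small for this sample rate")
    for start in range(0, len(pcm_bytes), chunk_size):
        yield pcm_bytes[start:start + chunk_size]
```

Each yielded chunk could then be passed as `chunk_data` to the `send_realtime_input` call shown above. Smaller chunks tend to lower latency at the cost of more messages, so the duration is worth tuning for the application.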

The client must maintain a playback buffer. The server streams audio in chunks within server_content messages. The client's responsibility is to decode, buffer, and play the data.

The following code sample shows you how to process streaming audio data:

import asyncio

import numpy as np

# Assumes session is an active Live API session
# and audio_queue is an asyncio.Queue for buffering audio for playback.

async for msg in session.receive():
    server_content = msg.server_content
    if server_content:
        # 1. Handle Interruption
        if server_content.interrupted:
            print("\n[Interrupted] Flushing buffer...")
            # Clear the Python queue
            while not audio_queue.empty():
                try:
                    audio_queue.get_nowait()
                except asyncio.QueueEmpty:
                    break
            # Send signal to worker to reset hardware buffers if needed
            await audio_queue.put(None)
            continue

        # 2. Process Audio chunks
        if server_content.model_turn:
            for part in server_content.model_turn.parts:
                if part.inline_data:
                    # Add PCM data to playback queue
                    await audio_queue.put(np.frombuffer(part.inline_data.data, dtype='int16'))
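
The receive loop above only fills the queue; a separate consumer has to drain it. The following is a minimal sketch of such a worker, assuming `None` is the reset signal used above. The `play_chunk` and `reset_output` callbacks and the `STOP` sentinel are hypothetical stand-ins for whatever audio output library the client uses.

```python
import asyncio

STOP = object()  # hypothetical sentinel that tells the worker to exit


async def playback_worker(audio_queue, play_chunk, reset_output):
    """Drain audio_queue, playing chunks and resetting output on None."""
    while True:
        item = await audio_queue.get()
        if item is STOP:
            break
        if item is None:
            # Interruption signal: drop anything the output device still holds.
            reset_output()
            continue
        play_chunk(item)
```

If `play_chunk` blocks while writing to the audio device, it may be worth running it through `asyncio.to_thread` so the receive loop keeps reading messages during playback.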

Send video streams

Video streaming provides visual context. The Live API expects a sequence of discrete image frames and supports video frame input at 1 frame per second (FPS). For best results, send frames at native 768x768 resolution.

The following code sample shows you how to send streaming video data:

from google.genai import types

# Assumes session is an active Live API session
# and chunk_data contains bytes of a JPEG image.

# Send one video frame as input data
await session.send_realtime_input(
    media=types.Blob(data=chunk_data, mime_type="image/jpeg")
)

The client implementation captures a frame from the video feed, encodes it as a JPEG blob, and transmits it using the realtime_input message structure.

import cv2
import asyncio
from google.genai import types

async def send_video_stream(session):
    # Open the default webcam
    cap = cv2.VideoCapture(0)

    try:
        while True:
            ret, frame = cap.read()
            if not ret:
                break

            # 1. Resize to the recommended resolution (768x768).
            #    Note: this stretches frames that are not square.
            frame = cv2.resize(frame, (768, 768))

            # 2. Encode as JPEG, skipping the frame if encoding fails
            ok, buffer = cv2.imencode('.jpg', frame)
            if not ok:
                continue

            # 3. Send as realtime input
            await session.send_realtime_input(
                media=types.Blob(data=buffer.tobytes(), mime_type="image/jpeg")
            )

            # 4. Wait 1 second (1 FPS)
            await asyncio.sleep(1.0)
    finally:
        # Release the camera even if sending raises an exception
        cap.release()
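
The sample above resizes every frame to exactly 768x768, which stretches frames that are not square. If preserving the aspect ratio matters, one option is to scale the frame so that its longer side is at most 768 pixels. The helper below is a sketch of that arithmetic only. `fit_within` is an illustrative name, and this document does not establish whether aspect-preserving frames work better than square ones for a given model.

```python
def fit_within(width, height, max_side=768):
    """Return (new_width, new_height) scaled to fit in a max_side square."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    # Use the tighter of the two ratios; cap at 1.0 so frames are never upscaled.
    scale = min(max_side / width, max_side / height, 1.0)
    return max(1, round(width * scale)), max(1, round(height * scale))
```

The returned pair is in (width, height) order, which matches the size argument that `cv2.resize` expects, so it could replace the fixed `(768, 768)` tuple in the loop above.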

Configure media resolution

You can specify the resolution for input media by setting the media_resolution field in the session configuration. Lower resolution reduces token usage and latency, while higher resolution improves detail recognition. Supported values include low, medium, and high.

config = {
    "response_modalities": ["audio"],
    "media_resolution": "low",
}
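
Because a misspelled resolution value may only surface as an error when the session is opened, it can help to validate it up front. The following sketch builds the same dictionary as above and rejects values outside the three named in this section. The helper name `build_live_config` is hypothetical, and the list of accepted values is an assumption based only on the text above, which may not be exhaustive.

```python
SUPPORTED_MEDIA_RESOLUTIONS = ("low", "medium", "high")


def build_live_config(media_resolution="low", response_modalities=("audio",)):
    """Build a session config dict, rejecting unsupported resolutions."""
    if media_resolution not in SUPPORTED_MEDIA_RESOLUTIONS:
        raise ValueError(
            f"media_resolution must be one of {SUPPORTED_MEDIA_RESOLUTIONS}, "
            f"got {media_resolution!r}"
        )
    return {
        "response_modalities": list(response_modalities),
        "media_resolution": media_resolution,
    }
```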
