This tutorial shows you how to connect to the Gemini Live API by using the Google Gen AI SDK for Python. In this tutorial, you build a real-time multimodal application with a robust Python backend handling the API connection.
Before you begin
Complete the following steps to set up your environment.
- Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
- In the Google Cloud console, on the project selector page, select or create a Google Cloud project.
  Roles required to select or create a project:
  - Select a project: Selecting a project doesn't require a specific IAM role—you can select any project that you've been granted a role on.
  - Create a project: To create a project, you need the Project Creator role (roles/resourcemanager.projectCreator), which contains the resourcemanager.projects.create permission. Learn how to grant roles.
- Verify that billing is enabled for your Google Cloud project.
- Install the Google Cloud CLI.
- If you're using an external identity provider (IdP), you must first sign in to the gcloud CLI with your federated identity.
- To initialize the gcloud CLI, run the following command:
  gcloud init
- Install Git.
- Install Python 3.
Clone the demo app
Clone the demo app repository and navigate to that directory:
git clone https://github.com/GoogleCloudPlatform/generative-ai.git &&
cd generative-ai/gemini/multimodal-live-api/native-audio-websocket-demo-apps/plain-js-python-sdk-demo-app
Project structure
The application includes the following files:
/
├── main.py # FastAPI server and WebSocket endpoint
├── gemini_live.py # Gemini Live API wrapper using Gen AI SDK
├── requirements.txt # Python dependencies
└── frontend/
├── index.html # User Interface
├── main.js # Application logic
├── gemini-client.js # WebSocket client for backend communication
├── media-handler.js # Audio/Video capture and playback
└── pcm-processor.js # AudioWorklet for PCM processing
Configure environment variables
For this demo, the only environment variable you need to configure is the one that
defines the ID of your Google Cloud project. The following command creates a .env
file that sets the PROJECT_ID environment variable.
Replace PROJECT_ID with your Google Cloud project ID.
echo "PROJECT_ID=PROJECT_ID" > .env
Run the backend server
The backend (main.py) handles the connection between the client and the
Gemini Live API. The entry point is a FastAPI server that exposes a WebSocket
endpoint. It accepts audio and video chunks from the frontend and forwards them
to the GeminiLive session. The GeminiLive class in gemini_live.py wraps
the genai.Client to manage the session.
# Connects using the SDK
async with self.client.aio.live.connect(model=self.model, config=config) as session:
    # Manages input/output queues
    await asyncio.gather(
        send_audio(),
        send_video(),
        receive_responses(),
    )
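For context, a FastAPI WebSocket endpoint that hands browser messages to such a wrapper could look roughly like the following sketch. The /ws path and the handle_message and close methods are illustrative assumptions, not necessarily the demo app's actual interface:

# Sketch of a WebSocket endpoint that forwards client messages to the wrapper.
# The /ws path, handle_message, and close are assumed names for illustration.
from fastapi import FastAPI, WebSocket, WebSocketDisconnect
from gemini_live import GeminiLive

app = FastAPI()

@app.websocket("/ws")
async def live_endpoint(websocket: WebSocket):
    await websocket.accept()
    gemini = GeminiLive()  # wraps genai.Client and the live session
    try:
        while True:
            message = await websocket.receive_json()  # audio/video/text chunk from the browser
            await gemini.handle_message(message)      # queue it for the Gemini Live session
    except WebSocketDisconnect:
        await gemini.close()  # tear down the session when the client disconnects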
To run the backend server, run the following commands:
- Install dependencies:
  pip3 install -r requirements.txt
- Authenticate with Google Cloud:
  gcloud auth application-default login
- Start the server:
  python3 main.py
Open the frontend UI and connect with Gemini
The frontend manages audio and video capture and playback. The
gemini-client.js file handles the WebSocket connection to the backend. It
sends base64-encoded media chunks to the backend and receives audio responses
from the Gemini Live API, which are then played back to the user.
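On the backend side, each incoming chunk is base64-decoded before it is handed to the live session. The following sketch shows one way to forward a decoded chunk with the Gen AI SDK; the {"type": ..., "data": ...} message shape is an assumed example format, not necessarily the demo's wire protocol:

# Sketch: decode a base64 media chunk from the browser and forward it to the
# live session. The {"type", "data"} message shape is an assumed example format.
import base64
from google.genai import types

async def forward_chunk(session, message: dict) -> None:
    data = base64.b64decode(message["data"])
    if message["type"] == "audio":
        # the Live API expects 16 kHz, 16-bit PCM audio input
        await session.send_realtime_input(
            audio=types.Blob(data=data, mime_type="audio/pcm;rate=16000")
        )
    elif message["type"] == "video":
        # camera frames are sent as individual JPEG images
        await session.send_realtime_input(
            video=types.Blob(data=data, mime_type="image/jpeg")
        )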
To open the frontend UI and connect with Gemini, do the following:
- Open your browser and navigate to http://localhost:8000.
- Click Connect.
Interact with Gemini
Try the following:
- Text input: You can write a text message to Gemini by entering your message in the message field and clicking Send. Gemini responds to the message using audio.
- Voice input: To speak to Gemini, click Start mic. Gemini responds to the prompt using audio.
- Video input: To let Gemini see through your camera, click Start camera. You can talk to Gemini about what it sees through your camera.
What's next
- Learn how to configure language and voice.
- Learn how to configure Gemini capabilities.
- Learn about Gemini Live API best practices.