This tutorial shows you how to connect to the Gemini Live API by using WebSockets. In this tutorial, you build a real-time multimodal application with a vanilla JavaScript frontend and a Python server handling the authentication and proxying.
Before you begin
Complete the following steps to set up your environment.
- Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
-
In the Google Cloud console, on the project selector page, select or create a Google Cloud project.
Roles required to select or create a project
- Select a project: Selecting a project doesn't require a specific IAM role—you can select any project that you've been granted a role on.
-
Create a project: To create a project, you need the Project Creator role
(
roles/resourcemanager.projectCreator), which contains theresourcemanager.projects.createpermission. Learn how to grant roles.
-
Verify that billing is enabled for your Google Cloud project.
-
Install the Google Cloud CLI.
-
If you're using an external identity provider (IdP), you must first sign in to the gcloud CLI with your federated identity.
-
To initialize the gcloud CLI, run the following command:
gcloud init -
In the Google Cloud console, on the project selector page, select or create a Google Cloud project.
Roles required to select or create a project
- Select a project: Selecting a project doesn't require a specific IAM role—you can select any project that you've been granted a role on.
-
Create a project: To create a project, you need the Project Creator role
(
roles/resourcemanager.projectCreator), which contains theresourcemanager.projects.createpermission. Learn how to grant roles.
-
Verify that billing is enabled for your Google Cloud project.
-
Install the Google Cloud CLI.
-
If you're using an external identity provider (IdP), you must first sign in to the gcloud CLI with your federated identity.
-
To initialize the gcloud CLI, run the following command:
gcloud init - Install Git.
- Install Python 3.
Clone the demo app
Clone the demo app repository and navigate to that directory:
git clone https://github.com/GoogleCloudPlatform/generative-ai.git &&
cd generative-ai/gemini/multimodal-live-api/native-audio-websocket-demo-apps/plain-js-demo-app
Project structure
The application includes the following files:
/
├── server.py # WebSocket proxy + HTTP server
├── requirements.txt # Python dependencies
└── frontend/
├── index.html # UI
├── geminilive.js # Gemini API client
├── mediaUtils.js # Audio/video streaming
├── tools.js # Custom tool definitions
└── script.js # App logic
Run the backend server
The backend (server.py) handles the authentication and acts as a WebSocket
proxy between the client and the Gemini Live API.
To run the backend server, run the following commands:
Install dependencies:
pip3 install -r requirements.txtAuthenticate with Google Cloud:
gcloud auth application-default loginStart the server:
python3 server.py
Open the frontend UI and connect with Gemini
The frontend manages audio and video capture and playback. The
geminilive.js file handles the WebSocket connection to the backend.
const client = new GeminiLiveAPI(proxyUrl, projectId, model);
client.addFunction(toolInstance); // Add custom tools
client.connect(accessToken); // Connect (token optional with proxy)
To open the frontend UI and connect with Gemini, do the following:
- Open your browser and navigate to http://localhost:8000.
- In the Project ID field, enter the ID of your Google Cloud project.
- Click Connect.
Interact with Gemini
Try to do the following:
- Text input: You can write a text message to Gemini by entering your message in the message field and clicking Send. Gemini responds to the message using audio.
- Voice input: To speak to Gemini, click Start mic. Gemini responds to the prompt using audio.
- Video input: To let Gemini see through your camera, click Start camera. You can talk to Gemini about what it sees through your camera.
What's next
- Learn how to configure language and voice.
- Learn how to configure Gemini capabilities.
- Learn about Gemini Live API best practices.