Get started with Gemini Live API using WebSockets

This tutorial shows you how to connect to Gemini Live API by using WebSockets. In this tutorial, you build a real-time multimodal application with a vanilla JavaScript frontend and a Python server handling the authentication and proxying.

Before you begin

Complete the following steps to set up your environment.

Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.

In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

Roles required to select or create a project

Select a project: Selecting a project doesn't require a specific IAM role—you can select any project that you've been granted a role on.
Create a project: To create a project, you need the Project Creator role (roles/resourcemanager.projectCreator), which contains the resourcemanager.projects.create permission. Learn how to grant roles.

Go to project selector

Verify that billing is enabled for your Google Cloud project.

Install the Google Cloud CLI.

If you're using an external identity provider (IdP), you must first sign in to the gcloud CLI with your federated identity.

To initialize the gcloud CLI, run the following command:

gcloud init

In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

Roles required to select or create a project

Select a project: Selecting a project doesn't require a specific IAM role—you can select any project that you've been granted a role on.
Create a project: To create a project, you need the Project Creator role (roles/resourcemanager.projectCreator), which contains the resourcemanager.projects.create permission. Learn how to grant roles.

Go to project selector

Verify that billing is enabled for your Google Cloud project.

Install the Google Cloud CLI.

If you're using an external identity provider (IdP), you must first sign in to the gcloud CLI with your federated identity.

To initialize the gcloud CLI, run the following command:

gcloud init

Install Git.
Install Python 3.

Clone the demo app

Clone the demo app repository and navigate to that directory:

git clone https://github.com/GoogleCloudPlatform/generative-ai.git &&
cd generative-ai/gemini/multimodal-live-api/native-audio-websocket-demo-apps/plain-js-demo-app

Project structure

The application includes the following files:

/
├── server.py            # WebSocket proxy + HTTP server
├── requirements.txt     # Python dependencies
└── frontend/
    ├── index.html       # UI
    ├── geminilive.js    # Gemini API client
    ├── mediaUtils.js    # Audio/video streaming
    ├── tools.js         # Custom tool definitions
    └── script.js        # App logic

Run the backend server

The backend (server.py) handles the authentication and acts as a WebSocket proxy between the client and Gemini Live API.

To run the backend server, run the following commands:

Install dependencies:
```
pip3 install -r requirements.txt
```
Authenticate with Google Cloud:
```
gcloud auth application-default login
```
Start the server:
```
python3 server.py
```

Open the frontend UI and connect with Gemini

The frontend manages audio and video capture and playback. The geminilive.js file handles the WebSocket connection to the backend.

const client = new GeminiLiveAPI(proxyUrl, projectId, model);
client.addFunction(toolInstance); // Add custom tools
client.connect(accessToken); // Connect (token optional with proxy)

To open the frontend UI and connect with Gemini, do the following:

Open your browser and navigate to http://localhost:8000.
In the Project ID field, enter the ID of your Google Cloud project.
Click Connect.

Interact with Gemini

Try to do the following:

Text input: You can write a text message to Gemini by entering your message in the message field and clicking Send. Gemini responds to the message using audio.
Voice input: To speak to Gemini, click Start mic. Gemini responds to the prompt using audio.
Video input: To let Gemini see through your camera, click Start camera. You can talk to Gemini about what it sees through your camera.

What's next

Learn how to configure language and voice.
Learn how to configure Gemini capabilities.
Learn about Gemini Live API best practices.

Get started with Gemini Live API using WebSockets Stay organized with collections Save and categorize content based on your preferences.

Before you begin

Clone the demo app

Project structure

Run the backend server

Open the frontend UI and connect with Gemini

Interact with Gemini

What's next

Get started with Gemini Live API using WebSockets