Mulai menggunakan Gemini Live API menggunakan WebSockets

Tutorial ini menunjukkan cara terhubung ke Gemini Live API menggunakan WebSockets. Dalam tutorial ini, Anda akan membangun aplikasi multimodal real-time dengan frontend JavaScript standar dan server Python yang menangani autentikasi dan proxy.

Sebelum memulai

Selesaikan langkah-langkah berikut untuk menyiapkan lingkungan Anda.

Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.

In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

Roles required to select or create a project

Select a project: Selecting a project doesn't require a specific IAM role—you can select any project that you've been granted a role on.
Create a project: To create a project, you need the Project Creator role (roles/resourcemanager.projectCreator), which contains the resourcemanager.projects.create permission. Learn how to grant roles.

Go to project selector

Verify that billing is enabled for your Google Cloud project.

Install the Google Cloud CLI.

Jika Anda menggunakan penyedia identitas (IdP) eksternal, Anda harus login ke gcloud CLI dengan identitas gabungan Anda terlebih dahulu.

Untuk melakukan inisialisasi gcloud CLI, jalankan perintah berikut:

gcloud init

In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

Roles required to select or create a project

Select a project: Selecting a project doesn't require a specific IAM role—you can select any project that you've been granted a role on.
Create a project: To create a project, you need the Project Creator role (roles/resourcemanager.projectCreator), which contains the resourcemanager.projects.create permission. Learn how to grant roles.

Go to project selector

Verify that billing is enabled for your Google Cloud project.

Install the Google Cloud CLI.

Jika Anda menggunakan penyedia identitas (IdP) eksternal, Anda harus login ke gcloud CLI dengan identitas gabungan Anda terlebih dahulu.

Untuk melakukan inisialisasi gcloud CLI, jalankan perintah berikut:

gcloud init

Instal Git.
Instal Python 3.

Meng-clone aplikasi demo

Buat clone repositori aplikasi demo dan buka direktori tersebut:

git clone https://github.com/GoogleCloudPlatform/generative-ai.git &&
cd generative-ai/gemini/multimodal-live-api/native-audio-websocket-demo-apps/plain-js-demo-app

Struktur project

Aplikasi ini mencakup file berikut:

/
├── server.py            # WebSocket proxy + HTTP server
├── requirements.txt     # Python dependencies
└── frontend/
    ├── index.html       # UI
    ├── geminilive.js    # Gemini API client
    ├── mediaUtils.js    # Audio/video streaming
    ├── tools.js         # Custom tool definitions
    └── script.js        # App logic

Menjalankan server backend

Backend (server.py) menangani autentikasi dan bertindak sebagai proxy WebSocket antara klien dan Gemini Live API.

Untuk menjalankan server backend, jalankan perintah berikut:

Instal dependensi:
```
pip3 install -r requirements.txt
```
Mengautentikasi dengan Google Cloud:
```
gcloud auth application-default login
```
Mulai server:
```
python3 server.py
```

Buka UI frontend dan hubungkan dengan Gemini

Frontend mengelola perekaman dan pemutaran audio dan video. File geminilive.js menangani koneksi WebSocket ke backend.

const client = new GeminiLiveAPI(proxyUrl, projectId, model);
client.addFunction(toolInstance); // Add custom tools
client.connect(accessToken); // Connect (token optional with proxy)

Untuk membuka UI frontend dan terhubung dengan Gemini, lakukan hal berikut:

Buka browser Anda dan buka http://localhost:8000.
Di kolom Project ID, masukkan ID project Google Cloud Anda.
Klik Hubungkan.

Berinteraksi dengan Gemini

Coba lakukan hal berikut:

Input teks: Anda dapat menulis pesan teks ke Gemini dengan memasukkan pesan Anda di kolom pesan, lalu mengklik Kirim. Gemini merespons pesan menggunakan audio.
Input suara: Untuk berbicara dengan Gemini, klik Mulai mikrofon. Gemini merespons perintah menggunakan audio.
Input video: Agar Gemini dapat melihat melalui kamera Anda, klik Mulai kamera. Anda dapat berbicara dengan Gemini tentang apa yang dilihatnya melalui kamera Anda.

Langkah berikutnya

Pelajari cara mengonfigurasi bahasa dan suara.
Pelajari cara mengonfigurasi kemampuan Gemini.
Pelajari praktik terbaik Gemini Live API.

Mulai menggunakan Gemini Live API menggunakan WebSockets Tetap teratur dengan koleksi Simpan dan kategorikan konten berdasarkan preferensi Anda.

Sebelum memulai

Meng-clone aplikasi demo

Struktur project

Menjalankan server backend

Buka UI frontend dan hubungkan dengan Gemini

Berinteraksi dengan Gemini

Langkah berikutnya

Mulai menggunakan Gemini Live API menggunakan WebSockets