Get started with the Gemini Live API using the Google Gen AI SDK

This tutorial shows you how to connect to the Gemini Live API using the Google Gen AI SDK for Python. In this tutorial, you set up a Google Cloud project, use the Live API through the Gen AI SDK, send an audio file to the model, and receive an audio reply.

Before you begin

Before you can send requests, you must set up authentication through Vertex AI. You can authenticate with either an API key or Application Default Credentials (ADC).

For this tutorial, the fastest way to get started is to use an API key.

To set up authentication with ADC instead, see the quickstart.
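As a minimal sketch of the API-key route, assuming the SDK reads the key from the GOOGLE_API_KEY environment variable when no explicit credentials are provided, you can set it before creating the client. YOUR_API_KEY is a placeholder, not a real key:

```python
import os

# Assumption: genai.Client() falls back to the GOOGLE_API_KEY
# environment variable when no explicit credentials are given.
os.environ["GOOGLE_API_KEY"] = "YOUR_API_KEY"  # placeholder, not a real key
```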

Install the Gen AI SDK

Run the following command to install the google-genai library:

pip install --upgrade google-genai

Set environment variables

Set environment variables for your project ID and location. Replace PROJECT_ID with your Google Cloud project ID.

export GOOGLE_CLOUD_PROJECT=PROJECT_ID
export GOOGLE_CLOUD_LOCATION=global
export GOOGLE_GENAI_USE_VERTEXAI=True
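If you prefer to configure these from inside Python rather than the shell, the same variables can be set with os.environ before genai.Client() is constructed. A quick sketch:

```python
import os

# Mirror the `export` lines above from inside Python.
# setdefault() leaves any value already exported in the shell untouched.
os.environ.setdefault("GOOGLE_CLOUD_PROJECT", "PROJECT_ID")  # your project ID
os.environ.setdefault("GOOGLE_CLOUD_LOCATION", "global")
os.environ.setdefault("GOOGLE_GENAI_USE_VERTEXAI", "True")

for name in ("GOOGLE_CLOUD_PROJECT", "GOOGLE_CLOUD_LOCATION",
             "GOOGLE_GENAI_USE_VERTEXAI"):
    print(f"{name}={os.environ[name]}")
```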

Make a voice call

This example creates a session, streams audio from a file, and prints the size of each audio chunk in the response.

import asyncio
import os
import sys
import urllib.request

from google import genai
from google.genai import types

# Configuration
MODEL = "gemini-live-2.5-flash-preview-native-audio-09-2025"
config = {
    "response_modalities": ["audio"],
}

client = genai.Client()

AUDIO_URL = (
    "https://storage.googleapis.com/cloud-samples-data/generative-ai/audio/"
    "where_the_nearest_train_station_is.wav"
)

async def main():
    # Establish WebSocket session
    async with client.aio.live.connect(model=MODEL, config=config) as session:
        print("Session established. Sending audio...")

        # Download the sample file if it is missing
        if not os.path.exists("input.wav"):
            urllib.request.urlretrieve(AUDIO_URL, "input.wav")

        # Send input (simulated from a file).
        # In production, this would be a microphone stream.
        # Format: PCM, 16 kHz, 16-bit, mono, little-endian
        with open("input.wav", "rb") as f:
            while chunk := f.read(1024):
                await session.send_realtime_input(
                    audio=types.Blob(data=chunk, mime_type="audio/pcm;rate=16000")
                )
                await asyncio.sleep(0.01)  # Simulate a real-time stream

        # Receive output
        async for message in session.receive():
            if message.server_content:
                # Check for interruptions (user barge-in)
                if message.server_content.interrupted:
                    print("[Interrupted] Clear client audio buffer immediately.")
                    continue

                # Process audio chunks
                model_turn = message.server_content.model_turn
                if model_turn and model_turn.parts:
                    for part in model_turn.parts:
                        if part.inline_data:
                            # Output is PCM, 24 kHz, 16-bit, mono
                            audio_data = part.inline_data.data
                            print(f"Received audio chunk: {len(audio_data)} bytes")

                if message.server_content.turn_complete:
                    print("Turn complete.")

if "ipykernel" in sys.modules:
    # A notebook's event loop is already running, so schedule main()
    # on it; asyncio.run() would raise a RuntimeError here.
    asyncio.get_event_loop().create_task(main())
else:
    # Run as a standard .py script
    asyncio.run(main())
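The raw PCM chunks printed above are not directly playable. As a stdlib-only sketch (these helper names are illustrative, not part of the SDK), you can check an input WAV against the expected 16 kHz input format and wrap the model's 24 kHz output chunks in a WAV container:

```python
import io
import struct
import wave

def is_live_api_input(wav_bytes: bytes) -> bool:
    """True if the WAV data is 16 kHz, 16-bit, mono PCM —
    the format the Live API expects for realtime audio input."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wf:
        return (wf.getframerate() == 16000
                and wf.getsampwidth() == 2
                and wf.getnchannels() == 1)

def save_output_audio(chunks, path="output.wav"):
    """Wrap raw 24 kHz, 16-bit, mono PCM chunks from the model
    in a WAV header so standard players can open the result."""
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)
        wf.setframerate(24000)
        for chunk in chunks:
            wf.writeframes(chunk)

# Demo: build a 0.1 s silent clip in the input format and validate it.
buf = io.BytesIO()
with wave.open(buf, "wb") as wf:
    wf.setnchannels(1)
    wf.setsampwidth(2)
    wf.setframerate(16000)
    wf.writeframes(struct.pack("<1600h", *([0] * 1600)))
print(is_live_api_input(buf.getvalue()))  # True
```

To collect output, append each part.inline_data.data chunk from the receive loop to a list and pass the list to save_output_audio when the turn completes.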

What's next