Get started with the Gemini Live API using the Google Gen AI SDK

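This tutorial shows you how to connect to the Gemini Live API by using the Google Gen AI SDK for Python. In this tutorial, you build a real-time multimodal application with a robust Python backend that handles the API connection.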

Before you begin

Complete the following steps to set up your environment.

  1. Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
  2. In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

    Roles required to select or create a project

    • Select a project: Selecting a project doesn't require a specific IAM role—you can select any project that you've been granted a role on.
    • Create a project: To create a project, you need the Project Creator role (roles/resourcemanager.projectCreator), which contains the resourcemanager.projects.create permission. Learn how to grant roles.

  3. Verify that billing is enabled for your Google Cloud project.

  4. Install the Google Cloud CLI.

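  5. If you're using an external identity provider (IdP), you must first sign in to the gcloud CLI with your federated identity.

  6. To initialize the gcloud CLI, run the following command: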

    gcloud init
  7. Install Git.
  8. Install Python 3.
  9. Clone the demo app

    Clone the demo app repository and go to its directory:

    git clone https://github.com/GoogleCloudPlatform/generative-ai.git &&
    cd generative-ai/gemini/multimodal-live-api/native-audio-websocket-demo-apps/plain-js-python-sdk-demo-app
    

    Project structure

    The application contains the following files:

    /
    ├── main.py                 # FastAPI server and WebSocket endpoint
    ├── gemini_live.py          # Gemini Live API wrapper using Gen AI SDK
    ├── requirements.txt        # Python dependencies
    └── frontend/
        ├── index.html          # User Interface
        ├── main.js             # Application logic
        ├── gemini-client.js    # WebSocket client for backend communication
        ├── media-handler.js    # Audio/Video capture and playback
        └── pcm-processor.js    # AudioWorklet for PCM processing
    

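    Set environment variables

    For this demo, the only environment variable that you need to set is the one that defines your Google Cloud project ID. The following command creates a .env file that sets the PROJECT_ID environment variable. Replace the PROJECT_ID on the right side of the equals sign with the ID of your Google Cloud project.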

    echo "PROJECT_ID=PROJECT_ID" > .env
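
    The backend can then read this value at startup. The following is a minimal sketch, using only the Python standard library, of how a KEY=VALUE file such as .env could be parsed. The helper name load_env_file and the placeholder project ID are hypothetical, and the demo app may load the file differently (for example, with a package such as python-dotenv).

```python
import os
import tempfile
from pathlib import Path


def load_env_file(path: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw_line in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


if __name__ == "__main__":
    # Demonstrate with a throwaway file; "my-project-id" is a placeholder.
    with tempfile.TemporaryDirectory() as tmp:
        env_path = os.path.join(tmp, ".env")
        Path(env_path).write_text("PROJECT_ID=my-project-id\n", encoding="utf-8")
        print(load_env_file(env_path)["PROJECT_ID"])
```

    Keeping the parser this small is a deliberate choice for illustration: it handles the single-variable file created above, but not multi-line values or variable expansion.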
    

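    Run the backend server

    The backend (main.py) handles the connection between the client and the Gemini Live API. The entry point is a FastAPI server that exposes a WebSocket endpoint. It receives audio and video chunks from the frontend and forwards them to the GeminiLive session. The GeminiLive class in gemini_live.py wraps genai.Client to manage the session.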

    # Connects using the SDK
    async with self.client.aio.live.connect(model=self.model, config=config) as session:
        # Manages input/output queues
        await asyncio.gather(
            send_audio(),
            send_video(),
            receive_responses()
        )
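
    To illustrate the concurrency pattern in this excerpt without a live connection, here is a self-contained sketch in which a stand-in FakeSession class echoes whatever it is sent. FakeSession, sender, receiver, run_demo, and the queue names are hypothetical and are not part of the Gen AI SDK or the demo app; only the asyncio.gather structure mirrors the excerpt.

```python
import asyncio


class FakeSession:
    """Hypothetical stand-in for a live session: echoes each chunk it is sent."""

    def __init__(self) -> None:
        self._responses: asyncio.Queue = asyncio.Queue()

    async def send(self, kind: str, chunk: bytes) -> None:
        await self._responses.put(f"{kind}:{len(chunk)} bytes")

    async def receive(self) -> str:
        return await self._responses.get()


async def sender(session: FakeSession, kind: str, queue: asyncio.Queue) -> None:
    """Forward chunks from an input queue until a None sentinel arrives."""
    while (chunk := await queue.get()) is not None:
        await session.send(kind, chunk)


async def receiver(session: FakeSession, expected: int) -> list:
    """Collect a fixed number of responses from the session."""
    return [await session.receive() for _ in range(expected)]


async def run_demo() -> list:
    audio_in: asyncio.Queue = asyncio.Queue()
    video_in: asyncio.Queue = asyncio.Queue()
    # Pre-fill the queues; a real server would fill them from the WebSocket.
    for item in (b"\x00" * 320, None):
        audio_in.put_nowait(item)
    for item in (b"\xff" * 1024, None):
        video_in.put_nowait(item)

    session = FakeSession()
    # Run both senders and the receiver concurrently, as in the excerpt.
    _, _, responses = await asyncio.gather(
        sender(session, "audio", audio_in),
        sender(session, "video", video_in),
        receiver(session, expected=2),
    )
    return sorted(responses)


if __name__ == "__main__":
    print(asyncio.run(run_demo()))
```

    Because the three coroutines run concurrently, incoming media never has to wait for a response to finish, which is what makes the interaction feel real-time.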
    

    To run the backend server, do the following:

    1. Install the dependencies:

      pip3 install -r requirements.txt
      
    2. Authenticate with Google Cloud:

      gcloud auth application-default login
      
    3. Start the server:

      python3 main.py
      

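    Open the frontend UI and connect to Gemini

    The frontend manages audio and video capture and playback. The gemini-client.js file handles the WebSocket connection to the backend. The application sends Base64-encoded media chunks to the backend, receives audio responses from the Gemini Live API, and plays them back to the user.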
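
    As an illustration of the Base64 step, the sketch below encodes a binary media chunk into a JSON text message and decodes it again on the receiving side. The message shape (type and data fields) and the function names encode_chunk and decode_chunk are assumptions made for this example; the demo app's actual wire format may differ.

```python
import base64
import json


def encode_chunk(kind: str, payload: bytes) -> str:
    """Wrap raw media bytes in a JSON text message with a Base64 payload."""
    return json.dumps(
        {"type": kind, "data": base64.b64encode(payload).decode("ascii")}
    )


def decode_chunk(message: str) -> tuple[str, bytes]:
    """Recover the media kind and the raw bytes from a JSON text message."""
    parsed = json.loads(message)
    # validate=True rejects characters outside the Base64 alphabet.
    return parsed["type"], base64.b64decode(parsed["data"], validate=True)


if __name__ == "__main__":
    pcm = bytes(range(8))  # placeholder for a short audio chunk
    message = encode_chunk("audio", pcm)
    kind, payload = decode_chunk(message)
    print(kind, payload == pcm)
```

    Base64 lets binary audio and video travel inside text messages, at the cost of roughly one third more bytes on the wire.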

    To open the frontend UI and connect to Gemini, do the following:

    1. Open your browser and go to http://localhost:8000
    2. Click Connect.

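    Interact with Gemini

    Try the following:

    • Text input: To send a text message to Gemini, enter your message in the message field and click Send. Gemini responds to the message with audio.
    • Voice input: To talk to Gemini, click Start microphone. Gemini responds to the prompt with audio.
    • Video input: To let Gemini see through your camera, click Start camera. You can talk with Gemini about what appears in the camera view.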

    What's next