Get started with the Gemini Live API using WebSockets

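This tutorial shows you how to connect to the Gemini Live API using WebSockets. In this tutorial, you build a real-time multimodal application that has a plain JavaScript frontend and a Python server that handles authentication and proxying.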

Before you begin

Follow these steps to set up your environment.

  1. Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
  2. In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

    Roles required to select or create a project

    • Select a project: Selecting a project doesn't require a specific IAM role—you can select any project that you've been granted a role on.
    • Create a project: To create a project, you need the Project Creator role (roles/resourcemanager.projectCreator), which contains the resourcemanager.projects.create permission. Learn how to grant roles.

  3. Verify that billing is enabled for your Google Cloud project.

  4. Install the Google Cloud CLI.

  5. If you're using an external identity provider (IdP), you must first sign in to the gcloud CLI with your federated identity.

  6. To initialize the gcloud CLI, run the following command:

    gcloud init
  7. Install Git.
  8. Install Python 3.
  9. Clone the demo application.

    Clone the demo application repository and change to its directory:

    git clone https://github.com/GoogleCloudPlatform/generative-ai.git &&
    cd generative-ai/gemini/multimodal-live-api/native-audio-websocket-demo-apps/plain-js-demo-app
    

    Project structure

    The application contains the following files:

    /
    ├── server.py            # WebSocket proxy + HTTP server
    ├── requirements.txt     # Python dependencies
    └── frontend/
        ├── index.html       # UI
        ├── geminilive.js    # Gemini API client
        ├── mediaUtils.js    # Audio/video streaming
        ├── tools.js         # Custom tool definitions
        └── script.js        # App logic
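The repository's mediaUtils.js is not reproduced in this tutorial. As a rough illustration of the kind of work an audio-streaming module has to do, the sketch below converts Web Audio API samples (32-bit floats in [-1, 1]) into raw 16-bit little-endian PCM and base64-encodes the bytes so they can travel inside a JSON message. It assumes the Live API's documented input format of raw 16-bit little-endian PCM at 16 kHz, and it assumes resampling to 16 kHz happens elsewhere. The names `floatTo16BitPCM` and `bytesToBase64` are hypothetical, not functions from the demo.

```javascript
// Hypothetical helper (not from mediaUtils.js): convert Float32 samples in
// [-1, 1] to signed 16-bit little-endian PCM bytes.
function floatTo16BitPCM(float32Samples) {
  const buffer = new ArrayBuffer(float32Samples.length * 2);
  const view = new DataView(buffer);
  for (let i = 0; i < float32Samples.length; i++) {
    // Clamp so out-of-range samples saturate instead of wrapping around.
    const s = Math.max(-1, Math.min(1, float32Samples[i]));
    // Negative values scale by 0x8000 and positive by 0x7fff, so -1 and 1
    // land exactly on the int16 minimum and maximum.
    const sample = s < 0 ? s * 0x8000 : s * 0x7fff;
    view.setInt16(i * 2, sample, true); // true = little-endian
  }
  return new Uint8Array(buffer);
}

// Hypothetical helper: base64-encode raw bytes for embedding in JSON.
function bytesToBase64(bytes) {
  let binary = "";
  for (let i = 0; i < bytes.length; i++) {
    binary += String.fromCharCode(bytes[i]);
  }
  return btoa(binary); // btoa is global in browsers and Node.js 16+
}
```

DataView.setInt16 truncates fractional values toward zero, so no explicit rounding step is needed for this sketch.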
    

    Run the backend server

    The backend (server.py) handles authentication and acts as a WebSocket proxy between the client and the Gemini Live API.

    To run the backend server, run the following commands:

    1. Install the dependencies:

      pip3 install -r requirements.txt
      
    2. Authenticate with Google Cloud:

      gcloud auth application-default login
      
    3. Start the server:

      python3 server.py
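server.py itself is Python and is not reproduced here. To make the proxy's role concrete, the following JavaScript sketch (kept in the same language as the frontend examples) shows the two things such a proxy typically supplies when it opens the upstream connection: a regional Vertex AI WebSocket endpoint and an OAuth bearer token, for example one derived from the Application Default Credentials created by `gcloud auth application-default login`. The endpoint path (including the `v1beta1` version) follows the format used in Google's Live API samples and is an assumption to check against current Vertex AI documentation; `buildUpstreamRequest` is a hypothetical name.

```javascript
// Hypothetical sketch of the upstream connection details a proxy might use.
// ASSUMPTIONS: the BidiGenerateContent endpoint path and API version, and
// Bearer-token authentication; verify both against current Vertex AI docs.
function buildUpstreamRequest(location, accessToken) {
  if (!location || !accessToken) {
    throw new Error("location and accessToken are required");
  }
  const url =
    `wss://${location}-aiplatform.googleapis.com/ws/` +
    "google.cloud.aiplatform.v1beta1.LlmBidiService/BidiGenerateContent";
  // The proxy attaches the credential server-side, so the browser never
  // has to hold a Google Cloud access token.
  const headers = { Authorization: `Bearer ${accessToken}` };
  return { url, headers };
}
```

Keeping the token on the server is the main reason for the proxy design: the browser only talks to the local server, which is also why the frontend's `connect` call can omit the token.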
      

    Open the frontend UI and connect to Gemini

    The frontend manages audio and video capture and playback. The geminilive.js file handles the WebSocket connection to the backend.

    const client = new GeminiLiveAPI(proxyUrl, projectId, model);
    client.addFunction(toolInstance); // Add custom tools
    client.connect(accessToken); // Connect (token optional with proxy)
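The internals of GeminiLiveAPI are not shown in this tutorial. As a sketch, assuming the client follows the Live API's BidiGenerateContent protocol, the first message sent after the socket opens is a `setup` message that names the model (as a full Vertex AI resource name built from the project ID) and configures the response. `buildSetupMessage`, its options object, and its defaults (`us-central1`, audio-only responses) are illustrative assumptions, not the demo's API.

```javascript
// Hypothetical sketch of the initial "setup" message of a Live API session.
// ASSUMPTIONS: field names follow the BidiGenerateContent setup message;
// the default location and response modality are illustrative only.
function buildSetupMessage(projectId, model, { location = "us-central1", tools = [] } = {}) {
  return {
    setup: {
      // Vertex AI identifies models by full resource name.
      model: `projects/${projectId}/locations/${location}/publishers/google/models/${model}`,
      generationConfig: {
        // Request spoken responses, matching the demo's audio replies.
        responseModalities: ["AUDIO"],
      },
      // Include tool declarations (such as those from tools.js) only if present.
      ...(tools.length > 0 ? { tools } : {}),
    },
  };
}
```

A client would send this once, for example with `socket.send(JSON.stringify(buildSetupMessage(projectId, model)))` on a hypothetical `socket`, before streaming any text, audio, or video.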
    

    To open the frontend UI and connect to Gemini, follow these steps:

    1. Open a browser and go to http://localhost:8000
    2. In the Project ID field, enter your Google Cloud project ID.
    3. Click Connect.

    Interact with Gemini

    Try the following:

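    • Text input: To send Gemini a text message, type it in the message field and click Send. Gemini responds with audio.
    • Voice input: To talk to Gemini, click Start Mic. Gemini responds to your spoken prompt with audio.
    • Video input: To let Gemini see through your camera, click Start Camera. You can then talk with Gemini about what's in the camera view.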
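Each of these input modes corresponds to a different client message on the WebSocket. The sketch below assumes the `clientContent` and `realtimeInput` message shapes from the Live API's BidiGenerateContent protocol, using the `mediaChunks` form that appears in many Google samples (some newer API versions define dedicated `audio` and `video` fields instead). The builder names are hypothetical and are not part of geminilive.js.

```javascript
// Hypothetical message builders for the three input modes.
// ASSUMPTION: shapes follow BidiGenerateContent clientContent / realtimeInput.

// Text input: a complete user turn, so the model starts answering at once.
function buildTextMessage(text) {
  return {
    clientContent: {
      turns: [{ role: "user", parts: [{ text }] }],
      turnComplete: true,
    },
  };
}

// Voice or camera input: one streamed chunk of base64-encoded media,
// e.g. "audio/pcm;rate=16000" for microphone audio or "image/jpeg" for frames.
function buildMediaChunk(mimeType, base64Data) {
  return {
    realtimeInput: {
      mediaChunks: [{ mimeType, data: base64Data }],
    },
  };
}
```

Text is sent as a finished turn, whereas microphone audio and camera frames are streamed continuously as realtime input; by default the service uses voice activity detection to decide when the user has stopped speaking.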

    What's next