通过 WebSocket 开始使用 Gemini Live API

本教程介绍了如何使用 WebSocket 连接到 Gemini Live API。在本教程中,您将构建一个实时多模态应用,该应用使用纯原生 JavaScript 前端并使用 Python 服务器来处理身份验证和代理。

准备工作

完成以下步骤以设置您的环境。

  1. Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
  2. In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

    Roles required to select or create a project

    • Select a project: Selecting a project doesn't require a specific IAM role—you can select any project that you've been granted a role on.
    • Create a project: To create a project, you need the Project Creator role (roles/resourcemanager.projectCreator), which contains the resourcemanager.projects.create permission. Learn how to grant roles.

    Go to project selector

  3. Verify that billing is enabled for your Google Cloud project.

  4. Install the Google Cloud CLI.

  5. 如果您使用的是外部身份提供方 (IdP),则必须先使用联合身份登录 gcloud CLI

  6. 如需初始化 gcloud CLI,请运行以下命令:

    gcloud init
  7. In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

    Roles required to select or create a project

    • Select a project: Selecting a project doesn't require a specific IAM role—you can select any project that you've been granted a role on.
    • Create a project: To create a project, you need the Project Creator role (roles/resourcemanager.projectCreator), which contains the resourcemanager.projects.create permission. Learn how to grant roles.

    Go to project selector

  8. Verify that billing is enabled for your Google Cloud project.

  9. Install the Google Cloud CLI.

  10. 如果您使用的是外部身份提供方 (IdP),则必须先使用联合身份登录 gcloud CLI

  11. 如需初始化 gcloud CLI,请运行以下命令:

    gcloud init
  12. 安装 Git
  13. 安装 Python 3
  14. 克隆演示应用

    克隆演示应用代码库并前往该目录:

    git clone https://github.com/GoogleCloudPlatform/generative-ai.git &&
    cd generative-ai/gemini/multimodal-live-api/native-audio-websocket-demo-apps/plain-js-demo-app
    

    项目结构

    该应用包含以下文件:

    /
    ├── server.py            # WebSocket proxy + HTTP server
    ├── requirements.txt     # Python dependencies
    └── frontend/
        ├── index.html       # UI
        ├── geminilive.js    # Gemini API client
        ├── mediaUtils.js    # Audio/video streaming
        ├── tools.js         # Custom tool definitions
        └── script.js        # App logic
    

    运行后端服务器

    后端 (server.py) 负责处理身份验证,并充当客户端与 Gemini Live API 之间的 WebSocket 代理。

    如要运行后端服务器,请运行以下命令:

    1. 安装依赖项:

      pip3 install -r requirements.txt
      
    2. 向 Google Cloud进行身份验证:

      gcloud auth application-default login
      
    3. 启动服务器:

      python3 server.py
      

    打开前端界面并与 Gemini 连接

    前端管理音频和视频的捕获与播放。geminilive.js 文件处理与后端的 WebSocket 连接。

    const client = new GeminiLiveAPI(proxyUrl, projectId, model);
    client.addFunction(toolInstance); // Add custom tools
    client.connect(accessToken); // Connect (token optional with proxy)
    

    如需打开前端界面并与 Gemini 连接,请执行以下操作:

    1. 打开浏览器并前往 http://localhost:8000
    2. 项目 ID 字段中,输入 Google Cloud 项目的 ID。
    3. 点击连接

    与 Gemini 互动

    请尝试执行以下操作:

    • 文字输入:您可以在消息字段中输入消息,然后点击发送,从而撰写消息发送给 Gemini。Gemini 会使用音频回复消息。
    • 语音输入:如需与 Gemini 对话,请点击启动麦克风。 Gemini 会使用音频回答提示。
    • 视频输入:如需让 Gemini 通过摄像头查看,请点击启动摄像头。你可以与 Gemini 聊聊它通过摄像头看到的内容。

    后续步骤