Get started with the Gemini Live API using the Google Gen AI SDK

In this tutorial, you learn how to connect to the Gemini Live API by using the Google Gen AI SDK for Python. You build a real-time multimodal application with a robust Python backend that handles the API connection.

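Before you begin

To set up your environment, complete the following steps:

Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New Google Cloud customers also get $300 in free credits to run, test, and deploy workloads.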

In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

Roles required to select or create a project

Select a project: Selecting a project doesn't require a specific IAM role—you can select any project that you've been granted a role on.
Create a project: To create a project, you need the Project Creator role (roles/resourcemanager.projectCreator), which contains the resourcemanager.projects.create permission. Learn how to grant roles.

Verify that billing is enabled for your Google Cloud project.

Enable the Agent Platform API.

Roles required to enable APIs

To enable APIs, you need the Service Usage Admin IAM role (roles/serviceusage.serviceUsageAdmin), which contains the serviceusage.services.enable permission. Learn how to grant roles.

Install the Google Cloud CLI.

If you're using an external identity provider (IdP), you must first sign in to the gcloud CLI with your federated identity.

To initialize the gcloud CLI, run the following command:

```
gcloud init
```

Install Git.
Install Python 3.

Clone the demo app

Clone the demo app repository and change to its directory:

```
git clone https://github.com/GoogleCloudPlatform/generative-ai.git &&
cd generative-ai/gemini/multimodal-live-api/native-audio-websocket-demo-apps/plain-js-python-sdk-demo-app
```

Project structure

The application contains the following files:

```
/
├── main.py                 # FastAPI server and WebSocket endpoint
├── gemini_live.py          # Gemini Live API wrapper using Gen AI SDK
├── requirements.txt        # Python dependencies
└── frontend/
    ├── index.html          # User Interface
    ├── main.js             # Application logic
    ├── gemini-client.js    # WebSocket client for backend communication
    ├── media-handler.js    # Audio/Video capture and playback
    └── pcm-processor.js    # AudioWorklet for PCM processing
```

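Configure environment variables

For this demo, the only environment variable you need to configure is the one that defines your Google Cloud project ID. The following command creates a .env file that sets the PROJECT_ID environment variable. Replace the PROJECT_ID value (after the =) with your Google Cloud project ID.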

```
echo "PROJECT_ID=PROJECT_ID" > .env
```
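The tutorial doesn't show how the backend reads this file. As a minimal sketch, assuming .env holds simple KEY=VALUE lines, the project ID could be loaded with the standard library alone; `parse_env` and `get_project_id` are hypothetical helpers, and the actual app may rely on a package such as python-dotenv instead.

```python
import os
from pathlib import Path
from typing import Dict, Optional


# Hypothetical helpers (not from the demo app) for reading a simple .env file.
def parse_env(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        # Strip one pair of matching quotes, e.g. PROJECT_ID="my-project"
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def get_project_id(env_path: str = ".env") -> Optional[str]:
    """Return PROJECT_ID from the process environment, else from the .env file."""
    if "PROJECT_ID" in os.environ:
        return os.environ["PROJECT_ID"]
    path = Path(env_path)
    if not path.is_file():
        return None
    return parse_env(path.read_text(encoding="utf-8")).get("PROJECT_ID")
```

Letting an exported PROJECT_ID take precedence over the file is a common convention, not necessarily what the demo does. With the Gen AI SDK, a project ID like this is typically passed as `genai.Client(vertexai=True, project=project_id, location=...)`; the location the demo uses is not stated in this tutorial.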

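Run the backend server

The backend (main.py) handles the connection between the client and the Gemini Live API. Its entry point is a FastAPI server that exposes a WebSocket endpoint. The server receives audio and video chunks from the frontend and forwards them to a GeminiLive session. The GeminiLive class in gemini_live.py wraps genai.Client to manage the session.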

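```python
# Excerpt from the GeminiLive class in gemini_live.py; runs inside an async
# method and requires `import asyncio`. Opens a Live API session via the SDK.
async with self.client.aio.live.connect(model=self.model, config=config) as session:
    # Run the audio/video senders and the response reader concurrently.
    # These helper coroutines manage the input/output queues and are
    # defined elsewhere in gemini_live.py (not shown here).
    await asyncio.gather(
        send_audio(),
        send_video(),
        receive_responses(),
    )
```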
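The excerpt above does not show how the three coroutines share data. The following minimal sketch of the queue-based pattern it suggests runs without network access; `FakeSession`, its `send`/`close`/`receive` methods, and the `None` end-of-stream sentinel are hypothetical stand-ins, not the Gen AI SDK's session API.

```python
import asyncio
from typing import AsyncIterator, List


class FakeSession:
    """Hypothetical offline stand-in for a Live API session (not the SDK's API).

    It replies to every chunk it is sent with a short acknowledgement string.
    """

    def __init__(self) -> None:
        self._replies: asyncio.Queue = asyncio.Queue()

    async def send(self, kind: str, chunk: bytes) -> None:
        await self._replies.put(f"{kind}:{len(chunk)} bytes")

    async def close(self) -> None:
        await self._replies.put(None)  # sentinel: no more replies

    async def receive(self) -> AsyncIterator[str]:
        # Yield replies until the sentinel arrives
        while (reply := await self._replies.get()) is not None:
            yield reply


async def run_session(audio: List[bytes], video: List[bytes]) -> List[str]:
    session = FakeSession()
    audio_q: asyncio.Queue = asyncio.Queue()
    video_q: asyncio.Queue = asyncio.Queue()
    responses: List[str] = []

    # Simulate the WebSocket endpoint filling the input queues; None ends a stream
    for chunk in audio:
        audio_q.put_nowait(chunk)
    audio_q.put_nowait(None)
    for frame in video:
        video_q.put_nowait(frame)
    video_q.put_nowait(None)

    async def pump(queue: asyncio.Queue, kind: str) -> None:
        # Forward queued chunks to the session until the stream's sentinel
        while (item := await queue.get()) is not None:
            await session.send(kind, item)

    async def send_inputs() -> None:
        await asyncio.gather(pump(audio_q, "audio"), pump(video_q, "video"))
        await session.close()  # both input streams have ended

    async def receive_responses() -> None:
        async for reply in session.receive():
            responses.append(reply)

    # Send and receive concurrently, like the asyncio.gather call in the excerpt
    await asyncio.gather(send_inputs(), receive_responses())
    return responses
```

Nesting the two senders in their own `gather` lets the sketch close the session only after both input streams end, so the receive loop can finish. A real Live API session stays open for the whole conversation, and its send and receive methods differ from this stub.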

Run the following commands to start the backend server.

Install the dependencies:
```
pip3 install -r requirements.txt
```
Authenticate with Google Cloud:
```
gcloud auth application-default login
```
Start the server:
```
python3 main.py
```

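Open the frontend UI and connect to Gemini

The frontend manages audio and video capture and playback. The gemini-client.js file handles the WebSocket connection to the backend: it sends base64-encoded media chunks to the backend and receives audio responses from the Gemini Live API, which are then played back to the user.

To open the frontend UI and connect to Gemini, do the following: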
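The demo performs this encoding in its frontend JavaScript; the same steps are sketched in Python here for illustration. The sketch assumes float samples in [-1.0, 1.0] are converted to 16-bit little-endian PCM, the raw audio format the Live API accepts as input, and wraps each chunk in a hypothetical JSON envelope `{"type": ..., "data": ...}`; the demo's actual message fields may differ.

```python
import base64
import json
import struct
from typing import List


def float_to_pcm16(samples: List[float]) -> bytes:
    """Clamp float samples to [-1.0, 1.0] and pack them as 16-bit little-endian PCM."""
    ints = [int(max(-1.0, min(1.0, s)) * 32767) for s in samples]
    return struct.pack(f"<{len(ints)}h", *ints)


def encode_chunk(kind: str, pcm: bytes) -> str:
    """Wrap raw bytes in a hypothetical JSON envelope with base64 data."""
    return json.dumps({"type": kind, "data": base64.b64encode(pcm).decode("ascii")})


def decode_chunk(message: str) -> bytes:
    """Reverse encode_chunk: recover the raw bytes from the JSON envelope."""
    return base64.b64decode(json.loads(message)["data"])
```

Scaling by 32767 keeps both +1.0 and -1.0 inside the int16 range. Base64 makes the payload about one third larger, which is the cost of carrying binary audio inside a text (JSON) WebSocket message.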

Open a browser and go to http://localhost:8000.
Click Connect.

Interact with Gemini

Try the following:

Text input: To send Gemini a text message, type a message in the message field and click Send. Gemini responds to the message with audio.
Voice input: To talk to Gemini, click Start mic. Gemini responds to your prompt with audio.
Video input: To let Gemini see through your camera, click Start camera. You can then talk to Gemini about what it sees through the camera.
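Behind these three controls, the backend has to route each kind of input toward the session. A minimal sketch of that routing follows, assuming a hypothetical JSON message shape (`{"type": "text", "text": ...}` for text, `{"type": "audio" or "video", "data": <base64>}` for media); `dispatch` and the message fields are illustrative, and the demo's real handling in main.py may differ.

```python
import asyncio
import base64
import json
from typing import Dict


def dispatch(raw: str, queues: Dict[str, asyncio.Queue]) -> str:
    """Hypothetical router: put one frontend message on the queue for its type.

    put_nowait never blocks on an unbounded asyncio.Queue, so this plain
    function can be called from an async WebSocket handler. Returns the
    message type so callers can log or test the routing.
    """
    message = json.loads(raw)
    kind = message.get("type")
    if kind == "text":
        # Text input is forwarded unchanged
        queues["text"].put_nowait(message["text"])
    elif kind in ("audio", "video"):
        # Media arrives base64-encoded; decode it back to raw bytes
        queues[kind].put_nowait(base64.b64decode(message["data"]))
    else:
        raise ValueError(f"unsupported message type: {kind!r}")
    return kind
```

Decoding at this boundary means the downstream sender coroutines only handle plain strings or raw bytes, and an unknown message type fails loudly instead of being silently dropped.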

What's next

Learn how to configure language and voice.
Learn how to configure Gemini capabilities.
Learn about best practices for the Gemini Live API.
