Gemini Live API best practices

To see examples of Gemini Live API native audio in use, run the following notebooks in the environment of your choice.

"Get started with Gemini Live API native audio":
[Open in Colab] | [Open in Colab Enterprise] | [Open in Agent Platform Workbench] | [View on GitHub]
"Get started with Gemini Live API native audio using WebSockets":
[Open in Colab] | [Open in Colab Enterprise] | [Open in Agent Platform Workbench] | [View on GitHub]

To get better results from Gemini Live API, focus on the following best practices:

Design clear system instructions
Define tools precisely
Craft effective prompts

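Design clear system instructions

To get the best performance out of Gemini Live API, we recommend preparing a set of system instructions (SIs) that clearly define the agent persona, the conversational rules, and the guardrails, in that order.

For best results, give each agent its own separate SI.

Specify the agent persona: Describe the agent's name, role, and desired characteristics in detail. If you specify an accent, also specify the preferred output language (for example, a British accent for an English speaker).
Specify the conversational rules: Write the rules in the order you want the model to follow them. Distinguish between one-time elements of the conversation and conversational loops. For example:
- One-time element: Collect the customer's details (such as name, location, and loyalty card number) once.
- Conversational loop: The user can discuss recommendations, pricing, returns, and delivery, and might want to move from topic to topic. Tell the model that it's fine to continue this conversational loop for as long as the user wants.
Specify tool calls within a flow as distinct sentences: For example, if a one-time step that collects the customer's details needs to call the get_user_info function, you might write: Your first step is to gather user information. First, ask the user to provide their name, location, and loyalty card number. Then, call get_user_info with these details.
Add any necessary guardrails: Specify general conversational guardrails for things you don't want the model to do. If you want the model to do y when x happens, feel free to give specific examples. If you still aren't getting the level of precision you want, use the word unmistakably to guide the model toward being precise.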
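The persona, rules, guardrails ordering described above can be sketched as a small helper that assembles one agent's SI. This is only an illustration: `build_system_instruction`, the agent name "Sam", and the sample rules are hypothetical and not part of any SDK.

```python
def build_system_instruction(
    persona: str, rules: list[str], guardrails: list[str]
) -> str:
    """Assemble one agent's SI: persona first, then rules, then guardrails."""
    # Number the rules so the model sees them in the order it should apply them.
    numbered_rules = "\n".join(
        f"{i}. {rule}" for i, rule in enumerate(rules, start=1)
    )
    guardrail_lines = "\n".join(f"- {item}" for item in guardrails)
    return (
        f"**Persona:**\n{persona}\n\n"
        f"**Conversational Rules:**\n{numbered_rules}\n\n"
        f"**Guardrails:**\n{guardrail_lines}"
    )


# Hypothetical agent used only for illustration.
si = build_system_instruction(
    persona="You are Sam, a friendly store assistant. You only speak English.",
    rules=[
        "One-time step: ask the customer for their name, location, and "
        "loyalty card number. Then call `get_user_info` with these details.",
        "Conversational loop: discuss recommendations, pricing, returns, and "
        "delivery for as long as the customer wants.",
    ],
    guardrails=["Never discuss topics unrelated to the store."],
)
print(si)
```

Keeping one call per agent makes it easy to keep each agent's SI separate.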

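Define tools precisely

When you use tools with Gemini Live API, write specific tool definitions. Tell Gemini under what conditions each tool should be invoked. For details, see Tool definitions.

Craft effective prompts

Use clear prompts: In your prompts, give examples of what the model should and shouldn't do. Also, try to limit each prompt to one persona or role at a time. Instead of a long, multi-page prompt, consider using prompt chaining. The model performs best on tasks that involve a single function call.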

# Prompt chaining example.
# Original long prompt that lists every task up front.
chainable_long_prompt = """
You need to perform a sequence of tasks.
First, you should do task1; after that, task2; later, task3; and finally, task4.
"""

# New initial prompt that replaces the long prompt.
initial_prompt = """
You need to perform a sequence of tasks. Once you finish the current task, call
the `get_next_prompt` function to get instructions for the next task.
"""

PROMPT_LIST = ["Now, do task1", "Now, do task2", "Now, do task3", "Now, do task4", "all tasks done"]
_remaining_prompts = iter(PROMPT_LIST)

def get_next_prompt() -> str:
  # Provide this function as a tool to the model.
  # Each call returns the next prompt; once the list is exhausted, it keeps
  # returning the final "all tasks done" message.
  return next(_remaining_prompts, PROMPT_LIST[-1])

# Catch and execute the `get_next_prompt` tool call, and send the new prompt back to the model.

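Provide a starting command and information: Gemini Live API expects user input before it responds. To have Gemini Live API start the conversation, include a prompt that asks it to greet the user or begin the conversation. Include information about the user so that Gemini Live API can personalize the greeting.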
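As a minimal sketch of such a starting prompt, the helper below builds the text you would send as the first user input. The function name `build_kickoff_prompt` and the sample user details are hypothetical; how the text is sent depends on your client code.

```python
def build_kickoff_prompt(user_name: str, known_facts: dict[str, str]) -> str:
    """Build a first-turn prompt that asks the model to open the conversation."""
    parts = [f"The user's name is {user_name}."]
    if known_facts:
        # Include whatever is already known so the greeting can be personalized.
        details = "; ".join(f"{key}: {value}" for key, value in known_facts.items())
        parts.append(f"Known details: {details}.")
    parts.append("Greet the user by name and start the conversation.")
    return " ".join(parts)


# Hypothetical user details used only for illustration.
kickoff = build_kickoff_prompt("Ana", {"loyalty tier": "gold"})
print(kickoff)
```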

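Session resumption

Use transparent session resumption: Configure the connection in genai.types.LiveConnectConfig with SessionResumptionConfig(transparent=True). This signals that the client intends to handle session resumption seamlessly, which enables features such as replaying unconsumed messages on reconnection.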

from google.genai import types

session_handle: str | None = None

live_config = types.LiveConnectConfig(
  session_resumption=types.SessionResumptionConfig(
      handle=session_handle,
      transparent=True,
  ),
)

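Maintain and update the session handle: Listen for session_resumption_update messages from the server. If resumable is true and new_handle is provided, store this handle. The handle is essential for reconnecting to the same session state after a disconnection.
Buffer sent messages and remove acknowledged ones: To avoid losing client messages during a disconnection, maintain a buffer of the messages sent to Gemini Live API. When transparent session resumption is enabled, session_resumption_update messages include last_consumed_client_message_index, which indicates the last message processed by the server. Use this index to remove acknowledged messages from the buffer. To track messages correctly, start your user-managed index at 1, because index 0 indicates that the session is not resumable. Increment the index by 1 for each subsequent message sent to the model. Each time you resume the session, make sure that the index of the first message sent over the new connection is reset to 1.
Handle disconnections gracefully:
- GoAway signal: The server sends a go_away message before an expected disconnection (for example, a timeout). Your manager should listen for this message and proactively reconnect using the latest handle.
- API errors: Network issues can raise genai_errors.APIError (for example, code 1000 or 1006 for WebSocket errors). Your manager should catch these errors in both the send loop and the receive loop, and trigger the session refresh or reconnection process.
Implement reconnection with message replay: When a disconnection occurs, create a new session with client.aio.live.connect using the latest session handle. After the new connection is established, resend the messages in the buffer that the server had not acknowledged before the disconnection. The first message sent from the buffer is marked as index 1 on the new connection.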
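The buffering and replay bookkeeping described above can be sketched without any SDK calls. In this sketch, `OutgoingBuffer` and its method names are hypothetical helpers, not part of the google-genai library; the value passed to `acknowledge` is assumed to be the last_consumed_client_message_index reported by the server.

```python
from collections import deque


class OutgoingBuffer:
    """Tracks sent messages with 1-based, per-connection indices."""

    def __init__(self) -> None:
        self._pending: deque = deque()  # (index, message) pairs not yet acknowledged
        self._next_index = 1            # index 0 is reserved: "session is not resumable"

    def record_sent(self, message) -> int:
        """Remember a message that was just sent and return its index."""
        index = self._next_index
        self._pending.append((index, message))
        self._next_index += 1
        return index

    def acknowledge(self, last_consumed_index: int) -> None:
        """Drop every message the server reports as consumed."""
        while self._pending and self._pending[0][0] <= last_consumed_index:
            self._pending.popleft()

    def pending(self) -> list:
        """Messages that have not been acknowledged yet, oldest first."""
        return [message for _, message in self._pending]

    def start_new_connection(self) -> list:
        """Return the messages to replay and reset the index counter to 1."""
        to_replay = self.pending()
        self._pending.clear()
        self._next_index = 1
        return to_replay


buffer = OutgoingBuffer()
for text in ["hello", "how are you", "bye"]:
    buffer.record_sent(text)
buffer.acknowledge(1)                    # the server consumed message 1
replay = buffer.start_new_connection()   # after reconnecting with the latest handle
replayed_indices = [buffer.record_sent(message) for message in replay]
```

After a reconnection, each replayed message is resent on the new session and recorded again, so the first replayed message gets index 1 on the new connection.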

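Enable context window compression

Because native audio tokens accumulate quickly (about 25 tokens per second of audio), for long sessions use ContextWindowCompressionConfig to configure the session's context window.

Warning: Context compression causes conversation history to be lost.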

from google.genai import types

live_config = types.LiveConnectConfig(
  context_window_compression=types.ContextWindowCompressionConfig(
    trigger_tokens=100_000,  # Compression starts once the context reaches this many tokens.
    sliding_window=types.SlidingWindow(target_tokens=4_000),  # Tokens to keep after compression.
  ),
)
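To get a feel for these numbers, the following back-of-the-envelope sketch estimates how much audio fits before the trigger is reached. It assumes the approximate rate of 25 tokens per second quoted above and counts audio only; system instructions, text, and tool output would shorten the estimate. The function name is hypothetical.

```python
AUDIO_TOKENS_PER_SECOND = 25  # approximate rate quoted above


def seconds_until_trigger(trigger_tokens: int, already_used_tokens: int = 0) -> float:
    """Estimate the seconds of audio that fit before compression triggers."""
    remaining = max(trigger_tokens - already_used_tokens, 0)
    return remaining / AUDIO_TOKENS_PER_SECOND


minutes = seconds_until_trigger(100_000) / 60
print(f"{minutes:.1f} minutes")  # prints "66.7 minutes"
```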

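Calculate token usage

For details about Gemini Live API billing, see the pricing page. On each turn, the API bills for all context tokens, which include both the conversation history and the system instructions that you provide. You can monitor and calculate these charges by extracting the usage_metadata field provided in the model's responses.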

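# Example code to get token usage
from google.genai import live

async def print_token_usage(session: live.AsyncSession) -> None:
  # Print the usage metadata attached to each server message, when present.
  async for response in session.receive():
    if response.usage_metadata is not None:
      print("Token usage:", response.usage_metadata)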
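Voice activity detection (VAD)

By default, Gemini Live API uses the VAD provided by Gemini.

When you use Gemini Live API VAD, you can configure the model to return VAD events explicitly. Enabling explicit_vad_signal in the configuration lets you monitor and capture these events directly from the model's responses.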

from google.genai import types
from google.genai import live

live_config = types.LiveConnectConfig(
  explicit_vad_signal=True
)

# In the receive loop, print each VAD event as it arrives.
async def print_vad_events(session: live.AsyncSession) -> None:
  async for response in session.receive():
    if response.voice_activity is not None:
      print("Got VAD event:", response.voice_activity)

If you use a custom activity detection system, you must disable the default voice activity detection (VAD) and manually signal the user's turns to the Gemini model. To do this, send ActivityStart and ActivityEnd events to define the boundaries of the interaction.

from google.genai import live
from google.genai import types

# Disable VAD in config
live_config = types.LiveConnectConfig(
  realtime_input_config=types.RealtimeInputConfig(
    automatic_activity_detection=types.AutomaticActivityDetection(
        disabled=True
    ),
  ),
)

async def send_user_turn(session: live.AsyncSession, bytes_to_send_queue) -> None:
  # Mark the start of the user's turn.
  await session.send_realtime_input(activity_start=types.ActivityStart())
  # Stream the user's audio as 16 kHz PCM chunks.
  for audio_bytes in bytes_to_send_queue:
    await session.send_realtime_input(
        audio=types.Blob(
            data=audio_bytes,
            mime_type="audio/pcm;rate=16000",
        )
    )
  # Mark the end of the user's turn.
  await session.send_realtime_input(activity_end=types.ActivityEnd())
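What a custom activity detector looks like is up to you. As one possible illustration, the sketch below uses a simple energy threshold with a hangover period to decide where a turn starts and ends. It assumes 16-bit little-endian mono PCM frames; `frame_rms`, `detect_activity`, the threshold of 500, and the hangover length are hypothetical values that you would tune for your own audio.

```python
import struct
from typing import Iterable, Iterator


def frame_rms(pcm16: bytes) -> float:
    """Root-mean-square level of one frame of 16-bit little-endian mono PCM."""
    count = len(pcm16) // 2
    if count == 0:
        return 0.0
    samples = struct.unpack(f"<{count}h", pcm16[: count * 2])
    return (sum(s * s for s in samples) / count) ** 0.5


def detect_activity(
    frames: Iterable[bytes], threshold: float = 500.0, hangover_frames: int = 10
) -> Iterator[tuple[str, int]]:
    """Yield ("activity_start" | "activity_end", frame_index) events."""
    speaking = False
    silent_run = 0
    for index, frame in enumerate(frames):
        if frame_rms(frame) >= threshold:
            silent_run = 0
            if not speaking:
                speaking = True
                yield ("activity_start", index)
        elif speaking:
            # Wait for several quiet frames in a row before ending the turn.
            silent_run += 1
            if silent_run >= hangover_frames:
                speaking = False
                yield ("activity_end", index)


# Synthetic 20 ms frames at 16 kHz (320 samples each), for illustration only.
silence = bytes(640)
loud = struct.pack("<320h", *([2000] * 320))
events = list(detect_activity([silence] * 3 + [loud] * 5 + [silence] * 4, hangover_frames=3))
print(events)
```

On an "activity_start" event you would send ActivityStart, and on "activity_end" you would send ActivityEnd, as in the snippet above.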

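Set the speech language code

To maintain consistency, we recommend that you explicitly set the language and speech codes in the configuration. Without this definition, Gemini might change the conversation language depending on the context provided.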

from google.genai import types

config = types.LiveConnectConfig(
  speech_config=types.SpeechConfig(
    language_code="en-US",
  ),
)

Also, state the following in the system instructions:

RESPOND IN {OUTPUT_LANGUAGE}. YOU MUST RESPOND UNMISTAKABLY IN {OUTPUT_LANGUAGE}.

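For native audio models such as gemini-live-2.5-flash-native-audio, you can improve the transcription quality of multilingual automatic speech recognition (ASR) by providing language hints in the session configuration. For details, see Enable transcription for the session.

Set the transcription language codes

Specify the transcription language codes in BCP-47 format to improve transcription accuracy.

Note: Enabling transcription increases token usage.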

from google.genai import types

config = types.LiveConnectConfig(
  input_audio_transcription=types.AudioTranscriptionConfig(
      language_codes=['en-US']  # This supports multiple language codes.
  ),
  output_audio_transcription=types.AudioTranscriptionConfig(
      language_codes=['en-US']
  ),
)

Client buffering

Don't buffer input audio significantly (for example, 1 second) before sending it. To minimize latency, send it in small chunks (20 to 40 milliseconds).
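As a rough sketch of what a 20 ms chunk means in bytes, the helper below splits raw PCM into fixed-duration chunks. It assumes 16 kHz, 16-bit mono audio (2 bytes per sample); the function name `split_into_chunks` is hypothetical.

```python
def split_into_chunks(
    pcm: bytes,
    chunk_ms: int = 20,
    sample_rate: int = 16_000,
    bytes_per_sample: int = 2,
) -> list[bytes]:
    """Split raw PCM into chunks of chunk_ms milliseconds each."""
    # 16,000 samples/s * 20 ms = 320 samples = 640 bytes per chunk.
    chunk_size = sample_rate * chunk_ms // 1000 * bytes_per_sample
    return [pcm[i : i + chunk_size] for i in range(0, len(pcm), chunk_size)]


one_second = bytes(16_000 * 2)  # one second of silence
chunks = split_into_chunks(one_second)
print(len(chunks), len(chunks[0]))  # prints "50 640"
```

In a live client you would send each chunk as soon as it is captured instead of collecting a full second first.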

Resampling

Make sure that your client application resamples microphone input (typically 44.1 kHz or 48 kHz) to 16 kHz before sending it.
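For the 48 kHz case the ratio is exactly 3, so a very rough sketch is to average every three samples into one. This is a crude box filter for illustration only, not a proper anti-aliasing resampler, and it does not cover 44.1 kHz, where the ratio is not an integer. For production audio, a dedicated resampling library is the safer choice. The function name is hypothetical.

```python
def downsample_48k_to_16k(samples: list[int]) -> list[int]:
    """Average each group of 3 samples (48 kHz) into 1 sample (16 kHz)."""
    factor = 3
    usable = len(samples) - len(samples) % factor  # drop a trailing partial group
    return [
        round(sum(samples[i : i + factor]) / factor)
        for i in range(0, usable, factor)
    ]


print(downsample_48k_to_16k([0, 3, 6, 9, 12, 15, 99]))  # prints "[3, 12]"
```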

Example

This example combines the best practices with the guidelines for designing system instructions to guide the model's performance as a career coach.

**Persona:**
You are Laura, a career coach from Brooklyn, NY. You specialize in providing
data-driven advice to give your clients a fresh perspective on the career
questions they're navigating. Your special sauce is providing quantitative,
data-driven insights to help clients think about their issues in a different
way. You leverage statistics, research, and psychology as much as possible.
You only speak to your clients in English, no matter what language they speak
to you in.

**Conversational Rules:**

1. **Introduce yourself:** Warmly greet the client.

2. **Intake:** Ask for your client's full name, date of birth, and state they're
calling in from. Call `create_client_profile` to create a new client profile.

3. **Discuss the client's issue:** Get a sense of what the client wants to
cover in the session. DO NOT repeat what the client is saying back to them in
your response. Don't ask more than a few questions here.

4. **Reframe the client's issue with real data:** NO PLATITUDES. Start providing
data-driven insights for the client, but embed these as general facts within
conversation. This is what they're coming to you for: your unique thinking on
the subjects that are stressing them out. Show them a new way of thinking about
something. Let this step go on for as long as the client wants. As part of this,
if the client mentions wanting to take any actions, call
`add_action_items_to_profile` to remind the client later.

5. **Next appointment:** Call `get_next_appointment` to see if another
appointment has already been scheduled for the client. If so, then share the
date and time with the client and confirm if they'll be able to attend. If
there is no appointment, then call `get_available_appointments` to see openings.
Share the list of openings with the client and ask what they would prefer. Save
their preference with `schedule_appointment`. If the client prefers to schedule
offline, then let them know that's perfectly fine and to use the client portal.

**General Guidelines:** You're meant to be a witty, snappy conversational
partner. Keep your responses short and progressively disclose more information
if the client requests it. Don't repeat what the client says back to them.
Each of your responses should add to the conversation, not just recap what
the client said. Be relatable by bringing in your own background 
growing up professionally in Brooklyn, NY. If a client tries to get you off
track, gently bring them back to the workflow articulated above.

**Guardrails:** If the client is being hard on themselves, never encourage that.
Remember that your ultimate goal is to create a supportive environment for your
clients to thrive.

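Tool definitions

This JSON defines the relevant functions that are called in the career coach example. For best results when you define a function, include its name, description, parameters, and invocation conditions.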

[
 {
   "name": "create_client_profile",
   "description": "Creates a new client profile with their personal details. Returns a unique client ID. \n**Invocation Condition:** Invoke this tool *only after* the client has provided their full name, date of birth, AND state. This should only be called once at the beginning of the 'Intake' step.",
   "parameters": {
     "type": "object",
     "properties": {
       "full_name": {
         "type": "string",
         "description": "The client's full name."
       },
       "date_of_birth": {
         "type": "string",
         "description": "The client's date of birth in YYYY-MM-DD format."
       },
       "state": {
         "type": "string",
         "description": "The 2-letter postal abbreviation for the client's state (e.g., 'NY', 'CA')."
       }
     },
     "required": ["full_name", "date_of_birth", "state"]
   }
 },
 {
   "name": "add_action_items_to_profile",
   "description": "Adds a list of actionable next steps to a client's profile using their client ID. \n**Invocation Condition:** Invoke this tool *only after* a list of actionable next steps has been discussed and agreed upon with the client during the 'Reframe the client's issue with real data' step. Requires the `client_id` obtained from the start of the session.",
   "parameters": {
     "type": "object",
     "properties": {
       "client_id": {
         "type": "string",
         "description": "The unique ID of the client, obtained from create_client_profile."
       },
       "action_items": {
         "type": "array",
         "items": {
           "type": "string"
         },
         "description": "A list of action items for the client (e.g., ['Update resume', 'Research three companies'])."
       }
     },
     "required": ["client_id", "action_items"]
   }
 },
 {
   "name": "get_next_appointment",
   "description": "Checks if a client has a future appointment already scheduled using their client ID. Returns the appointment details or null. \n**Invocation Condition:** Invoke this tool at the *start* of the 'Next appointment' workflow step, immediately after the 'Reframe the client's issue with real data' step is complete. This is used to check if an appointment *already exists*.",
   "parameters": {
     "type": "object",
     "properties": {
       "client_id": {
         "type": "string",
         "description": "The unique ID of the client."
       }
     },
     "required": ["client_id"]
   }
 },
 {
   "name": "get_available_appointments",
   "description": "Fetches a list of the next available appointment slots. \n**Invocation Condition:** Invoke this tool *only if* the `get_next_appointment` tool was called and it returned `null` (or an empty response), indicating no future appointment is scheduled.",
   "parameters": {
     "type": "object",
     "properties": {}
   }
 },
 {
   "name": "schedule_appointment",
   "description": "Books a new appointment for a client at a specific date and time. \n**Invocation Condition:** Invoke this tool *only after* `get_available_appointments` has been called, a list of openings has been presented to the client, and the client has *explicitly confirmed* which specific date and time they want to book.",
   "parameters": {
     "type": "object",
     "properties": {
       "client_id": {
         "type": "string",
         "description": "The unique ID of the client."
       },
       "appointment_datetime": {
         "type": "string",
         "description": "The chosen appointment slot in ISO 8601 format (e.g., '2025-10-30T14:30:00')."
       }
     },
     "required": ["client_id", "appointment_datetime"]
   }
 }
]
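One way to keep definitions like these consistent is to lint them before use. The sketch below checks that a tool definition has a name, a description, parameters, and an invocation condition, following the "**Invocation Condition:**" convention used in the descriptions above. The helper `check_tool_definition` is hypothetical and not part of any SDK.

```python
def check_tool_definition(tool: dict) -> list[str]:
    """Return a list of problems found in one tool definition."""
    problems = []
    for key in ("name", "description", "parameters"):
        if not tool.get(key):
            problems.append(f"missing {key}")
    # The descriptions above mark the condition with "**Invocation Condition:**".
    if "Invocation Condition" not in tool.get("description", ""):
        problems.append("description has no invocation condition")
    return problems


complete_tool = {
    "name": "get_available_appointments",
    "description": "Fetches a list of the next available appointment slots. "
    "\n**Invocation Condition:** Invoke this tool *only if* "
    "`get_next_appointment` returned `null`.",
    "parameters": {"type": "object", "properties": {}},
}
incomplete_tool = {"name": "lookup", "description": "Looks something up."}

print(check_tool_definition(complete_tool))
print(check_tool_definition(incomplete_tool))
```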

More information

For more information about using Gemini Live API, see the following:
