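Starting April 29, 2025, the Gemini 1.5 Pro and Gemini 1.5 Flash models are unavailable in projects that have not previously used them, including new projects. For details, see "Model versions and lifecycle".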


Live API

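The Live API enables low-latency, two-way voice and video interactions with Gemini. Use the Live API to give users natural voice conversations, including the ability to interrupt the model's responses with voice commands.

This document covers the basics of using the Live API, including its capabilities, starter examples, and code examples for basic use cases. To learn how to start an interactive conversation with the Live API, see "Interactive conversations with the Live API". To learn about the tools that the Live API can use, see "Built-in tools".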

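Supported models

The Live API is available in the Google Gen AI SDK and in Vertex AI Studio. Some features, such as text input and output, are available only through the Gen AI SDK.

You can use the Live API with the following models:

Model version	Availability level
`gemini-live-2.5-flash`	Private GA*
`gemini-live-2.5-flash-preview-native-audio`	Public preview

* Contact your Google account team representative to request access.

For more information about technical specifications and limitations, see the Live API reference guide.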

Starter examples

You can get started with the Live API using the following examples:

Jupyter notebooks:

Demo applications and guides:

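Live API features

Real-time multimodal understanding: Converse with Gemini about what it sees in a video feed or a shared screen, using built-in support for streaming audio and video.
Built-in tool use: Integrate tools such as function calling and Google Search seamlessly into your conversations for more practical and dynamic interactions.
Low-latency interactions: Have low-latency, human-like interactions with Gemini.
Multilingual support: Converse in any of the 24 supported languages.
(GA versions only) Provisioned Throughput support: Use a fixed-cost, fixed-term subscription, available in several term lengths, to reserve throughput for supported generative AI models on Vertex AI, including the Live API.

Gemini 2.5 Flash with the Live API also offers native audio as a public preview feature. Native audio introduces the following capabilities:

Affective dialog: The Live API understands and responds to the user's tone of voice. The same words spoken in different ways can lead to very different, more nuanced conversations.
Proactive audio and context awareness: The Live API intelligently disregards ambient conversations and other irrelevant audio, so it knows when to listen and when to stay silent.

To learn more about native audio, see "Built-in tools".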

Supported audio formats

The Live API supports the following audio formats:

Input audio: Raw 16-bit PCM audio at 16 kHz, little-endian
Output audio: Raw 16-bit PCM audio at 24 kHz, little-endian
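
To make the byte layout concrete, the following is a minimal sketch (standard library only) that packs floating-point samples as signed 16-bit little-endian PCM and computes how long a buffer of that data lasts. The helper names `floats_to_pcm16le` and `pcm16_duration_seconds` are hypothetical and are not part of any SDK. The duration helper assumes mono audio.

```python
import math
import struct

INPUT_SAMPLE_RATE = 16_000   # Hz, input rate listed above
OUTPUT_SAMPLE_RATE = 24_000  # Hz, output rate listed above
BYTES_PER_SAMPLE = 2         # 16-bit PCM


def floats_to_pcm16le(samples):
    """Pack floats in [-1.0, 1.0] as signed 16-bit little-endian PCM bytes."""
    clipped = [max(-1.0, min(1.0, s)) for s in samples]
    ints = [int(round(s * 32767)) for s in clipped]
    # "<" selects little-endian, "h" is a signed 16-bit integer.
    return struct.pack(f"<{len(ints)}h", *ints)


def pcm16_duration_seconds(num_bytes, sample_rate):
    """Duration of mono 16-bit PCM data with the given byte length."""
    return num_bytes / (BYTES_PER_SAMPLE * sample_rate)


# One second of a 440 Hz tone at the input sample rate (demo data only).
tone = [
    math.sin(2 * math.pi * 440 * n / INPUT_SAMPLE_RATE)
    for n in range(INPUT_SAMPLE_RATE)
]
pcm = floats_to_pcm16le(tone)
print(len(pcm), pcm16_duration_seconds(len(pcm), INPUT_SAMPLE_RATE))
```

Under these assumptions, one second of mono input audio occupies 32,000 bytes, and one second of mono output audio occupies 48,000 bytes.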

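Get text responses from audio input

You can send audio and receive a text response by converting the audio to 16-bit PCM, 16 kHz, mono format. The following example reads a WAV file and sends it in the correct format:

Gen AI SDK for Python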

# Test file: https://storage.googleapis.com/generativeai-downloads/data/16000.wav
# Install helpers for converting files: pip install librosa soundfile

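import asyncio
import io
import os
from pathlib import Path

import librosa
import soundfile as sf
from google import genai
from google.genai import types

# The project ID and region are read from environment variables here; set them
# before running, or replace these lookups with your own values.
GOOGLE_CLOUD_PROJECT = os.environ["GOOGLE_CLOUD_PROJECT"]
GOOGLE_CLOUD_LOCATION = os.environ["GOOGLE_CLOUD_LOCATION"]

client = genai.Client(
    vertexai=True,
    project=GOOGLE_CLOUD_PROJECT,
    location=GOOGLE_CLOUD_LOCATION,
)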
model = "gemini-live-2.5-flash"
config = {"response_modalities": ["TEXT"]}

async def main():
    async with client.aio.live.connect(model=model, config=config) as session:

        buffer = io.BytesIO()
        y, sr = librosa.load("sample.wav", sr=16000)
        sf.write(buffer, y, sr, format="RAW", subtype="PCM_16")
        buffer.seek(0)
        audio_bytes = buffer.read()

        # If already in correct format, you can use this:
        # audio_bytes = Path("sample.pcm").read_bytes()

        await session.send_realtime_input(
            audio=types.Blob(data=audio_bytes, mime_type="audio/pcm;rate=16000")
        )

        async for response in session.receive():
            if response.text is not None:
                print(response.text)

if __name__ == "__main__":
    asyncio.run(main())
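
Before converting a file, it can help to check whether it already matches the input format. The following is a rough sketch that uses only the standard-library `wave` module to inspect a WAV header and extract the raw PCM frames. The function names are hypothetical, and `make_silent_wav` exists only to produce demo data.

```python
import io
import wave


def is_live_api_input_format(wav_file):
    """Return True if a WAV file is 16-bit, 16 kHz, mono PCM."""
    with wave.open(wav_file, "rb") as wav:
        return (
            wav.getsampwidth() == 2
            and wav.getframerate() == 16_000
            and wav.getnchannels() == 1
        )


def read_pcm_frames(wav_file):
    """Return the raw PCM frames of a WAV file, without the header."""
    with wave.open(wav_file, "rb") as wav:
        return wav.readframes(wav.getnframes())


def make_silent_wav(sample_rate, num_frames):
    """Build an in-memory mono 16-bit WAV of silence (demo data only)."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * num_frames)
    buffer.seek(0)
    return buffer


# A tenth of a second of silence at 16 kHz.
demo = make_silent_wav(16_000, 1_600)
print(is_live_api_input_format(demo))
demo.seek(0)  # rewind before reading the same buffer again
print(len(read_pcm_frames(demo)))
```

If the check passes, the bytes returned by `read_pcm_frames` could be sent in place of the converted `audio_bytes` in the example above. Otherwise, the `librosa` and `soundfile` conversion shown there is still needed.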

Get voice responses from text input

Use this example to send text input and receive a synthesized speech response:

Gen AI SDK for Python

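# This example uses top-level `async with` and IPython display helpers, so it
# is meant to run in a Jupyter or IPython notebook.
import os

import numpy as np
from IPython.display import Audio, Markdown, display
from google import genai
from google.genai.types import (
  Content,
  LiveConnectConfig,
  Part,
  SpeechConfig,
  VoiceConfig,
  PrebuiltVoiceConfig,
)

# The project ID and region are read from environment variables here; set them
# before running, or replace these lookups with your own values.
GOOGLE_CLOUD_PROJECT = os.environ["GOOGLE_CLOUD_PROJECT"]
GOOGLE_CLOUD_LOCATION = os.environ["GOOGLE_CLOUD_LOCATION"]

client = genai.Client(
  vertexai=True,
  project=GOOGLE_CLOUD_PROJECT,
  location=GOOGLE_CLOUD_LOCATION,
)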

voice_name = "Aoede"

config = LiveConnectConfig(
  response_modalities=["AUDIO"],
  speech_config=SpeechConfig(
      voice_config=VoiceConfig(
          prebuilt_voice_config=PrebuiltVoiceConfig(
              voice_name=voice_name,
          )
      ),
  ),
)

async with client.aio.live.connect(
  model="gemini-live-2.5-flash",
  config=config,
) as session:
  text_input = "Hello? Gemini are you there?"
  display(Markdown(f"**Input:** {text_input}"))

  await session.send_client_content(
      turns=Content(role="user", parts=[Part(text=text_input)]))

  audio_data = []
  async for message in session.receive():
      # Some server messages carry no server content, so check for it first.
      if (
          message.server_content
          and message.server_content.model_turn
          and message.server_content.model_turn.parts
      ):
          for part in message.server_content.model_turn.parts:
              if part.inline_data:
                  audio_data.append(
                      np.frombuffer(part.inline_data.data, dtype=np.int16)
                  )

  if audio_data:
      display(Audio(np.concatenate(audio_data), rate=24000, autoplay=True))
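
Outside a notebook, the audio can be written to a file instead of played inline. The following is a minimal sketch that wraps raw 16-bit PCM chunks in a WAV container with the standard-library `wave` module. The function name `pcm16_to_wav` is hypothetical, and the sketch assumes the output audio is mono.

```python
import io
import wave


def pcm16_to_wav(pcm_chunks, destination, sample_rate=24_000):
    """Write 16-bit PCM chunks to a WAV file path or file-like object.

    Assumes mono audio; 24 kHz is the output rate listed under
    "Supported audio formats".
    """
    with wave.open(destination, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit
        wav.setframerate(sample_rate)
        for chunk in pcm_chunks:
            wav.writeframes(chunk)


# Demo with two one-second chunks of placeholder data, not real speech.
buffer = io.BytesIO()
pcm16_to_wav([b"\x00\x00" * 24_000, b"\x01\x00" * 24_000], buffer)
buffer.seek(0)
with wave.open(buffer, "rb") as wav:
    print(wav.getframerate(), wav.getnframes())
```

In the example above, the bytes in `part.inline_data.data` could be appended to a list and passed to `pcm16_to_wav` along with a file path such as `"response.wav"`.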

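For more examples of sending text, see the Get started guide.

Transcribe audio

The Live API can transcribe both input and output audio. Use the following example to enable transcription:

Gen AI SDK for Python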

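import asyncio
import os

from google import genai
from google.genai import types

# The project ID and region are read from environment variables here; set them
# before running, or replace these lookups with your own values.
GOOGLE_CLOUD_PROJECT = os.environ["GOOGLE_CLOUD_PROJECT"]
GOOGLE_CLOUD_LOCATION = os.environ["GOOGLE_CLOUD_LOCATION"]

client = genai.Client(
    vertexai=True,
    project=GOOGLE_CLOUD_PROJECT,
    location=GOOGLE_CLOUD_LOCATION,
)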
model = "gemini-live-2.5-flash"

config = {
    "response_modalities": ["AUDIO"],
    "input_audio_transcription": {},
    "output_audio_transcription": {}
}

async def main():
    async with client.aio.live.connect(model=model, config=config) as session:
        message = "Hello? Gemini are you there?"

        await session.send_client_content(
            turns={"role": "user", "parts": [{"text": message}]}, turn_complete=True
        )

        async for response in session.receive():
            content = response.server_content
            if content is None:
                # Skip messages that carry no server content.
                continue
            if content.model_turn:
                print("Model turn:", content.model_turn)
            if content.input_transcription:
                print("Input transcript:", content.input_transcription.text)
            if content.output_transcription:
                print("Output transcript:", content.output_transcription.text)

if __name__ == "__main__":
    asyncio.run(main())

WebSocket

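import base64
import json

import numpy as np
from IPython.display import Audio, Markdown, display
# Assumption: `connect` is the asyncio client from the `websockets` package.
from websockets.asyncio.client import connect

# Assumes SERVICE_URL (the Live API WebSocket endpoint) and bearer_token
# (a list whose first item is an access token) are defined earlier, and that
# the code runs in a notebook, which allows top-level `async with`.

# Set model generation_config
CONFIG = {
    'response_modalities': ['AUDIO'],
}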

headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {bearer_token[0]}",
}

# Connect to the server
async with connect(SERVICE_URL, additional_headers=headers) as ws:
    # Setup the session
    await ws.send(
        json.dumps(
            {
                "setup": {
                    "model": "gemini-2.0-flash-live-preview-04-09",
                    "generation_config": CONFIG,
                    'input_audio_transcription': {},
                    'output_audio_transcription': {}
                }
            }
        )
    )

    # Receive setup response
    raw_response = await ws.recv(decode=False)
    setup_response = json.loads(raw_response.decode("ascii"))

    # Send text message
    text_input = "Hello? Gemini are you there?"
    display(Markdown(f"**Input:** {text_input}"))

    msg = {
        "client_content": {
            "turns": [{"role": "user", "parts": [{"text": text_input}]}],
            "turn_complete": True,
        }
    }

    await ws.send(json.dumps(msg))

    responses = []
    input_transcriptions = []
    output_transcriptions = []

    # Receive chunks of server response
    async for raw_response in ws:
        response = json.loads(raw_response.decode())
        server_content = response.pop("serverContent", None)
        if server_content is None:
            break

        if (input_transcription := server_content.get("inputTranscription")) is not None:
            if (text := input_transcription.get("text")) is not None:
                input_transcriptions.append(text)
        if (output_transcription := server_content.get("outputTranscription")) is not None:
            if (text := output_transcription.get("text")) is not None:
                output_transcriptions.append(text)

        model_turn = server_content.pop("modelTurn", None)
        if model_turn is not None:
            parts = model_turn.pop("parts", None)
            if parts is not None:
                for part in parts:
                    pcm_data = base64.b64decode(part["inlineData"]["data"])
                    responses.append(np.frombuffer(pcm_data, dtype=np.int16))

        # End of turn
        turn_complete = server_content.pop("turnComplete", None)
        if turn_complete:
            break

    if input_transcriptions:
        display(Markdown(f"**Input transcription >** {''.join(input_transcriptions)}"))

    if responses:
        # Play the returned audio message
        display(Audio(np.concatenate(responses), rate=24000, autoplay=True))

    if output_transcriptions:
        display(Markdown(f"**Output transcription >** {''.join(output_transcriptions)}"))
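
The transcription handling in the receive loop can be separated from the network code so that it is easier to test. The following is a rough sketch of that idea: it takes already-received JSON messages and joins the transcription text, using the same `serverContent`, `inputTranscription`, `outputTranscription`, and `turnComplete` keys as the example above. The function name `collect_transcripts` and the sample messages are hypothetical.

```python
import json


def collect_transcripts(raw_messages):
    """Join input and output transcription text from server messages.

    Stops at the first message that has no serverContent or that sets
    turnComplete, like the receive loop above.
    """
    input_parts, output_parts = [], []
    for raw in raw_messages:
        message = json.loads(raw)  # accepts str or bytes
        server_content = message.get("serverContent")
        if server_content is None:
            break
        for key, parts in (
            ("inputTranscription", input_parts),
            ("outputTranscription", output_parts),
        ):
            text = (server_content.get(key) or {}).get("text")
            if text is not None:
                parts.append(text)
        if server_content.get("turnComplete"):
            break
    return "".join(input_parts), "".join(output_parts)


# Hand-written sample messages, not real server output.
sample = [
    '{"serverContent": {"outputTranscription": {"text": "Hel"}}}',
    b'{"serverContent": {"outputTranscription": {"text": "lo"}}}',
    '{"serverContent": {"turnComplete": true}}',
]
print(collect_transcripts(sample))
```

Keeping the parsing in a pure function like this means the transcript logic can be checked without opening a WebSocket connection.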
