Configure Gemini capabilities

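This document describes how to configure the capabilities of Gemini models when you use the Gemini Live API. You can configure tool use, such as function calling and grounding, and native audio features, such as affective dialog and proactive audio.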

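Configure tool use

Several tools are compatible with the model versions that support the Gemini Live API, including:

Function calling
Grounding with Google Search
Grounding with Vertex AI RAG Engine (Preview)

To enable a particular tool for use in returned responses, add the tool's name to the tools list when you initialize the model. The following sections show how to use each built-in tool in your code.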

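Function calling

With function calling, you write a description of a function and pass that description to the model in a request. The model's response includes the name of a function that matches the description and the arguments to call it with.

All functions must be declared at the start of the session by sending the tool definitions as part of the LiveConnectConfig message.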
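
A function description is usually more than a bare name. As a minimal sketch, assuming the declaration follows the OpenAPI-style schema that Gemini function declarations accept, a declaration with a description and typed parameters might look like the following. The set_light_brightness function and its level parameter are hypothetical and do not come from this guide.

```python
# Hypothetical declaration: the function name, its description, and the
# "level" parameter are illustrative assumptions, not part of this guide.
set_light_brightness = {
    "name": "set_light_brightness",
    "description": "Set the brightness of the lights.",
    "parameters": {
        "type": "OBJECT",
        "properties": {
            "level": {
                "type": "INTEGER",
                "description": "Brightness from 0 (off) to 100 (full).",
            },
        },
        "required": ["level"],
    },
}

# Declarations are grouped under function_declarations in the tools list,
# and the tools list is part of the session config.
tools = [{"function_declarations": [set_light_brightness]}]
config = {"response_modalities": ["TEXT"], "tools": tools}
```

A fuller description and explicit parameter types give the model more to match against when it decides which function to call and with which arguments.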

To enable function calling, add function_declarations to the tools list in the setup message:

Python

import asyncio
from google import genai
from google.genai import types

client = genai.Client(
    vertexai=True,
    project=GOOGLE_CLOUD_PROJECT,
    location=GOOGLE_CLOUD_LOCATION,
)
model = "gemini-live-2.5-flash"

# Simple function definitions
turn_on_the_lights = {"name": "turn_on_the_lights"}
turn_off_the_lights = {"name": "turn_off_the_lights"}

tools = [{"function_declarations": [turn_on_the_lights, turn_off_the_lights]}]
config = {"response_modalities": ["TEXT"], "tools": tools}

async def main():
    async with client.aio.live.connect(model=model, config=config) as session:
        prompt = "Turn on the lights please"
        await session.send_client_content(turns={"parts": [{"text": prompt}]})

        async for chunk in session.receive():
            if chunk.server_content:
                if chunk.text is not None:
                    print(chunk.text)
            elif chunk.tool_call:
                function_responses = []
                for fc in chunk.tool_call.function_calls:
                    function_response = types.FunctionResponse(
                        name=fc.name,
                        response={ "result": "ok" } # simple, hard-coded function response
                    )
                    function_responses.append(function_response)

                await session.send_tool_response(function_responses=function_responses)


if __name__ == "__main__":
    asyncio.run(main())

For an example of using function calling in system instructions, see our best practices example.
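
The function calling sample returns a hard-coded result for every call. One possible way to route each call to real application code is a small name-to-handler table. This is only a sketch: the handler bodies, the HANDLERS table, and run_function_call are hypothetical helpers, and plain dicts stand in for the SDK objects so the snippet runs without a session.

```python
# Hypothetical local handlers; in a real app these would control devices.
def turn_on_the_lights():
    return {"result": "lights on"}


def turn_off_the_lights():
    return {"result": "lights off"}


# Maps the declared function names to the Python callables that serve them.
HANDLERS = {
    "turn_on_the_lights": turn_on_the_lights,
    "turn_off_the_lights": turn_off_the_lights,
}


def run_function_call(name, args=None):
    """Look up a handler by name and return a response payload dict."""
    handler = HANDLERS.get(name)
    if handler is None:
        # Report unknown functions back to the model instead of raising.
        return {"error": f"unknown function: {name}"}
    return handler(**(args or {}))
```

In the sample's receive loop, the hard-coded payload could then be replaced with something like response=run_function_call(fc.name, fc.args).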

Grounding with Google Search

You can use Grounding with Google Search with the Gemini Live API by adding google_search to the tools list in the setup message:

Python

import asyncio
from google import genai
from google.genai import types

client = genai.Client(
    vertexai=True,
    project=GOOGLE_CLOUD_PROJECT,
    location=GOOGLE_CLOUD_LOCATION,
)
model = "gemini-live-2.5-flash"


tools = [{'google_search': {}}]
config = {"response_modalities": ["TEXT"], "tools": tools}

async def main():
    async with client.aio.live.connect(model=model, config=config) as session:
        prompt = "When did the last Brazil vs. Argentina soccer match happen?"
        await session.send_client_content(turns={"parts": [{"text": prompt}]})

        async for chunk in session.receive():
            if chunk.server_content:
                if chunk.text is not None:
                    print(chunk.text)

                # The model might generate and execute Python code to use Search
                model_turn = chunk.server_content.model_turn
                if model_turn:
                    for part in model_turn.parts:
                        if part.executable_code is not None:
                            print(part.executable_code.code)

                        if part.code_execution_result is not None:
                            print(part.code_execution_result.output)

if __name__ == "__main__":
    asyncio.run(main())
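
The samples in this guide print each chunk as it arrives. If the whole reply is needed as one string, the text pieces can be collected first and joined at the end. The sketch below assumes only that each chunk exposes a text attribute that may be None, as in the samples. fake_receive is a hypothetical stand-in for session.receive() so that the snippet runs without credentials.

```python
import asyncio
from types import SimpleNamespace


async def fake_receive():
    """Hypothetical stand-in for session.receive(); yields chunks with .text."""
    for text in ["Hello, ", None, "world."]:
        yield SimpleNamespace(text=text)


async def collect_text(stream):
    """Join the text of every chunk, skipping chunks that carry no text."""
    pieces = []
    async for chunk in stream:
        if chunk.text is not None:
            pieces.append(chunk.text)
    return "".join(pieces)


full_text = asyncio.run(collect_text(fake_receive()))
print(full_text)
```

With a live session, collect_text(session.receive()) would take the place of the stand-in stream.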

Grounding with Vertex AI RAG Engine

You can use Vertex AI RAG Engine with the Live API to ground, store, and retrieve context:

Python

from google import genai
from google.genai import types
from google.genai.types import (Content, LiveConnectConfig, HttpOptions, Modality, Part)
from IPython import display

PROJECT_ID=YOUR_PROJECT_ID
LOCATION=YOUR_LOCATION
TEXT_INPUT=YOUR_TEXT_INPUT
MODEL_NAME="gemini-live-2.5-flash"

client = genai.Client(
   vertexai=True,
   project=PROJECT_ID,
   location=LOCATION,
)

rag_store=types.VertexRagStore(
   rag_resources=[
       types.VertexRagStoreRagResource(
           rag_corpus=  # Use memory corpus if you want to store context.
       )
   ],
   # Set `store_context` to True so that the Live API can write context into your memory corpus.
   store_context=True
)

async with client.aio.live.connect(
   model=MODEL_NAME,
   config=LiveConnectConfig(response_modalities=[Modality.TEXT],
                            tools=[types.Tool(
                                retrieval=types.Retrieval(
                                    vertex_rag_store=rag_store))]),
) as session:
   text_input=TEXT_INPUT
   print("> ", text_input, "\n")
   await session.send_client_content(
       turns=Content(role="user", parts=[Part(text=text_input)])
   )

   async for message in session.receive():
       if message.text:
           display.display(display.Markdown(message.text))
           continue

For more information, see Use Vertex AI RAG Engine in Gemini Live API.

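Configure native audio capabilities

Models with native audio capabilities support the following features: affective dialog and proactive audio.

Configure affective dialog

When affective dialog is enabled, the model tries to understand and respond based on the user's tone of voice and emotional expression.

To enable affective dialog, set enable_affective_dialog to true in the setup message: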

Python

from google.genai.types import LiveConnectConfig

config = LiveConnectConfig(
    response_modalities=["AUDIO"],
    enable_affective_dialog=True,
)

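Configure proactive audio

With proactive audio, you can control when the model responds. For example, you can have Gemini respond only when prompted or when specific topics are discussed. For a video demonstration of proactive audio, see Gemini Live API native audio preview.

To enable proactive audio, configure the proactivity field in the setup message and set proactive_audio to true: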

Python

from google.genai.types import LiveConnectConfig, ProactivityConfig

config = LiveConnectConfig(
    response_modalities=["AUDIO"],
    proactivity=ProactivityConfig(proactive_audio=True),
)

Example conversation

The following example shows how a conversation with Gemini about cooking might go:

Prompt: "You are an AI assistant in Italian cooking; only chime in when the topic is about Italian cooking."

Speaker A: "I really love cooking!" (No response from Gemini.)

Speaker B: "Oh yes, me too! My favorite is French cuisine." (No response from Gemini.)

Speaker A: "I really like Italian food; do you know how to make a pizza?"

(Italian cooking topic will trigger response from Gemini.)
Gemini Live API: "I'd be happy to help! Here's a recipe for a pizza."
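
A session configured for this conversation could be sketched as follows. The sketch assumes that the prompt is supplied as the session's system instruction and that a dict-style config mirrors the LiveConnectConfig field names, as the dict configs in the earlier samples do. build_proactive_config is a hypothetical helper.

```python
# The instruction text is taken from the example prompt above.
SYSTEM_INSTRUCTION = (
    "You are an AI assistant in Italian cooking; only chime in when the "
    "topic is about Italian cooking."
)


def build_proactive_config(system_instruction):
    """Return a dict-style config with proactive audio enabled."""
    return {
        "response_modalities": ["AUDIO"],
        # Assumption: the prompt is passed as the system instruction.
        "system_instruction": system_instruction,
        "proactivity": {"proactive_audio": True},
    }


config = build_proactive_config(SYSTEM_INSTRUCTION)
```

The resulting dict would be passed as the config argument of client.aio.live.connect, in the same way as in the tool samples.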

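Common use cases

When you use proactive audio, Gemini behaves as follows:

Responds with minimal latency: Gemini responds as soon as the user has finished speaking, which reduces interruptions and helps Gemini keep its context if an interruption does happen.
Avoids interruptions: Proactive audio helps Gemini avoid being disrupted by background noise or outside chatter, and keeps Gemini from responding to outside chatter that comes up during a conversation.
Handles interruptions: If the user needs to interrupt while Gemini is responding, proactive audio makes it easier for Gemini to back-channel appropriately (that is, to handle genuine interruptions) instead of treating filler words such as umm or uhh as interruptions.
Co-listens to audio: Gemini can co-listen to an audio file that isn't the speaker's voice and then answer questions about that audio file later in the conversation.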

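Billing

While Gemini listens to a conversation, input audio tokens are charged.

Output audio tokens are charged only when Gemini responds. If Gemini doesn't respond or stays silent, no output audio tokens are charged.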
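
The effect of this rule on a session's token counts can be sketched with a small estimate. The tokens-per-second rates below are placeholder values chosen only to make the arithmetic easy to follow. They are not actual Gemini rates or prices, and estimate_audio_tokens is a hypothetical helper.

```python
def estimate_audio_tokens(listen_seconds, response_seconds,
                          input_tokens_per_second, output_tokens_per_second):
    """Estimate billed audio tokens for one session.

    Input tokens accrue for the whole time Gemini listens. Output tokens
    accrue only for the time Gemini actually spends responding.
    """
    return {
        "input_tokens": listen_seconds * input_tokens_per_second,
        "output_tokens": response_seconds * output_tokens_per_second,
    }


# Placeholder rates (assumptions, not published figures).
EXAMPLE_INPUT_RATE = 10
EXAMPLE_OUTPUT_RATE = 10

# Gemini listens for 10 minutes but speaks for only 45 seconds.
mostly_silent = estimate_audio_tokens(600, 45, EXAMPLE_INPUT_RATE, EXAMPLE_OUTPUT_RATE)

# Gemini listens for 10 minutes and never responds.
fully_silent = estimate_audio_tokens(600, 0, EXAMPLE_INPUT_RATE, EXAMPLE_OUTPUT_RATE)
```

Under these placeholder rates, both sessions accrue the same input tokens, while the silent session accrues no output tokens at all.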

For more information, see Vertex AI pricing.

What's next

For more information about using the Gemini Live API, see:
