This document shows you how to configure various capabilities of Gemini models when using the Live API. You can configure thinking, tool use such as function calling and grounding, and native audio capabilities such as affective dialog and proactive audio.
Configure thinking
Gemini models support thinking capabilities, with dynamic thinking
enabled by default. The thinking_budget parameter guides the model on the
number of thinking tokens to use. To disable thinking, set thinking_budget to
0.
config = {
    "response_modalities": ["AUDIO"],
    "thinking_config": {
        "thinking_budget": 256,
    },
}
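As noted above, setting thinking_budget to 0 turns thinking off entirely. A minimal variant of the same config, shown here only for illustration:
config = {
    "response_modalities": ["AUDIO"],
    "thinking_config": {
        "thinking_budget": 0,  # 0 disables thinking
    },
}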
Configure tool use
Several tools are compatible with various versions of Live API-supported models, including:
- Function calling
- Grounding with Google Search
- Grounding with Vertex AI RAG Engine
To enable a particular tool for use in returned responses, include its name in
the tools list when you initialize the model. The following
sections provide examples of how to use each of the built-in tools in your code.
Function calling
Use function calling to create a description of a function, then pass that description to the model in a request. The response from the model includes the name of a function that matches the description and the arguments to call it with.
All functions must be declared at the start of the session by sending tool
definitions as part of the LiveConnectConfig message.
To enable function calling, include function_declarations in the tools list
in the setup message:
Python
import asyncio

from google import genai
from google.genai import types

client = genai.Client(
    vertexai=True,
    project=GOOGLE_CLOUD_PROJECT,
    location=GOOGLE_CLOUD_LOCATION,
)
model = "gemini-live-2.5-flash"

# Simple function definitions
turn_on_the_lights = {"name": "turn_on_the_lights"}
turn_off_the_lights = {"name": "turn_off_the_lights"}

tools = [{"function_declarations": [turn_on_the_lights, turn_off_the_lights]}]
config = {"response_modalities": ["TEXT"], "tools": tools}

async def main():
    async with client.aio.live.connect(model=model, config=config) as session:
        prompt = "Turn on the lights please"
        await session.send_client_content(turns={"parts": [{"text": prompt}]})

        async for chunk in session.receive():
            if chunk.server_content:
                if chunk.text is not None:
                    print(chunk.text)
            elif chunk.tool_call:
                function_responses = []
                for fc in chunk.tool_call.function_calls:
                    function_response = types.FunctionResponse(
                        name=fc.name,
                        response={"result": "ok"},  # simple, hard-coded function response
                    )
                    function_responses.append(function_response)

                await session.send_tool_response(function_responses=function_responses)

if __name__ == "__main__":
    asyncio.run(main())
For examples using function calling in system instructions, see our best practices example.
Grounding with Google Search
You can use Grounding with Google Search with
the Live API by including google_search in the tools list in the
setup message:
Python
import asyncio

from google import genai
from google.genai import types

client = genai.Client(
    vertexai=True,
    project=GOOGLE_CLOUD_PROJECT,
    location=GOOGLE_CLOUD_LOCATION,
)
model = "gemini-live-2.5-flash"

tools = [{'google_search': {}}]
config = {"response_modalities": ["TEXT"], "tools": tools}

async def main():
    async with client.aio.live.connect(model=model, config=config) as session:
        prompt = "When did the last Brazil vs. Argentina soccer match happen?"
        await session.send_client_content(turns={"parts": [{"text": prompt}]})

        async for chunk in session.receive():
            if chunk.server_content:
                if chunk.text is not None:
                    print(chunk.text)

                # The model might generate and execute Python code to use Search
                model_turn = chunk.server_content.model_turn
                if model_turn:
                    for part in model_turn.parts:
                        if part.executable_code is not None:
                            print(part.executable_code.code)
                        if part.code_execution_result is not None:
                            print(part.code_execution_result.output)

if __name__ == "__main__":
    asyncio.run(main())
Grounding with Vertex AI RAG Engine
You can use Vertex AI RAG Engine with the Live API for grounding, storing, and retrieving contexts:
Python
from google import genai
from google.genai import types
from google.genai.types import (Content, LiveConnectConfig, HttpOptions, Modality, Part)
from IPython import display

PROJECT_ID = YOUR_PROJECT_ID
LOCATION = YOUR_LOCATION
TEXT_INPUT = YOUR_TEXT_INPUT
MODEL_NAME = "gemini-live-2.5-flash"

client = genai.Client(
    vertexai=True,
    project=PROJECT_ID,
    location=LOCATION,
)

rag_store = types.VertexRagStore(
    rag_resources=[
        types.VertexRagStoreRagResource(
            rag_corpus=  # Use memory corpus if you want to store context.
        )
    ],
    # Set `store_context` to true to allow the Live API to sink context into your memory corpus.
    store_context=True,
)

async with client.aio.live.connect(
    model=MODEL_NAME,
    config=LiveConnectConfig(
        response_modalities=[Modality.TEXT],
        tools=[types.Tool(
            retrieval=types.Retrieval(
                vertex_rag_store=rag_store))],
    ),
) as session:
    text_input = TEXT_INPUT
    print("> ", text_input, "\n")
    await session.send_client_content(
        turns=Content(role="user", parts=[Part(text=text_input)])
    )

    async for message in session.receive():
        if message.text:
            display.display(display.Markdown(message.text))
            continue
For more information, see Use Vertex AI RAG Engine in Gemini Live API.
Configure native audio capabilities
Models that have native audio capabilities support the following features:
- Affective dialog
- Proactive audio
Configure Affective Dialog
When Affective Dialog is enabled, the model attempts to understand and respond based on the tone of voice and emotional expressions of the user.
To enable Affective Dialog, set enable_affective_dialog to
true in the setup message:
Python
config = LiveConnectConfig(
    response_modalities=["AUDIO"],
    enable_affective_dialog=True,
)
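As a rough sketch of how this config plugs into a session (not a complete recipe): the client setup mirrors the earlier examples, and MODEL_NAME is a placeholder that you'd replace with a Live API model that supports native audio.
Python
import asyncio

from google import genai
from google.genai.types import LiveConnectConfig

client = genai.Client(
    vertexai=True,
    project=GOOGLE_CLOUD_PROJECT,
    location=GOOGLE_CLOUD_LOCATION,
)

# Placeholder: substitute a Live API model that supports native audio.
MODEL_NAME = "YOUR_NATIVE_AUDIO_MODEL"

config = LiveConnectConfig(
    response_modalities=["AUDIO"],
    enable_affective_dialog=True,
)

async def main():
    async with client.aio.live.connect(model=MODEL_NAME, config=config) as session:
        # Send audio input and handle audio responses here,
        # as in the earlier examples in this document.
        ...

if __name__ == "__main__":
    asyncio.run(main())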
Configure Proactive Audio
Proactive Audio lets you control when the model responds. For example, you can ask Gemini to only respond when prompted or when specific topics are discussed. To see a video demonstration of Proactive Audio, see Gemini Live API Native Audio Preview.
To enable Proactive Audio, configure the proactivity field
in the setup message and set proactive_audio to true:
Python
config = LiveConnectConfig(
    response_modalities=["AUDIO"],
    proactivity=ProactivityConfig(proactive_audio=True),
)
Example conversation
The following is a sample of what a conversation with Gemini about cooking might look like:
Prompt: "You are an AI assistant in Italian cooking; only chime in when the topic is about Italian cooking."
Speaker A: "I really love cooking!" (No response from Gemini.)
Speaker B: "Oh yes, me too! My favorite is French cuisine." (No response from
Gemini.)
Speaker A: "I really like Italian food; do you know how to make a pizza?"
(The Italian cooking topic triggers a response from Gemini.)
Live API: "I'd be happy to help! Here's a recipe for a pizza."
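One way to set up a session like this is to pass the instruction through the system_instruction field of LiveConnectConfig alongside the proactivity config. The following is a sketch under that assumption; the surrounding client and session code is omitted and would look like the earlier examples.
Python
from google.genai.types import Content, LiveConnectConfig, Part, ProactivityConfig

config = LiveConnectConfig(
    response_modalities=["AUDIO"],
    proactivity=ProactivityConfig(proactive_audio=True),
    # The instruction from the example conversation above.
    system_instruction=Content(parts=[Part(text=(
        "You are an AI assistant in Italian cooking; only chime in when the "
        "topic is about Italian cooking."
    ))]),
)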
Common use cases
When using Proactive Audio, Gemini performs as follows:
- Responds with minimal latency: Gemini responds after the user finishes speaking, which reduces interruptions and helps Gemini keep context if an interruption does happen.
- Avoids interruptions: Proactive Audio helps Gemini avoid interruptions from background noise or external chatter, and prevents Gemini from responding if external chatter is introduced during a conversation.
- Handles interruptions: If the user needs to interrupt while Gemini is responding, Proactive Audio makes it easier for Gemini to handle the interruption appropriately, rather than reacting to back-channeling or filler words such as umm or uhh.
- Co-listens to audio: Gemini can co-listen to an audio file that's not the speaker's voice and subsequently answer questions about that audio file later in the conversation.
Billing
While Gemini is listening to a conversation, you're charged for input audio tokens.
For output audio tokens, you're only charged when Gemini responds. If Gemini doesn't respond or stays silent, you aren't charged for output audio tokens.
For more information, see Vertex AI pricing.
What's next
For more information on using the Live API, see: