Configure Gemini capabilities

This document shows you how to configure various capabilities of Gemini models when using the Live API. You can configure thinking, tool use such as function calling and grounding, and native audio capabilities such as affective dialog and proactive audio.

Configure thinking

Gemini models support thinking capabilities, with dynamic thinking enabled by default. The thinking_budget parameter guides the model on the number of thinking tokens it can use; to disable thinking, set thinking_budget to 0. For example, the following configuration limits the model to a budget of 256 thinking tokens:

config = {
    "response_modalities": ["audio"],
    "thinking_config": {
        "thinking_budget": 256,
    },
}
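
As a minimal sketch of wiring this config into a session (reusing the client setup from the later examples in this document; the prompt and TEXT output here are illustrative assumptions so the response is easy to print):

Python

import asyncio
from google import genai

client = genai.Client(
    vertexai=True,
    project=GOOGLE_CLOUD_PROJECT,
    location=GOOGLE_CLOUD_LOCATION,
)
model = "gemini-live-2.5-flash"

# Same thinking budget as above, with text output so the reply is easy to print.
config = {
    "response_modalities": ["TEXT"],
    "thinking_config": {"thinking_budget": 256},
}

async def main():
    async with client.aio.live.connect(model=model, config=config) as session:
        await session.send_client_content(
            turns={"parts": [{"text": "In one sentence, what is binary search?"}]}
        )
        async for chunk in session.receive():
            if chunk.text is not None:
                print(chunk.text)

if __name__ == "__main__":
    asyncio.run(main())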

Configure tool use

Several tools are compatible with various versions of Live API-supported models, including:

  • Function calling
  • Grounding with Google Search
  • Grounding with Vertex AI RAG Engine

To enable a particular tool in returned responses, include the name of the tool in the tools list when you initialize the model. The following sections provide examples of how to use each of the built-in tools in your code.
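Note that a tools list can hold more than one entry. The sketch below combines Google Search grounding with a function declaration; the get_power_status declaration is hypothetical, and whether a given model supports a particular combination of tools is an assumption you should verify:

Python

# Hypothetical function declaration, used only for illustration.
get_power_status = {
    "name": "get_power_status",
    "description": "Reports whether the device is powered on.",
}

# Combining built-in Google Search grounding with a custom function declaration.
# Support for specific tool combinations can vary by model.
tools = [
    {"google_search": {}},
    {"function_declarations": [get_power_status]},
]
config = {"response_modalities": ["TEXT"], "tools": tools}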

Function calling

Use function calling to create a description of a function, then pass that description to the model in a request. The response from the model includes the name of a function that matches the description and the arguments to call it with.

All functions must be declared at the start of the session by sending tool definitions as part of the LiveConnectConfig message.

To enable function calling, include function_declarations in the tools list in the setup message:

Python

import asyncio
from google import genai
from google.genai import types

client = genai.Client(
    vertexai=True,
    project=GOOGLE_CLOUD_PROJECT,
    location=GOOGLE_CLOUD_LOCATION,
)
model = "gemini-live-2.5-flash"

# Simple function definitions
turn_on_the_lights = {"name": "turn_on_the_lights"}
turn_off_the_lights = {"name": "turn_off_the_lights"}

tools = [{"function_declarations": [turn_on_the_lights, turn_off_the_lights]}]
config = {"response_modalities": ["TEXT"], "tools": tools}

async def main():
    async with client.aio.live.connect(model=model, config=config) as session:
        prompt = "Turn on the lights please"
        await session.send_client_content(turns={"parts": [{"text": prompt}]})

        async for chunk in session.receive():
            if chunk.server_content:
                if chunk.text is not None:
                    print(chunk.text)
            elif chunk.tool_call:
                function_responses = []
                for fc in chunk.tool_call.function_calls:
                    function_response = types.FunctionResponse(
                        name=fc.name,
                        response={ "result": "ok" } # simple, hard-coded function response
                    )
                    function_responses.append(function_response)

                await session.send_tool_response(function_responses=function_responses)


if __name__ == "__main__":
    asyncio.run(main())
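
The example above returns a hard-coded result for every call. A common pattern is to dispatch on fc.name to a local implementation and send back its real result; the handler functions in this sketch are hypothetical:

Python

from google.genai import types

# Hypothetical local implementations for the declared functions.
def turn_on_the_lights_impl():
    return "lights on"

def turn_off_the_lights_impl():
    return "lights off"

HANDLERS = {
    "turn_on_the_lights": turn_on_the_lights_impl,
    "turn_off_the_lights": turn_off_the_lights_impl,
}

async def respond_to_tool_call(session, tool_call):
    """Run each requested function locally and send its result back to the model."""
    function_responses = []
    for fc in tool_call.function_calls:
        result = HANDLERS[fc.name]()  # dispatch on the function name
        function_responses.append(
            types.FunctionResponse(name=fc.name, response={"result": result})
        )
    await session.send_tool_response(function_responses=function_responses)

# In the receive loop above, the `elif chunk.tool_call:` branch would then call:
#     await respond_to_tool_call(session, chunk.tool_call)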
  

For examples using function calling in system instructions, see our best practices example.

Grounding with Google Search

You can use Grounding with Google Search with the Live API by including google_search in the tools list in the setup message:

Python

import asyncio
from google import genai
from google.genai import types

client = genai.Client(
    vertexai=True,
    project=GOOGLE_CLOUD_PROJECT,
    location=GOOGLE_CLOUD_LOCATION,
)
model = "gemini-live-2.5-flash"


tools = [{'google_search': {}}]
config = {"response_modalities": ["TEXT"], "tools": tools}

async def main():
    async with client.aio.live.connect(model=model, config=config) as session:
        prompt = "When did the last Brazil vs. Argentina soccer match happen?"
        await session.send_client_content(turns={"parts": [{"text": prompt}]})

        async for chunk in session.receive():
            if chunk.server_content:
                if chunk.text is not None:
                    print(chunk.text)

                # The model might generate and execute Python code to use Search
                model_turn = chunk.server_content.model_turn
                if model_turn:
                    for part in model_turn.parts:
                        if part.executable_code is not None:
                            print(part.executable_code.code)

                        if part.code_execution_result is not None:
                            print(part.code_execution_result.output)

if __name__ == "__main__":
    asyncio.run(main())
  

Grounding with Vertex AI RAG Engine

You can use Vertex AI RAG Engine with the Live API for grounding, storing, and retrieving contexts:

Python

from google import genai
from google.genai import types
from google.genai.types import (Content, LiveConnectConfig, HttpOptions, Modality, Part)
from IPython import display

PROJECT_ID=YOUR_PROJECT_ID
LOCATION=YOUR_LOCATION
TEXT_INPUT=YOUR_TEXT_INPUT
MODEL_NAME="gemini-live-2.5-flash"

client = genai.Client(
   vertexai=True,
   project=PROJECT_ID,
   location=LOCATION,
)

rag_store=types.VertexRagStore(
   rag_resources=[
       types.VertexRagStoreRagResource(
           rag_corpus=  # Use memory corpus if you want to store context.
       )
   ],
   # Set `store_context` to True to allow the Live API to store context in your memory corpus.
   store_context=True
)

async with client.aio.live.connect(
   model=MODEL_NAME,
   config=LiveConnectConfig(response_modalities=[Modality.TEXT],
                            tools=[types.Tool(
                                retrieval=types.Retrieval(
                                    vertex_rag_store=rag_store))]),
) as session:
   text_input=TEXT_INPUT
   print("> ", text_input, "\n")
   await session.send_client_content(
       turns=Content(role="user", parts=[Part(text=text_input)])
   )

   async for message in session.receive():
       if message.text:
           display.display(display.Markdown(message.text))
           continue
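
If you already have a corpus, the value of rag_corpus is its full resource name. The sketch below shows the expected format; the trailing corpus ID is a placeholder, not a real resource:

Python

from google.genai import types

# Placeholder example: reference an existing RAG corpus by its full resource name.
# The corpus ID at the end is hypothetical.
EXISTING_CORPUS = f"projects/{PROJECT_ID}/locations/{LOCATION}/ragCorpora/1234567890"

rag_store = types.VertexRagStore(
    rag_resources=[
        types.VertexRagStoreRagResource(rag_corpus=EXISTING_CORPUS)
    ],
    store_context=True,  # let the Live API store session context in the corpus
)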

For more information, see Use Vertex AI RAG Engine in Gemini Live API.

Configure native audio capabilities

Models that have native audio capabilities support the following features:

  • Affective Dialog
  • Proactive Audio

Configure Affective Dialog

When Affective Dialog is enabled, the model attempts to understand and respond based on the tone of voice and emotional expressions of the user.

To enable Affective Dialog, set enable_affective_dialog to true in the setup message:

Python

config = LiveConnectConfig(
    response_modalities=["AUDIO"],
    enable_affective_dialog=True,
)
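
A sketch of passing this config to a Live API session follows; the native audio model name below is an assumption, so substitute the native audio model available in your project:

Python

import asyncio
from google import genai
from google.genai.types import LiveConnectConfig

client = genai.Client(
    vertexai=True,
    project=GOOGLE_CLOUD_PROJECT,
    location=GOOGLE_CLOUD_LOCATION,
)

# Assumed native audio model name; replace with a model available to you.
model = "gemini-live-2.5-flash-preview-native-audio"

config = LiveConnectConfig(
    response_modalities=["AUDIO"],
    enable_affective_dialog=True,
)

async def main():
    async with client.aio.live.connect(model=model, config=config) as session:
        # Stream microphone audio to the session (for example, with
        # session.send_realtime_input) and play back the audio chunks
        # received from session.receive().
        ...

if __name__ == "__main__":
    asyncio.run(main())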
  

Configure Proactive Audio

Proactive Audio lets you control when the model responds. For example, you can ask Gemini to only respond when prompted or when specific topics are discussed. To see a video demonstration of Proactive Audio, see Gemini Live API Native Audio Preview.

To enable Proactive Audio, configure the proactivity field in the setup message and set proactive_audio to true:

Python

config = LiveConnectConfig(
    response_modalities=["AUDIO"],
    proactivity=ProactivityConfig(proactive_audio=True),
)
  

Example conversation

The following is a sample of what a conversation with Gemini about cooking might look like:

Prompt: "You are an AI assistant in Italian cooking; only chime in when the topic is about Italian cooking."

Speaker A: "I really love cooking!" (No response from Gemini.)

Speaker B: "Oh yes, me too! My favorite is French cuisine." (No response from Gemini.)

Speaker A: "I really like Italian food; do you know how to make a pizza?"

(The Italian cooking topic triggers a response from Gemini.)
Live API: "I'd be happy to help! Here's a recipe for a pizza."
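
A configuration for a conversation like this one might be sketched as follows. Passing the prompt through system_instruction as a plain string is an assumption for illustration, not something the example above prescribes:

Python

from google.genai.types import LiveConnectConfig, ProactivityConfig

config = LiveConnectConfig(
    response_modalities=["AUDIO"],
    proactivity=ProactivityConfig(proactive_audio=True),
    # The instruction from the example conversation above.
    system_instruction=(
        "You are an AI assistant in Italian cooking; only chime in when the "
        "topic is about Italian cooking."
    ),
)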

Common use cases

When using Proactive Audio, Gemini performs as follows:

  • Responds with minimal latency: Gemini responds after the user is done speaking, which reduces interruptions and helps Gemini keep context if an interruption does happen.
  • Avoids interruptions: Proactive Audio helps Gemini avoid interruptions from background noise or external chatter, and prevents Gemini from responding if external chatter is introduced during a conversation.
  • Handles interruptions: If the user needs to interrupt Gemini mid-response, Proactive Audio makes it easier for Gemini to handle genuine interruptions appropriately while ignoring back-channel filler words such as umm or uhh.
  • Co-listens to audio: Gemini can co-listen to an audio file that's not the speaker's voice and subsequently answer questions about that audio file later in the conversation.

Billing

While Gemini is listening to a conversation, you're charged for input audio tokens.

For output audio tokens, you're charged only when Gemini responds. If Gemini doesn't respond or stays silent, you aren't charged for output audio tokens.

For more information, see Vertex AI pricing.

What's next

For more information on using the Live API, see: