When building real-time voice agents, a function call that blocks the model's execution causes the audio stream to go quiet and leaves the user waiting in silence. With Gemini Live API, all function calls are non-blocking by default, so functions execute in parallel with the main conversation flow. This process is called asynchronous function calling. Your backend can process heavy tasks, like searching live flight prices or querying complex external APIs, in the background while the model continues to listen, speak, and converse naturally with the user, enabling more fluid, real-time interactions.
Asynchronous function calling lets you complete tasks like booking appointments, setting reminders, or fetching data without pausing the conversation. For example, a user can request to book a flight and immediately ask for weather information while the booking is processed in the background.
Example of asynchronous function calling
This example demonstrates a user booking a flight and asking for the time in
New York while the book_ticket function runs asynchronously in the background:
User: Please book the 2:00 PM flight to New York for me.
Model: function_call: {name: "book_ticket"}
//(The "book_ticket" function call is sent to the client.)
//(Right after the "book_ticket" function call is received, the client sends a text message to the model: "repeat this sentence 'I'm booking your ticket now, please wait.'")
//(The client runs the function call asynchronously in the background.)
Model: I'm booking your ticket now, please wait.
User: What is the current time in New York?
Model: The current time in New York is 12:00pm.
//(Once the book_ticket function finishes, the client sends the result.)
Function_response: {name: "book_ticket", response: {booking_status: "booked"}}
Model: Your flight has been booked. Expect a confirmation text on your phone within 5 minutes.
Implement asynchronous function calling
This section provides a series of examples using the Python version of the Agent Platform SDK to build a highly responsive, concurrent architecture that uses Gemini Live API's asynchronous function calling capability. The examples are broken down into the following tasks:
Define your tools
Asynchronous function calling is enabled at the model level, so you specify the tools you want to use in the request configuration, just as you would for any standard Gemini API call in Gemini Enterprise Agent Platform. This lets the model continue conversing while your tool executes:
from google import genai
from google.genai import types

# 1. A tool that takes a long time to execute
search_live_flights = {
    "name": "search_live_flights",
    "description": "Searches airlines for current flight prices. Can take up to 10 seconds."
}

# 2. A tool that executes instantly
get_current_weather = {
    "name": "get_current_weather",
    "description": "Gets the current weather for a given city."
}

tools = [{"function_declarations": [search_live_flights, get_current_weather]}]
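To make these declarations available to the model, pass them in the connection configuration when you open the live session. The following is a minimal sketch using the google.genai client imported above; the model name and response modality are placeholder assumptions, and handle_stream is defined in the next example:

import asyncio

client = genai.Client()

config = {
    "response_modalities": ["AUDIO"],  # Assumed modality for a voice agent
    "tools": tools,  # The function declarations defined above
}

async def main():
    # Open a live session with the tools attached. Replace the model name
    # with the Live API model you are using; this one is a placeholder.
    async with client.aio.live.connect(model="your-live-model", config=config) as session:
        await handle_stream(session)

asyncio.run(main())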
Handle function calls from the message stream
When the model determines that one or more functions should be called,
Gemini Live API sends a tool_call event through the real-time message stream.
Your backend must not block the stream, because the model expects to keep
running. When you receive a call for a slow function (like
search_live_flights), pass it to a background worker. If you await a
10-second task directly in your main message loop, you freeze the
connection. Fast tasks (like get_current_weather) can be safely awaited.
import asyncio

async def handle_stream(session):
    async for response in session.receive():
        # Check if the model is asking to use a tool
        if response.tool_call is not None:
            for fc in response.tool_call.function_calls:
                if fc.name == "search_live_flights":
                    # Pass to a background task so we don't block the receive loop!
                    asyncio.create_task(background_flight_search(fc.id, fc.args, session))
                elif fc.name == "get_current_weather":
                    # Instant lookups can be safely awaited directly
                    await instant_weather_lookup(fc.id, fc.args, session)
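The instant_weather_lookup helper isn't defined in this guide. The following is a minimal sketch of what it might look like; the hard-coded weather data is a stand-in for your own lookup, and it uses the same types.FunctionResponse and session.send_tool_response pattern shown later in this section:

async def instant_weather_lookup(call_id, args, session):
    # Stand-in for a fast, local lookup; replace with your own weather source.
    weather = {"city": args.get("city", "unknown"), "conditions": "Sunny, 21°C"}

    function_response = types.FunctionResponse(
        id=call_id,
        name="get_current_weather",
        response={"status": "success", "weather": weather},
    )

    # The lookup is fast, so the result goes back into the session immediately.
    await session.send_tool_response(function_responses=[function_response])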
Manage user expectations
To manage expectations during long-running asynchronous function calls, have the client send a text message to the model as soon as it receives the function call. This message should prompt the model to tell the user that the request is being processed and to ask for their patience. For example, after a function call is received by the client, the client can send a text message to the model such as: "repeat this sentence: 'I'm booking your ticket now, please wait.'".
The following example dialog shows this exchange:
User: Please book the 2:00 PM flight to New York for me.
Model: function_call: {name: "book_ticket"}
//(The "book_ticket" function call is sent to the client.)
//(Right after the "book_ticket" function call is received, the client sends a text message to the model: "repeat this sentence 'I'm booking your ticket now, please wait.'")
//(The client runs the function call asynchronously in the background.)
Model: I'm booking your ticket now, please wait.
User: What is the current time in New York?
Model: The current time in New York is 12:00pm.
//(Once the "book_ticket" function call finishes, the client sends in the response.)
Function_response: {name: "book_ticket", response: {booking_status: "booked"}}
Model: Your flight has been booked. Expect a confirmation text on your phone within 5 minutes.
This proactive messaging strategy has the following benefits:
- Informs the user of current system operations, which manages expectations during long-running function calls.
- Reduces the frequency of redundant short user prompts, such as "hello?" or "are you there?". These often occur during long periods of system silence while asynchronous function calls are being processed. This can minimize the risk of duplicated function calls triggered by these repeated user inquiries.
- Gives the model an extra prompt in context, which lowers the probability of duplicate function calls in subsequent interactions.
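The following is a minimal sketch of how a client could send this holding message into the live session, assuming the send_client_content method and the types.Content and types.Part classes from the google.genai SDK imported earlier:

async def send_holding_message(session):
    # Prompt the model to speak a short holding phrase while the
    # asynchronous function call runs in the background.
    await session.send_client_content(
        turns=types.Content(
            role="user",
            parts=[types.Part(text="repeat this sentence: 'I'm booking your ticket now, please wait.'")],
        ),
        turn_complete=True,
    )

You would call this right after dispatching the slow function call to a background task in your receive loop.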
Handle duplicate function calls
There is a small chance that the model generates duplicated function calls before receiving a response for the first call. If your use case allows it, your application can ignore duplicate function calls if a response for the same function call is still pending.
The following example shows how a client can ignore a duplicate function call:
User: Please book the 2:00 PM flight to New York for me.
Model: function_call: {name: "book_ticket"}
//(The "book_ticket" function call is sent to the client. It is running asynchronously in the background.)
User: What is the current time in New York?
Model: The current time in New York is 12:00pm. + function_call: {name: "book_ticket"}
//(The duplicated "book_ticket" can be ignored by the client since the response for the first "book_ticket" has not been sent to the model yet.)
//(The first "book_ticket" function call finishes, and client sends in the response.)
Function_response: {name: "book_ticket", response: {booking_status: "booked"}}
Model: Your flight has been booked. Expect a confirmation text on your phone within 5 minutes.
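One way to implement this on the client is to track which function calls are still pending and skip any new call for the same function. The following is a minimal sketch of that pattern; it keys deduplication on the function name, which is only safe if your use case never issues two legitimate concurrent calls to the same function:

pending_calls = set()

def dispatch_book_ticket(fc, session):
    # Ignore the duplicate if a response for the first call is still pending.
    if fc.name in pending_calls:
        return

    pending_calls.add(fc.name)

    async def run_and_clear():
        try:
            # book_ticket_in_background is a hypothetical slow booking task,
            # analogous to background_flight_search shown later in this section.
            await book_ticket_in_background(fc.id, fc.args, session)
        finally:
            pending_calls.discard(fc.name)

    asyncio.create_task(run_and_clear())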
Handle asynchronous function responses
When an asynchronous function call is complete, your application sends the
result to the model in a function_response. While your backend is processing
a function call, like searching for flights, the user might ask the model a
completely different question, such as "What's the weather like in London?".
The model responds to that request in real time, in parallel with the
function call execution. Because the user might be in an ongoing interaction
with the model when function execution completes, you can specify a
scheduling policy that defines how the model handles the incoming function
response.
To specify a policy, include the scheduling field in your
function_response payload:
{
  "name": "book_ticket",
  "scheduling": "WHEN_IDLE",
  "response": {
    "booking_status": "booked"
  }
}
If you omit the scheduling field, Gemini Live API falls back to its original
behavior for handling function responses, which preserves backward compatibility.
The following Python example shows how to format and send a function_response
with scheduling="WHEN_IDLE" to wait for a natural pause in conversation
before announcing results:
async def background_flight_search(call_id, args, session):
    # 1. Simulate a slow API call taking 5 seconds
    await asyncio.sleep(5)
    flight_data = ["Air Canada AC758: $350", "WestJet WS12: $290"]

    # 2. Format the response
    function_response = types.FunctionResponse(
        id=call_id,
        name="search_live_flights",
        response={"status": "success", "flights": flight_data},
        scheduling="WHEN_IDLE"  # Wait for a moment to tell the user
    )

    # 3. Send it back into the live session
    await session.send_tool_response(function_responses=[function_response])
The following policies can be specified in the scheduling field for managing
function responses:
SILENT response policy
With the SILENT policy, the function response is added to the model's context,
but the model doesn't generate a response for it, and any ongoing user
interaction is not interrupted.
User: Please book the 2:00 PM flight to New York for me.
Model: function_call: {name: "book_ticket"}
//(The book_ticket function call is sent to the client and starts running asynchronously in the background.)
User: What is the current time in New York?
Model: The current time in New York is 12:00pm.
//(The book_ticket function finishes, and client sends the result with scheduling: "SILENT".)
Function_response: {name: "book_ticket", scheduling: "SILENT", response: {booking_status: "booked"}}
//(The model doesn't generate a response for the function response.)
User: Is my flight ticket booked?
Model: Yes. Your flight has been booked.
WHEN_IDLE response policy
With the WHEN_IDLE policy, the model generates a response to the function
response only when there is no active user interaction. If a user interaction
is in progress, the model waits for it to complete before generating a response
to avoid interruption.
User: Please book the 2:00 PM flight to New York for me.
Model: function_call: {name: "book_ticket"}
//(The book_ticket function call is sent to the client and starts running asynchronously in the background.)
User: What is the current time in New York?
//(The book_ticket function finishes, and client sends the result with scheduling: "WHEN_IDLE".)
Function_response: {name: "book_ticket", scheduling: "WHEN_IDLE", response: {booking_status: "booked"}}
//(The ongoing interaction about the time is not interrupted.)
Model: The current time in New York is 12:00pm.
//(After responding to the user's time query, the model issues the response for the book_ticket function.)
Model: Your flight has been booked. Expect a confirmation text on your phone within 5 minutes.
INTERRUPT response policy
With the INTERRUPT policy, the model generates a response to the function
response immediately, interrupting any ongoing user interaction.
User: Please book the 2:00 PM flight to New York for me.
Model: function_call: {name: "book_ticket"}
//(The book_ticket function call is sent to the client and starts running asynchronously in the background.)
User: What is the current time in New York?
//(The book_ticket function finishes, and client sends the result with scheduling: "INTERRUPT".)
Function_response: {name: "book_ticket", scheduling: "INTERRUPT", response: {booking_status: "booked"}}
//(The ongoing interaction about the time is interrupted, and model skips responding to it.)
Model: Your flight has been booked. Expect a confirmation text on your phone within 5 minutes.
Best practices
- Design for concurrency: Always offload slow tools (like querying external APIs or running RAG pipelines) to background tasks in your backend. Let the model keep handling the active audio stream.
- Avoid INTERRUPT unless necessary: Use INTERRUPT for critical alerts. For background tasks, SILENT or WHEN_IDLE provides a much smoother, polite user experience.
- Independent chat turns: In Gemini Live API, tool execution is completely independent of chat turns. The conversation can branch, continue, and flow naturally while your tool processes in the background.
- The "Silent" caveat: The model may still occasionally try to verbally narrate a tool's execution even if scheduled as SILENT. To enforce true silence, add explicit guardrails to your System Instructions (for example, "When using [Tool Name], perform a SILENT EXECUTION and say nothing"), or use a "fire-and-forget" backend pattern where you don't send a FunctionResponse back to the model at all. See the sketch after this list.
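As a minimal sketch of the System Instructions guardrail, the following connection configuration adds such an instruction; the instruction wording is an illustrative assumption, not text from this guide:

config = {
    "response_modalities": ["AUDIO"],
    "tools": tools,
    # Illustrative guardrail wording; tune it for your own tools.
    "system_instruction": (
        "When using search_live_flights, perform a SILENT EXECUTION and say "
        "nothing until the function response arrives."
    ),
}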