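When you build real-time voice agents, some function calls can block model execution, which silences the audio stream and leaves the user waiting with no feedback. With the Gemini Live API, all function calls are non-blocking by default, so you can run functions in parallel with the main conversation flow. This process is called asynchronous function calling. Your backend can handle resource-intensive work in the background, such as searching live flight prices or querying complex external APIs, while the model continues to listen, speak, and converse naturally with the user. Because the Gemini Live API handles function calls in the background without interrupting the exchange between the user and the model, the interaction feels smoother and more immediate.
Asynchronous function calling lets you complete tasks such as making bookings, setting reminders, or retrieving data without pausing the conversation. For example, a user can ask to book a flight and then immediately ask about the weather while the booking is processed in the background.
Asynchronous function calling example
In this example, the user books a flight and then asks for the current time in New York while the book_ticket function runs asynchronously in the background: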
User: Please book the 2:00 PM flight to New York for me.
Model: function_call: {name: "book_ticket"}
//(The "book_ticket" function call is sent to the client.)
//(Right after the "book_ticket" function call is received, the client sends a text message to the model: "repeat this sentence 'I'm booking your ticket now, please wait.'")
//(The client runs the function call asynchronously in the background.)
Model: I'm booking your ticket now, please wait.
User: What is the current time in New York?
Model: The current time in New York is 12:00pm.
//(Once the book_ticket function finishes, the client sends the result.)
Function_response: {name: "book_ticket", response: {booking_status: "booked"}}
Model: Your flight has been booked. Expect a confirmation text on your phone within 5 minutes.
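Implement asynchronous function calling
This section provides a series of examples that show how to use the Python version of the Agent Platform SDK to build a highly responsive, concurrent architecture that uses asynchronous function calling in the Gemini Live API. The examples are organized into the tasks that follow.
Define tools
Asynchronous function calling is enabled at the model level, so you specify the tools to use in the request configuration, just as you would for any standard Gemini API call in Gemini Enterprise Agent Platform. This lets the model keep conversing while the tools run: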
from google import genai
from google.genai import types

# 1. A tool that takes a long time to execute
search_live_flights = {
    "name": "search_live_flights",
    "description": "Searches airlines for current flight prices. Can take up to 10 seconds."
}

# 2. A tool that executes instantly
get_current_weather = {
    "name": "get_current_weather",
    "description": "Gets the current weather for a given city."
}

tools = [{"function_declarations": [search_live_flights, get_current_weather]}]
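The declarations above carry no parameters. As a minimal sketch, and assuming the JSON-schema-style `parameters` field commonly used for function declarations, the weather tool could declare a `city` argument, and the tool list could then be placed in a session configuration dictionary. The `city` parameter, the `response_modalities` value, and the overall config shape are illustrative assumptions, not part of the example above.

```python
# Hypothetical extension of the weather declaration: the `city` parameter is assumed.
get_current_weather = {
    "name": "get_current_weather",
    "description": "Gets the current weather for a given city.",
    "parameters": {
        "type": "object",
        "properties": {
            "city": {"type": "string", "description": "City name, for example London."},
        },
        "required": ["city"],
    },
}

search_live_flights = {
    "name": "search_live_flights",
    "description": "Searches airlines for current flight prices. Can take up to 10 seconds.",
}

tools = [{"function_declarations": [search_live_flights, get_current_weather]}]

# Assumed config shape: the tool list travels with the rest of the session setup.
config = {"response_modalities": ["AUDIO"], "tools": tools}
```

Declaring the argument explicitly gives the model a schema to fill in, so the client does not have to parse the city out of free text.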
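Handle function calls in the message stream
When the model should call one or more functions, the Gemini Live API sends a tool_call event through the live message stream.
Your backend must not block the stream, because the model is expected to keep running. When you receive a call to a slow function, such as search_live_flights, you must hand it off to a background worker. If you await a 10-second task directly in the main message loop, the connection freezes. Fast operations, such as get_current_weather, can be awaited safely.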
import asyncio

async def handle_stream(session):
    async for response in session.receive():
        # Check if the model is asking to use a tool
        if response.tool_call is not None:
            for fc in response.tool_call.function_calls:
                if fc.name == "search_live_flights":
                    # Pass to a background task so we don't block the receive loop!
                    asyncio.create_task(background_flight_search(fc.id, fc.args, session))
                elif fc.name == "get_current_weather":
                    # Instant lookups can be safely awaited directly
                    await instant_weather_lookup(fc.id, fc.args, session)
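To make the difference between awaiting inline and offloading concrete, the following self-contained sketch simulates a receive loop with plain asyncio and no live session. The `slow_tool` and `receive_loop` names and the three-message stream are hypothetical stand-ins; the only point is to measure how long the next message waits in each mode.

```python
import asyncio
import time


async def slow_tool():
    # Stand-in for a slow backend call such as a flight search.
    await asyncio.sleep(0.2)
    return "flights ready"


async def receive_loop(offload):
    """Process three incoming messages; the first one triggers the slow tool."""
    events = []
    tasks = []
    start = time.monotonic()
    for message in ["tool_call", "audio_1", "audio_2"]:
        if message == "tool_call":
            if offload:
                # Non-blocking: schedule the tool and keep reading the stream.
                tasks.append(asyncio.create_task(slow_tool()))
            else:
                # Blocking: nothing else is processed until the tool returns.
                await slow_tool()
        else:
            events.append((message, time.monotonic() - start))
        await asyncio.sleep(0)  # yield to the event loop, as a real stream read would
    await asyncio.gather(*tasks)  # let background work finish before returning
    return events


blocked = asyncio.run(receive_loop(offload=False))
offloaded = asyncio.run(receive_loop(offload=True))
print(f"audio_1 handled after {blocked[0][1]:.2f}s when awaiting inline")
print(f"audio_1 handled after {offloaded[0][1]:.2f}s when offloaded")
```

The sketch also keeps a reference to each created task, which prevents a background task from being garbage-collected before it finishes.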
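Manage user expectations
To manage expectations during long-running asynchronous function calls, we recommend that the client initiate a text message. The message should prompt the model to tell the user that the request is being processed and ask them to wait. For example, after the client receives the function call, it can send the model a text message such as: "repeat this sentence: 'I'm booking your ticket now, please wait.'"
The following example dialogue shows this exchange: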
User: Please book the 2:00 PM flight to New York for me.
Model: function_call: {name: "book_ticket"}
//(The "book_ticket" function call is sent to the client.)
//(Right after the "book_ticket" function call is received, the client sends a text message to the model: "repeat this sentence 'I'm booking your ticket now, please wait.'")
//(The client runs the function call asynchronously in the background.)
Model: I'm booking your ticket now, please wait.
User: What is the current time in New York?
Model: The current time in New York is 12:00pm.
//(Once the "book_ticket" function call finishes, the client sends in the response.)
Function_response: {name: "book_ticket", response: {booking_status: "booked"}}
Model: Your flight has been booked. Expect a confirmation text on your phone within 5 minutes.
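This proactive messaging strategy has the following benefits:
- It tells the user what the system is currently doing, which manages expectations during long-running function calls.
- It reduces repeated short user prompts, such as "Hello?" or "Are you there?", which typically occur when the system stays silent for a long time while it processes an asynchronous function call. This helps avoid duplicate function calls triggered by repeated user queries.
- It provides an additional system prompt, which lowers the chance that duplicate calls are created in later interactions.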
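The proactive messaging flow can be sketched as follows. This is an illustration under stated assumptions rather than SDK code: `FakeSession`, `send_text`, and `send_function_response` are hypothetical stand-ins that only record what a client would send, and `book_ticket` simulates the slow booking call.

```python
import asyncio

HOLDING_PROMPT = "repeat this sentence: 'I'm booking your ticket now, please wait.'"


class FakeSession:
    """Hypothetical stand-in for a live session; it only records what is sent."""

    def __init__(self):
        self.sent = []

    async def send_text(self, text):
        self.sent.append(("text", text))

    async def send_function_response(self, name, response):
        self.sent.append(("function_response", name, response))


async def book_ticket():
    await asyncio.sleep(0.05)  # stand-in for the slow booking call
    return {"booking_status": "booked"}


async def run_in_background(session, name):
    result = await book_ticket()
    await session.send_function_response(name, result)


async def on_function_call(session, name):
    # 1. Ask the model to reassure the user right away.
    await session.send_text(HOLDING_PROMPT)
    # 2. Start the real work in the background and return immediately.
    return asyncio.create_task(run_in_background(session, name))


async def demo():
    session = FakeSession()
    task = await on_function_call(session, "book_ticket")
    await task
    return session.sent


sent = asyncio.run(demo())
```

The holding prompt is sent before the background task starts, so the user hears the acknowledgement regardless of how long the booking takes.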
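Handle duplicate function calls
The model might call a function again before it receives the response to the first call. If your use case allows it, your application can ignore a duplicate function call as long as the response to the same function call is still pending.
The following example shows how the client can ignore a duplicate function call: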
User: Please book the 2:00 PM flight to New York for me.
Model: function_call: {name: "book_ticket"}
//(The "book_ticket" function call is sent to the client. It is running asynchronously in the background.)
User: What is the current time in New York?
Model: The current time in New York is 12:00pm. + function_call: {name: "book_ticket"}
//(The duplicated "book_ticket" can be ignored by the client since the response for the first "book_ticket" has not been sent to the model yet.)
//(The first "book_ticket" function call finishes, and client sends in the response.)
Function_response: {name: "book_ticket", response: {booking_status: "booked"}}
Model: Your flight has been booked. Expect a confirmation text on your phone within 5 minutes.
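One way a client could implement this is to track which function names still have a response outstanding. The following is a minimal sketch, assuming that deduplicating by function name alone is acceptable for your use case; the `PendingCalls` class and its methods are hypothetical, and appending to `responses` stands in for sending the result to the model.

```python
import asyncio


class PendingCalls:
    """Tracks function names whose responses have not been sent yet."""

    def __init__(self):
        self._pending = set()
        self.responses = []

    async def _run(self, name, func):
        try:
            result = await func()
            self.responses.append((name, result))  # stand-in for sending the response
        finally:
            self._pending.discard(name)  # the name can be dispatched again afterwards

    def dispatch(self, name, func):
        """Start func in the background unless a call with the same name is pending."""
        if name in self._pending:
            return None  # duplicate: ignore it
        self._pending.add(name)
        return asyncio.create_task(self._run(name, func))


async def book_ticket():
    await asyncio.sleep(0.05)  # stand-in for the slow booking call
    return {"booking_status": "booked"}


async def demo():
    calls = PendingCalls()
    first = calls.dispatch("book_ticket", book_ticket)
    duplicate = calls.dispatch("book_ticket", book_ticket)  # arrives while first is pending
    await first
    return duplicate, calls.responses


duplicate, responses = asyncio.run(demo())
```

If the same function can legitimately run with different arguments at the same time, keying the pending set on the name together with the arguments would be a safer choice.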
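Handle asynchronous function responses
After an asynchronous function call completes, your application sends the result to the model in a function_response. While the backend processes the function call (for example, searching for flights), the user might ask the model a completely different question, such as "What's the weather like in London?" The model responds to that request in real time while the function call runs. Because the user might be interacting with the model at the moment the function finishes, you can specify a policy that defines how the model should handle the incoming response. You can specify one of the following policies: SILENT, WHEN_IDLE, or INTERRUPT.
To specify a policy, include the scheduling field in the function_response payload: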
{
"name": "book_ticket",
"scheduling": "WHEN_IDLE",
"response": {
"booking_status": "booked"
}
}
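If you omit the scheduling field, the Gemini Live API handles the function response with the original method to preserve backward compatibility.
The following Python example shows how to format and send a function_response, using scheduling="WHEN_IDLE" to announce the result at a natural pause in the conversation: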
async def background_flight_search(call_id, args, session):
    # 1. Simulate a slow API call taking 5 seconds
    await asyncio.sleep(5)
    flight_data = ["Air Canada AC758: $350", "WestJet WS12: $290"]

    # 2. Format the response
    function_response = types.FunctionResponse(
        id=call_id,
        name="search_live_flights",
        response={"status": "success", "flights": flight_data},
        scheduling="WHEN_IDLE"  # Announce the result at a natural pause
    )

    # 3. Send it back into the live session
    await session.send_tool_response(function_responses=[function_response])
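You can specify the following policies in the scheduling field to manage function responses:
SILENT response policy
With the SILENT policy, the function response is added to the model's context, but the model doesn't generate a reply to it and doesn't interrupt any ongoing user interaction.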
User: Please book the 2:00 PM flight to New York for me.
Model: function_call: {name: "book_ticket"}
//(The book_ticket function call is sent to the client and starts running asynchronously in the background.)
User: What is the current time in New York?
Model: The current time in New York is 12:00pm.
//(The book_ticket function finishes, and client sends the result with scheduling: "SILENT".)
Function_response: {name: "book_ticket", scheduling: "SILENT", response: {booking_status: "booked"}}
//(The model doesn't generate a response for the function response.)
User: Is my flight ticket booked?
Model: Yes. Your flight has been booked.
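WHEN_IDLE response policy
With the WHEN_IDLE policy, the model generates a reply to the function response only when no user interaction is in progress. If the user is interacting, the model waits for the interaction to finish before it generates the reply, so the interaction isn't interrupted.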
User: Please book the 2:00 PM flight to New York for me.
Model: function_call: {name: "book_ticket"}
//(The book_ticket function call is sent to the client and starts running asynchronously in the background.)
User: What is the current time in New York?
//(The book_ticket function finishes, and client sends the result with scheduling: "WHEN_IDLE".)
Function_response: {name: "book_ticket", scheduling: "WHEN_IDLE", response: {booking_status: "booked"}}
//(The ongoing interaction about the time is not interrupted.)
Model: The current time in New York is 12:00pm.
//(After responding to the user's time query, the model issues the response for the book_ticket function.)
Model: Your flight has been booked. Expect a confirmation text on your phone within 5 minutes.
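INTERRUPT response policy
With the INTERRUPT policy, the model generates a reply to the function response immediately and interrupts any ongoing user interaction.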
User: Please book the 2:00 PM flight to New York for me.
Model: function_call: {name: "book_ticket"}
//(The book_ticket function call is sent to the client and starts running asynchronously in the background.)
User: What is the current time in New York?
//(The book_ticket function finishes, and client sends the result with scheduling: "INTERRUPT".)
Function_response: {name: "book_ticket", scheduling: "INTERRUPT", response: {booking_status: "booked"}}
//(The ongoing interaction about the time is interrupted, and model skips responding to it.)
Model: Your flight has been booked. Expect a confirmation text on your phone within 5 minutes.
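The three policies can be applied when the payload is built. The following sketch assembles a payload in the same shape as the JSON example shown earlier and validates the policy name. The `build_function_response` and `choose_scheduling` helpers, and the `urgent` and `announce` flags, are hypothetical; the mapping is one possible rule of thumb, not prescribed behavior.

```python
VALID_SCHEDULING = {"SILENT", "WHEN_IDLE", "INTERRUPT"}


def build_function_response(name, response, scheduling=None):
    """Return a function_response payload; omit scheduling to keep the default handling."""
    payload = {"name": name, "response": response}
    if scheduling is not None:
        if scheduling not in VALID_SCHEDULING:
            raise ValueError(f"unknown scheduling policy: {scheduling!r}")
        payload["scheduling"] = scheduling
    return payload


def choose_scheduling(urgent, announce):
    """Hypothetical rule of thumb that maps two flags to a policy."""
    if urgent:
        return "INTERRUPT"  # critical results cut into the conversation
    return "WHEN_IDLE" if announce else "SILENT"


payload = build_function_response(
    "book_ticket",
    {"booking_status": "booked"},
    choose_scheduling(urgent=False, announce=True),
)
```

Validating the policy name on the client surfaces a typo as an immediate error instead of a request that behaves unexpectedly.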
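Best practices
- Design for concurrency: Always offload slow tools, such as external API queries or RAG pipelines, to background tasks in your backend. Let the model keep handling the active audio stream.
- Avoid INTERRUPT unless necessary: Reserve INTERRUPT for critical alerts. For background work, SILENT or WHEN_IDLE gives a smoother, friendlier user experience.
- Independent conversation turns: In the Gemini Live API, tool execution is completely independent of conversation turns. While a tool is processing in the background, the conversation can branch, continue, and flow naturally.
- Caveat for SILENT: Even when a response is scheduled as SILENT, the model might still occasionally try to narrate the tool's execution aloud. To enforce true silence, add an explicit guardrail to your system instructions (for example, "When using [tool name], execute silently and don't produce any speech"), or use a fire-and-forget backend pattern that never returns a FunctionResponse to the model.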