Asynchronous function calling with the Gemini Live API

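When you build real-time voice agents, some function calls can block model execution, which silences the audio stream and leaves the user waiting in silence. With the Gemini Live API, all function calls are non-blocking by default, so you can run functions in parallel with the main conversation flow. This process is called asynchronous function calling. Your backend can handle resource-intensive work in the background, such as searching live flight prices or querying complex external APIs, while the model keeps listening, speaking, and conversing naturally with the user. Because function calls are handled in the background without interrupting the exchange between the user and the model, interactions feel smoother and more immediate.

Asynchronous function calling lets you complete tasks such as making a booking, setting a reminder, or retrieving data without pausing the conversation. For example, a user can ask to book a flight and then immediately ask about the weather while the booking is processed in the background.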

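Asynchronous function calling example

This example shows a user booking a flight and then asking for the time in New York while the book_ticket function runs asynchronously in the background: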

User: Please book the 2:00 PM flight to New York for me.

Model: function_call: {name: "book_ticket"}
//(The "book_ticket" function call is sent to the client.)
//(Right after the "book_ticket" function call is received, the client sends a text message to the model: "repeat this sentence 'I'm booking your ticket now, please wait.'")
//(The client runs the function call asynchronously in the background.)
Model: I'm booking your ticket now, please wait.

User: What is the current time in New York?

Model: The current time in New York is 12:00pm.

//(Once the book_ticket function finishes, the client sends the result.)
Function_response: {name: "book_ticket", response: {booking_status: "booked"}}

Model: Your flight has been booked. Expect a confirmation text on your phone within 5 minutes.

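Implement asynchronous function calling

This section provides a series of examples that show how to use the Python version of the Agent Platform SDK to build a responsive, concurrent architecture with asynchronous function calling in the Gemini Live API. The examples are organized by task in the sections that follow.

Define tools

Asynchronous function calling is enabled at the model level, so you specify the tools to use in the request configuration, just as you would for any standard Gemini API call in Gemini Enterprise Agent Platform. This lets the model continue the conversation while a tool runs: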

from google import genai
from google.genai import types

# 1. A tool that takes a long time to execute
search_live_flights = {
    "name": "search_live_flights",
    "description": "Searches airlines for current flight prices. Can take up to 10 seconds."
}

# 2. A tool that executes instantly
get_current_weather = {
    "name": "get_current_weather",
    "description": "Gets the current weather for a given city."
}

tools = [{"function_declarations": [search_live_flights, get_current_weather]}]
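As a rough illustration of "specifying the tools in the request configuration", the sketch below builds a dict-style session configuration that carries the tool declarations. The `response_modalities` value and the idea of passing this dict as the configuration when opening the live session are assumptions made for illustration; check the SDK reference for the exact configuration fields your version expects.

```python
# Tool declarations, repeated from the example above so this block stands alone.
search_live_flights = {
    "name": "search_live_flights",
    "description": "Searches airlines for current flight prices. Can take up to 10 seconds.",
}
get_current_weather = {
    "name": "get_current_weather",
    "description": "Gets the current weather for a given city.",
}
tools = [{"function_declarations": [search_live_flights, get_current_weather]}]

# Assumption: the live session accepts a dict-style configuration, and audio is
# the desired response modality. No extra flag is set for asynchronous calls,
# because function calls are described as non-blocking by default.
config = {
    "response_modalities": ["AUDIO"],
    "tools": tools,
}
```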

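Handle function calls in the message stream

When the model should call one or more functions, the Gemini Live API sends a tool_call event through the real-time message stream.

Your backend must not block the stream, because the model expects to keep running. When you receive a call to a slow function such as search_live_flights, hand it off to a background worker. If you await a 10-second task directly in the main message loop, the connection freezes. Fast work, such as get_current_weather, can safely be awaited directly.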

import asyncio

# Keep strong references to background tasks so they aren't garbage-collected
# before they finish.
background_tasks = set()

async def handle_stream(session):
    async for response in session.receive():
        # Check if the model is asking to use a tool
        if response.tool_call is not None:
            for fc in response.tool_call.function_calls:
                if fc.name == "search_live_flights":
                    # Pass to a background task so we don't block the receive loop!
                    task = asyncio.create_task(
                        background_flight_search(fc.id, fc.args, session)
                    )
                    background_tasks.add(task)
                    task.add_done_callback(background_tasks.discard)
                elif fc.name == "get_current_weather":
                    # Instant lookups can be safely awaited directly
                    await instant_weather_lookup(fc.id, fc.args, session)
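To see why awaiting a slow tool inside the receive loop matters, the following self-contained sketch compares the two approaches using only asyncio. It does not use the SDK: `slow_tool`, `receive_loop`, and the 0.2-second delay are hypothetical stand-ins for a slow lookup and the message loop.

```python
import asyncio
import time

async def slow_tool():
    # Hypothetical stand-in for a slow lookup such as search_live_flights.
    await asyncio.sleep(0.2)
    return "flights"

async def receive_loop(events, offload):
    """Handle events in order and return how long the loop itself was busy."""
    background = set()
    start = time.monotonic()
    for event in events:
        if event == "slow":
            if offload:
                # Schedule the work and move on to the next event immediately.
                background.add(asyncio.create_task(slow_tool()))
            else:
                # Awaiting here stalls every event queued behind this one.
                await slow_tool()
    loop_time = time.monotonic() - start
    # Let background work finish so the demo exits cleanly.
    await asyncio.gather(*background)
    return loop_time

blocked = asyncio.run(receive_loop(["slow", "fast"], offload=False))
offloaded = asyncio.run(receive_loop(["slow", "fast"], offload=True))
print(f"awaited inline: {blocked:.3f}s, offloaded: {offloaded:.3f}s")
```

In the offloaded run the loop finishes almost immediately, which is the property that keeps the model free to continue the conversation.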

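Manage user expectations

To manage expectations during long-running asynchronous function calls, we recommend that the client initiate a text message. The message should prompt the model to tell the user that the request is being processed and to ask them to wait. For example, after the client receives the function call, it can send the model a text message such as: "repeat this sentence: 'I'm booking your ticket now, please wait.'"

The following sample dialog shows this exchange: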

User: Please book the 2:00 PM flight to New York for me.
Model: function_call: {name: "book_ticket"}
//(The "book_ticket" function call is sent to the client.)
//(Right after the "book_ticket" function call is received, the client sends a text message to the model: "repeat this sentence 'I'm booking your ticket now, please wait.'")
//(The client runs the function call asynchronously in the background.)
Model: I'm booking your ticket now, please wait.
User: What is the current time in New York?
Model: The current time in New York is 12:00pm.
//(Once the "book_ticket" function call finishes, the client sends in the response.)
Function_response: {name: "book_ticket", response: {booking_status: "booked"}}
Model: Your flight has been booked. Expect a confirmation text on your phone within 5 minutes.

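This proactive messaging strategy has the following advantages:

  • Tells the user what the system is currently doing, which manages expectations during long-running function calls.
  • Reduces repeated short user prompts such as "Hello?" or "Are you there?", which typically occur when the system stays silent for a long time while it processes an asynchronous function call. This helps avoid duplicate function calls triggered by repeated user queries.
  • Provides an additional system prompt that lowers the chance of duplicate function calls in later interactions.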
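The client-initiated message described above can be sketched as follows. This is a minimal illustration, not SDK code: `send_text` and `send_result` are hypothetical callables that stand in for whatever your client uses to send a text message and a function response to the model, and `build_wait_prompt` is a helper invented for this example.

```python
import asyncio

def build_wait_prompt(sentence):
    # Wrap the sentence in the "repeat this sentence" instruction.
    return f"repeat this sentence: '{sentence}'"

async def on_function_call(name, run_function, send_text, send_result):
    # 1. Right after the function call arrives, ask the model to reassure the user.
    await send_text(build_wait_prompt("I'm booking your ticket now, please wait."))

    # 2. Run the slow function in the background and report back when it finishes.
    async def worker():
        result = await run_function()
        await send_result(name, result)

    return asyncio.create_task(worker())

async def demo():
    log = []

    async def send_text(text):  # hypothetical transport, records the message
        log.append(("text", text))

    async def send_result(name, result):  # hypothetical transport
        log.append(("result", name, result))

    async def book_ticket():  # simulated slow booking
        await asyncio.sleep(0.05)
        return {"booking_status": "booked"}

    task = await on_function_call("book_ticket", book_ticket, send_text, send_result)
    await task
    return log

log = asyncio.run(demo())
```

The wait prompt is sent before the background work starts, so the user hears the acknowledgement even if the booking takes a long time.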

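Handle duplicate function calls

The model might call a function again before it receives the response to the first call. If your use case allows it, your application can ignore a duplicate function call as long as the response to the same function call is still pending.

The following example shows how the client can ignore a duplicate function call: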

User: Please book the 2:00 PM flight to New York for me.

Model: function_call: {name: "book_ticket"}
//(The "book_ticket" function call is sent to the client. It is running asynchronously in the background.)

User: What is the current time in New York?
Model: The current time in New York is 12:00pm. + function_call: {name: "book_ticket"}
//(The duplicated "book_ticket" can be ignored by the client since the response for the first "book_ticket" has not been sent to the model yet.)

//(The first "book_ticket" function call finishes, and client sends in the response.)
Function_response: {name: "book_ticket", response: {booking_status: "booked"}}

Model: Your flight has been booked. Expect a confirmation text on your phone within 5 minutes.
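One possible way to implement this on the client is to track which calls are still pending and skip any call that matches one of them. The sketch below is an assumption about how you might do this: the `PendingCalls` class is hypothetical, and treating two calls as duplicates when their name and arguments match is a design choice that may not fit every use case.

```python
import json

class PendingCalls:
    """Tracks function calls whose responses have not been sent to the model yet."""

    def __init__(self):
        self._pending = set()

    @staticmethod
    def _key(name, args):
        # Same name and same arguments count as the same call (an assumption).
        return (name, json.dumps(args or {}, sort_keys=True))

    def should_run(self, name, args=None):
        key = self._key(name, args)
        if key in self._pending:
            return False  # duplicate of a call that is still in flight
        self._pending.add(key)
        return True

    def mark_done(self, name, args=None):
        # Call this after the function response has been sent to the model.
        self._pending.discard(self._key(name, args))

calls = PendingCalls()
first = calls.should_run("book_ticket", {"to": "New York", "time": "2:00 PM"})
duplicate = calls.should_run("book_ticket", {"time": "2:00 PM", "to": "New York"})
calls.mark_done("book_ticket", {"to": "New York", "time": "2:00 PM"})
after_done = calls.should_run("book_ticket", {"to": "New York", "time": "2:00 PM"})
```

After the response is sent and the call is marked done, a later call with the same arguments is treated as a new request.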

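Handle asynchronous function responses

When an asynchronous function call finishes, your application sends the result to the model in a function_response. While the backend processes the function call (for example, searching for flights), the user might ask the model something entirely different, such as "What's the weather in London?" The model responds to that request in real time while the function call runs. Because the user might be interacting with the model at the moment the function finishes, you can specify a policy that defines how the model handles the incoming response: SILENT, WHEN_IDLE, or INTERRUPT.

To specify a policy, add a scheduling field to the function_response payload: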

{
  "name": "book_ticket",
  "scheduling": "WHEN_IDLE",
  "response": {
    "booking_status": "booked"
  }
}

If you omit the scheduling field, the Gemini Live API handles the function response with its original behavior to preserve backward compatibility.
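A small helper can assemble a payload of the shape shown above and reject unknown policy names before anything is sent. This is a sketch only: `build_function_response` and `VALID_SCHEDULING` are hypothetical names, and the helper produces a plain dict rather than an SDK object.

```python
# The three policy names described in this document.
VALID_SCHEDULING = {"SILENT", "WHEN_IDLE", "INTERRUPT"}

def build_function_response(name, response, scheduling=None):
    """Return a function_response payload, adding scheduling only when given."""
    payload = {"name": name, "response": response}
    if scheduling is not None:
        if scheduling not in VALID_SCHEDULING:
            raise ValueError(f"Unknown scheduling policy: {scheduling!r}")
        payload["scheduling"] = scheduling
    return payload

with_policy = build_function_response(
    "book_ticket", {"booking_status": "booked"}, scheduling="WHEN_IDLE"
)
without_policy = build_function_response("book_ticket", {"booking_status": "booked"})
```

Leaving `scheduling` out keeps the field absent from the payload, which corresponds to the backward-compatible handling.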

The following Python example shows how to format and send a function_response, using scheduling="WHEN_IDLE" to announce the result at a natural pause in the conversation:

async def background_flight_search(call_id, args, session):
    # 1. Simulate a slow API call taking 5 seconds
    await asyncio.sleep(5)
    flight_data = ["Air Canada AC758: $350", "WestJet WS12: $290"]

    # 2. Format the response
    function_response = types.FunctionResponse(
        id=call_id,
        name="search_live_flights",
        response={ "status": "success", "flights": flight_data },
        scheduling="WHEN_IDLE" # Wait for a moment to tell the user
    )

    # 3. Send it back into the live session
    await session.send_tool_response(function_responses=[function_response])

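You can specify the following policies in the scheduling field to manage function responses:

SILENT response policy

With the SILENT policy, the function response is added to the model's context, but the model doesn't generate a reply to it and doesn't interrupt any ongoing user interaction.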

User: Please book the 2:00 PM flight to New York for me.

Model: function_call: {name: "book_ticket"}
//(The book_ticket function call is sent to the client and starts running asynchronously in the background.)

User: What is the current time in New York?
Model: The current time in New York is 12:00pm.

//(The book_ticket function finishes, and client sends the result with scheduling: "SILENT".)
Function_response: {name: "book_ticket", scheduling: "SILENT", response: {booking_status: "booked"}}
//(The model doesn't generate a response for the function response.)

User: Is my flight ticket booked?
Model: Yes. Your flight has been booked.

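WHEN_IDLE response policy

With the WHEN_IDLE policy, the model generates a reply to the function response only when no user interaction is in progress. If the user is mid-interaction, the model waits until that interaction finishes before it replies, so it doesn't interrupt.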

User: Please book the 2:00 PM flight to New York for me.

Model: function_call: {name: "book_ticket"}
//(The book_ticket function call is sent to the client and starts running asynchronously in the background.)

User: What is the current time in New York?

//(The book_ticket function finishes, and client sends the result with scheduling: "WHEN_IDLE".)
Function_response: {name: "book_ticket", scheduling: "WHEN_IDLE", response: {booking_status: "booked"}}
//(The ongoing interaction about the time is not interrupted.)

Model: The current time in New York is 12:00pm.
//(After responding to the user's time query, the model issues the response for the book_ticket function.)
Model: Your flight has been booked. Expect a confirmation text on your phone within 5 minutes.

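INTERRUPT response policy

With the INTERRUPT policy, the model generates a reply to the function response immediately and interrupts any ongoing user interaction.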

User: Please book the 2:00 PM flight to New York for me.

Model: function_call: {name: "book_ticket"}
//(The book_ticket function call is sent to the client and starts running asynchronously in the background.)

User: What is the current time in New York?

//(The book_ticket function finishes, and client sends the result with scheduling: "INTERRUPT".)
Function_response: {name: "book_ticket", scheduling: "INTERRUPT", response: {booking_status: "booked"}}
//(The ongoing interaction about the time is interrupted, and model skips responding to it.)

Model: Your flight has been booked. Expect a confirmation text on your phone within 5 minutes.

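Best practices

  • Design for concurrency: Always offload slow tools, such as querying external APIs or running RAG pipelines, to background tasks in your backend. Let the model keep handling the active audio stream.
  • Avoid INTERRUPT unless necessary: Use INTERRUPT for critical alerts. For background work, SILENT or WHEN_IDLE provides a smoother, friendlier user experience.
  • Independent conversation turns: In the Gemini Live API, tool execution is fully independent of conversation turns. While a tool runs in the background, the conversation can branch, continue, and flow naturally.
  • Caveat for "silent" responses: Even when a response is scheduled as SILENT, the model might still occasionally try to narrate the tool's execution out loud. To enforce true silence, add an explicit guardrail to your system instructions (for example, "When using [tool name], execute it silently and do not speak"), or use a "fire-and-forget" backend pattern that never returns a FunctionResponse to the model.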
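The guidance on choosing a policy can be condensed into a small decision helper. This is only one possible reading of the best practices above: `choose_scheduling` and its two flags are hypothetical, and real applications may need finer-grained rules.

```python
def choose_scheduling(is_critical=False, announce=True):
    """Pick a scheduling policy name for a finished background tool.

    is_critical: the result is an urgent alert that justifies cutting in.
    announce: the user should hear about the result once the model is free.
    """
    if is_critical:
        return "INTERRUPT"  # reserve for critical alerts only
    if announce:
        return "WHEN_IDLE"  # wait for a pause, then tell the user
    return "SILENT"         # add to context without speaking

booking_done = choose_scheduling()                # routine background result
fraud_alert = choose_scheduling(is_critical=True)
cache_refresh = choose_scheduling(announce=False)
```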

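What's next

  • Overview: See the Live API overview.
  • Reference: See the Live API reference guide.
  • Guide: Learn how to start and manage live sessions with the Live API.
  • Guide: Learn how to configure Gemini capabilities for the Live API.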