本頁面由 Cloud Translation API 翻譯而成。

電腦使用模型和工具

預覽

這項產品適用《服務專屬條款》中「一般服務條款」一節的《正式發布前產品條款》，以及《生成式 AI 搶先體驗產品附加條款》。本產品為「Agentic AI 服務」，使用時須遵守《服務專屬條款》中適用於 Agentic AI 服務的所有條款，以及《生成式 AI 使用限制政策》。「客戶」同意不會自動略過或規避任何需要使用者確認的安全回應。

正式發布前的產品和功能是按照「原樣」提供，支援範圍可能有限。Gemini 2.5 Computer Use 模型和工具可能容易發生錯誤和安全漏洞。 Gemini 2.5 電腦使用模型和工具建議的動作可能不適當或不安全；此外，如果輸入內容含有惡意成分，建議的動作也可能有害。客戶在執行重要工作時應密切監督，且不應使用 Gemini 2.5 Computer Use 模型和工具處理涉及重大決策、機密資料或嚴重錯誤無法修正的工作。建議您詳閱本文列出的安全最佳做法。

Gemini 2.5 Computer Use 模型和工具可讓應用程式在瀏覽器中互動及自動執行工作。電腦使用模型可以根據螢幕截圖推斷電腦螢幕的相關資訊，並產生滑鼠點擊和鍵盤輸入等特定 UI 動作，進而執行動作。與函式呼叫類似，您需要編寫用戶端應用程式程式碼，才能接收 Computer Use 模型和工具函式呼叫，並執行對應的動作。

您可以使用電腦使用模型和工具，建構可執行下列動作的代理程式：

自動在網站上輸入重複資料或填寫表單。
瀏覽網站以收集資訊。
在網頁應用程式中執行一連串動作，協助使用者。

本指南涵蓋下列主題：

電腦使用模型和工具的運作方式
如何啟用「電腦使用」模型和工具
如何傳送要求、接收回應及建構代理程式迴圈
支援哪些電腦動作
安全與防護支援
預覽價格

本指南假設您使用 Python 適用的 Gen AI SDK，且熟悉 Playwright API。

在預先發布期間，其他 SDK 語言或 Google Cloud 控制台不支援電腦使用模型和工具。

此外，您也可以在 GitHub 中查看電腦使用模型和工具的參考實作方式。

電腦使用模型和工具的運作方式

電腦使用模型和工具不會生成文字回應，而是判斷何時要執行特定 UI 動作 (例如滑鼠點擊)，並傳回執行這些動作所需的參數。您需要編寫用戶端應用程式程式碼，才能接收電腦使用模型和工具 function_call，並執行相應動作。

電腦使用模型和工具互動會遵循代理迴圈程序：

向模型傳送要求
- 將 Computer Use 模型和工具，以及任何其他工具 (選用) 新增至 API 要求。
- 根據使用者的要求和代表目前 GUI 狀態的螢幕截圖，提示電腦使用模型和工具。
接收模型回應
- 模型會分析使用者要求和螢幕截圖，然後產生回應，其中包含建議的 function_call，代表 UI 動作 (例如「點選座標 (x,y)」或「輸入『文字』」)。如要查看模型支援的所有動作，請參閱「支援的動作」。
- API 回應也可能包含內部安全系統的 safety_response，該系統已檢查模型建議的動作。這會將動作分類為：
  - 正常或允許：系統會將這類動作視為安全。這也可能以沒有 safety_response 的形式呈現。
  - 需要確認：模型即將執行可能具有風險的動作 (例如點選「接受 Cookie 橫幅」)。
執行收到的動作
- 您的用戶端程式碼會收到 function_call 和任何隨附的 safety_response。
- 如果 safety_response 指出為一般或允許 (或沒有 safety_response)，用戶端程式碼就能在目標環境 (例如網頁瀏覽器) 中執行指定的 function_call。
- 如果 safety_response 指出需要確認，應用程式必須先提示使用者確認，才能執行 function_call。如果使用者確認，請繼續執行動作。如果使用者拒絕，請勿執行動作。
擷取新環境狀態
- 如果動作已執行，用戶端會擷取 GUI 和目前網址的新螢幕截圖，並做為 function_response 的一部分傳送回電腦使用模型和工具。
- 如果安全系統封鎖某項動作，或使用者拒絕確認，應用程式可能會將不同形式的回饋傳送給模型，或終止互動。

系統會將更新後的狀態傳送給模型。系統會從步驟 2 開始重複執行程序，並使用電腦使用模型和工具，根據新螢幕截圖 (如有) 和持續進行的目標，建議下一個動作。這個迴圈會持續運作，直到工作完成、發生錯誤或程序終止為止 (例如，如果回應遭到安全篩選器或使用者決策封鎖)。

下圖說明電腦使用模型和工具的運作方式：

電腦使用模型和工具總覽

啟用「電腦使用」模型和工具

如要啟用「電腦使用」模型和工具，請使用 gemini-2.5-computer-use-preview-10-2025 做為模型，並將「電腦使用」模型和工具新增至已啟用工具的清單：

Python

from google import genai
from google.genai import types
from google.genai.types import Content, Part, FunctionResponse

client = genai.Client()

# Add Computer Use model and tool to the list of tools
generate_content_config = genai.types.GenerateContentConfig(
    tools=[
        types.Tool(
            computer_use=types.ComputerUse(
                environment=types.Environment.ENVIRONMENT_BROWSER,
                )
              ),
            ]
          )

# Example request using the Computer Use model and tool
contents = [
    Content(
        role="user",
        parts=[
            Part(text="Go to google.com and search for 'weather in New York'"),
          ],
        )
      ]

response = client.models.generate_content(
    model="gemini-2.5-computer-use-preview-10-2025",
    contents=contents,
    config=generate_content_config,
  )

傳送要求

設定電腦使用模型和工具後，請將提示傳送至模型，其中包含使用者的目標和 GUI 的初始螢幕截圖。

您也可以視需要新增以下內容：

排除的動作：如果清單中有任何支援的 UI 動作，是您不希望模型執行的，請在 excluded_predefined_functions 中指定這些動作。
使用者定義函式：除了電腦使用模型和工具，您可能還想加入自訂使用者定義函式。

下列程式碼範例會啟用電腦使用模型和工具，並將要求傳送至模型：

Python

from google import genai
from google.genai import types
from google.genai.types import Content, Part

client = genai.Client()

# Specify predefined functions to exclude (optional)
excluded_functions = ["drag_and_drop"]

# Configuration for the Computer Use model and tool with browser environment
generate_content_config = genai.types.GenerateContentConfig(
    tools=[
        # 1. Computer Use model and tool with browser environment
        types.Tool(
            computer_use=types.ComputerUse(
                environment=types.Environment.ENVIRONMENT_BROWSER,
                # Optional: Exclude specific predefined functions
                excluded_predefined_functions=excluded_functions
                )
              ),
        # 2. Optional: Custom user-defined functions (need to defined above)
        # types.Tool(
           # function_declarations=custom_functions
           #   )
    ],
)

# Create the content with user message
contents: list[Content] = [
    Content(
        role="user",
        parts=[
            Part(text="Search for highly rated smart fridges with touchscreen, 2 doors, around 25 cu ft, priced below 4000 dollars on Google Shopping. Create a bulleted list of the 3 cheapest options in the format of name, description, price in an easy-to-read layout."),
            # Optional: include a screenshot of the initial state
            # Part.from_bytes(
                 # data=screenshot_image_bytes,
                 # mime_type='image/png',
            # ),
        ],
    )
]

# Generate content with the configured settings
response = client.models.generate_content(
    model='gemini-2.5-computer-use-preview-10-2025',
    contents=contents,
    config=generate_content_config,
)

# Print the response output
print(response.text)

您也可以加入自訂使用者定義函式，擴充模型的函式。請參閱「針對行動裝置用途使用電腦用途模型和工具」，瞭解如何新增 open_app、long_press_at 和 go_home 等動作，同時排除瀏覽器專屬動作，針對行動裝置用途設定電腦用途。

接收回覆

如果模型判斷完成工作需要 UI 動作或使用者定義函式，就會傳回一或多個 FunctionCalls。應用程式程式碼必須剖析這些動作、執行動作，並收集結果。電腦使用模型和工具支援平行函式呼叫，也就是說，模型可以在單一回合中傳回多個獨立動作。

{
  "content": {
    "parts": [
      {
        "text": "I will type the search query into the search bar. The search bar is in the center of the page."
      },
      {
        "function_call": {
          "name": "type_text_at",
          "args": {
            "x": 371,
            "y": 470,
            "text": "highly rated smart fridges with touchscreen, 2 doors, around 25 cu ft, priced below 4000 dollars on Google Shopping",
            "press_enter": true
          }
        }
      }
    ]
  }
}

視動作而定，API 回應也可能會傳回 safety_response：

{
  "content": {
    "parts": [
      {
        "text": "I have evaluated step 2. It seems Google detected unusual traffic and is asking me to verify I'm not a robot. I need to click the 'I'm not a robot' checkbox located near the top left (y=98, x=95)."
      },
      {
        "function_call": {
          "name": "click_at",
          "args": {
            "x": 60,
            "y": 100,
            "safety_decision": {
              "explanation": "I have encountered a CAPTCHA challenge that requires interaction. I need you to complete the challenge by clicking the 'I'm not a robot' checkbox and any subsequent verification steps.",
              "decision": "require_confirmation"
            }
          }
        }
      }
    ]
  }
}

執行收到的動作

收到回覆後，模型需要執行收到的動作。

下列程式碼會從 Gemini 回應中擷取函式呼叫、將座標從 0 到 1000 的範圍轉換為實際像素、使用 Playwright 執行瀏覽器動作，並傳回每個動作的成功或失敗狀態：

import time
from typing import Any, List, Tuple


def normalize_x(x: int, screen_width: int) -> int:
    """Convert normalized x coordinate (0-1000) to actual pixel coordinate."""
    return int(x / 1000 * screen_width)


def normalize_y(y: int, screen_height: int) -> int:
    """Convert normalized y coordinate (0-1000) to actual pixel coordinate."""
    return int(y / 1000 * screen_height)


def execute_function_calls(response, page, screen_width: int, screen_height: int) -> List[Tuple[str, Any]]:
    """
    Extract and execute function calls from Gemini response.

    Args:
        response: Gemini API response object
        page: Playwright page object
        screen_width: Screen width in pixels
        screen_height: Screen height in pixels

    Returns:
        List of tuples: [(function_name, result), ...]
    """
    # Extract function calls and thoughts from the model's response
    candidate = response.candidates[0]
    function_calls = []
    thoughts = []

    for part in candidate.content.parts:
        if hasattr(part, 'function_call') and part.function_call:
            function_calls.append(part.function_call)
        elif hasattr(part, 'text') and part.text:
            thoughts.append(part.text)

    if thoughts:
        print(f"Model Reasoning: {' '.join(thoughts)}")

    # Execute each function call
    results = []
    for function_call in function_calls:
        result = None

        try:
            if function_call.name == "open_web_browser":
                print("Executing open_web_browser")
                # Browser is already open via Playwright, so this is a no-op
                result = "success"

            elif function_call.name == "click_at":
                actual_x = normalize_x(function_call.args["x"], screen_width)
                actual_y = normalize_y(function_call.args["y"], screen_height)

                print(f"Executing click_at: ({actual_x}, {actual_y})")
                page.mouse.click(actual_x, actual_y)
                result = "success"

            elif function_call.name == "type_text_at":
                actual_x = normalize_x(function_call.args["x"], screen_width)
                actual_y = normalize_y(function_call.args["y"], screen_height)
                text = function_call.args["text"]
                press_enter = function_call.args.get("press_enter", False)
                clear_before_typing = function_call.args.get("clear_before_typing", True)

                print(f"Executing type_text_at: ({actual_x}, {actual_y}) text='{text}'")

                # Click at the specified location
                page.mouse.click(actual_x, actual_y)
                time.sleep(0.1)

                # Clear existing text if requested
                if clear_before_typing:
                    page.keyboard.press("Control+A")
                    page.keyboard.press("Backspace")

                # Type the text
                page.keyboard.type(text)

                # Press enter if requested
                if press_enter:
                    page.keyboard.press("Enter")

                result = "success"

            else:
                # For any functions not parsed above
                print(f"Unrecognized function: {function_call.name}")
                result = "unknown_function"

        except Exception as e:
            print(f"Error executing {function_call.name}: {e}")
            result = f"error: {str(e)}"

        results.append((function_call.name, result))

    return results

如果傳回的 safety_decision 為 require_confirmation，您必須先請使用者確認，再繼續執行動作。根據服務條款，您不得略過人工確認要求。

下列程式碼會在先前的程式碼中加入安全邏輯：

import termcolor


def get_safety_confirmation(safety_decision):
    """Prompt user for confirmation when safety check is triggered."""
    termcolor.cprint("Safety service requires explicit confirmation!", color="red")
    print(safety_decision["explanation"])

    decision = ""
    while decision.lower() not in ("y", "n", "ye", "yes", "no"):
        decision = input("Do you wish to proceed? [Y]es/[N]o\n")

    if decision.lower() in ("n", "no"):
        return "TERMINATE"
    return "CONTINUE"


def execute_function_calls(response, page, screen_width: int, screen_height: int):
    # ... Extract function calls from response ...

    for function_call in function_calls:
        extra_fr_fields = {}

        # Check for safety decision
        if 'safety_decision' in function_call.args:
            decision = get_safety_confirmation(function_call.args['safety_decision'])
            if decision == "TERMINATE":
                print("Terminating agent loop")
                break
            extra_fr_fields["safety_acknowledgement"] = "true"

        # ... Execute function call and append to results ...

擷取新狀態

執行動作後，將函式執行結果傳回模型，模型就能使用這項資訊生成下一個動作。如果執行多個動作 (平行呼叫)，您必須在後續使用者回合中，為每個動作傳送 FunctionResponse。如果是使用者定義函式，FunctionResponse 應包含已執行函式的傳回值。

function_response_parts = []

for name, result in results:
    # Take screenshot after each action
    screenshot = page.screenshot()
    current_url = page.url

    function_response_parts.append(
        FunctionResponse(
            name=name,
            response={"url": current_url},  # Include safety acknowledgement
            parts=[
                types.FunctionResponsePart(
                    inline_data=types.FunctionResponseBlob(
                       mime_type="image/png", data=screenshot
                    )
                )
            ]
        )
    )

# Create the user feedback content with all responses
user_feedback_content = Content(
    role="user",
    parts=function_response_parts
)

# Append this feedback to the 'contents' history list for the next API call
contents.append(user_feedback_content)

建構代理程式迴圈

將上述步驟合併為迴圈，即可啟用多步驟互動。迴圈必須處理平行函式呼叫。請記得正確管理對話記錄 (內容陣列)，方法是同時附加模型回覆和函式回覆。

Python

from google import genai
from google.genai.types import Content, Part
from playwright.sync_api import sync_playwright


def has_function_calls(response):
    """Check if response contains any function calls."""
    candidate = response.candidates[0]
    return any(hasattr(part, 'function_call') and part.function_call
               for part in candidate.content.parts)


def main():
    client = genai.Client()

    # ... (config setup from "Send a request to model" section) ...

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=False)
        page = browser.new_page()
        page.goto("https://www.google.com")
        
        screen_width, screen_height = 1920, 1080
        
        # ... (initial contents setup from "Send a request to model" section) ...

        # Agent loop: iterate until model provides final answer
        for iteration in range(10):
            print(f"\nIteration {iteration + 1}\n")

            # 1. Send request to model (see "Send a request to model" section)
            response = client.models.generate_content(
                model='gemini-2.5-computer-use-preview-10-2025',
                contents=contents,
                config=generate_content_config,
            )

            contents.append(response.candidates[0].content)

            # 2. Check if done - no function calls means final answer
            if not has_function_calls(response):
                print(f"FINAL RESPONSE:\n{response.text}")
                break

            # 3. Execute actions (see "Execute the received actions" section)
            results = execute_function_calls(response, page, screen_width, screen_height)
            time.sleep(1)

            # 4. Capture state and create feedback (see "Capture the New State" section)
            contents.append(create_feedback(results, page))

        input("\nPress Enter to close browser...")
        browser.close()


if __name__ == "__main__":
    main()

電腦用途模型和行動裝置用途工具

以下範例說明如何定義自訂函式 (例如 open_app、long_press_at 和 go_home)、將這些函式與 Gemini 內建的電腦使用工具合併，以及排除不必要的瀏覽器專屬函式。註冊這些自訂函式後，模型就能智慧地呼叫這些函式，並搭配標準 UI 動作，在非瀏覽器環境中完成工作。

from typing import Optional, Dict, Any

from google import genai
from google.genai import types
from google.genai.types import Content, Part


client = genai.Client()

def open_app(app_name: str, intent: Optional[str] = None) -> Dict[str, Any]:
    """Opens an app by name.

    Args:
        app_name: Name of the app to open (any string).
        intent: Optional deep-link or action to pass when launching, if the app supports it.

    Returns:
        JSON payload acknowledging the request (app name and optional intent).
    """
    return {"status": "requested_open", "app_name": app_name, "intent": intent}


def long_press_at(x: int, y: int, duration_ms: int = 500) -> Dict[str, int]:
    """Long-press at a specific screen coordinate.

    Args:
        x: X coordinate (absolute), scaled to the device screen width (pixels).
        y: Y coordinate (absolute), scaled to the device screen height (pixels).
        duration_ms: Press duration in milliseconds. Defaults to 500.

    Returns:
        Object with the coordinates pressed and the duration used.
    """
    return {"x": x, "y": y, "duration_ms": duration_ms}


def go_home() -> Dict[str, str]:
    """Navigates to the device home screen.

    Returns:
        A small acknowledgment payload.
    """
    return {"status": "home_requested"}


#  Build function declarations
CUSTOM_FUNCTION_DECLARATIONS = [
    types.FunctionDeclaration.from_callable(client=client, callable=open_app),
    types.FunctionDeclaration.from_callable(client=client, callable=long_press_at),
    types.FunctionDeclaration.from_callable(client=client, callable=go_home),
]

# Exclude browser functions

EXCLUDED_PREDEFINED_FUNCTIONS = [
    "open_web_browser",
    "search",
    "navigate",
    "hover_at",
    "scroll_document",
    "go_forward",
    "key_combination",
    "drag_and_drop",
]

# Utility function to construct a GenerateContentConfig

def make_generate_content_config() -> genai.types.GenerateContentConfig:
    """Return a fixed GenerateContentConfig with Computer Use + custom functions."""
    return genai.types.GenerateContentConfig(
        tools=[
            types.Tool(
                computer_use=types.ComputerUse(
                    environment=types.Environment.ENVIRONMENT_BROWSER,
                    excluded_predefined_functions=EXCLUDED_PREDEFINED_FUNCTIONS,
                )
            ),
            types.Tool(function_declarations=CUSTOM_FUNCTION_DECLARATIONS),
        ]
    )


# Create the content with user message
contents: list[Content] = [
    Content(
        role="user",
        parts=[
            # text instruction
            Part(text="Open Chrome, then long-press at 200,400."),
            # optional screenshot attachment
            Part.from_bytes(
                data=screenshot_image_bytes,
                mime_type="image/png",
            ),
        ],
    )
]

# Build your fixed config (from helper)
config = make_generate_content_config()

# Generate content with the configured settings
response = client.models.generate_content(
        model="gemini-2.5-computer-use-preview-10-2025",
        contents=contents,
        config=generate_content_config,
    )

    print(response)

支援的動作

電腦使用模型和工具可讓模型使用 FunctionCall 要求執行下列動作。用戶端程式碼必須實作這些動作的執行邏輯。如需範例，請參閱參考實作方式。

指令名稱	說明	引數 (在函式呼叫中)	函式呼叫範例
open_web_browser	開啟網路瀏覽器。	無	`{"name": "open_web_browser", "args": {}}`
wait_5_seconds	暫停執行 5 秒，讓動態內容載入或動畫完成。	無	`{"name": "wait_5_seconds", "args": {}}`
go_back	前往瀏覽器記錄中的上一頁。	無	`{"name": "go_back", "args": {}}`
go_forward	前往瀏覽器記錄中的下一頁。	無	`{"name": "go_forward", "args": {}}`
搜尋	前往預設搜尋引擎的首頁 (例如 Google)。適合用來開始新的搜尋工作。	無	`{"name": "search", "args": {}}`
navigate	直接將瀏覽器導向指定網址。	`url`：str	`{"name": "navigate", "args": {"url": "https://www.wikipedia.org"}}`
click_at	點選網頁上的特定座標。x 和 y 值是以 1000x1000 格線為準，並會縮放至螢幕尺寸。	`y`：int (0 到 999)，`x`：int (0 到 999)	`{"name": "click_at", "args": {"y": 300, "x": 500}}`
hover_at	將滑鼠懸停在網頁上的特定座標。可用於顯示子選單。x 和 y 是以 1000x1000 格線為準。	`y`: int (0-999) `x`: int (0-999)	`{"name": "hover_at", "args": {"y": 150, "x": 250}}`
type_text_at	在特定座標輸入文字，預設會先清除欄位，然後在輸入完畢後按下 ENTER 鍵，但這些動作可以停用。x 和 y 座標是以 1000x1000 的格線為準。	`y`：int (0-999)、`x`：int (0-999)、`text`：str、`press_enter`：bool (選用，預設為 True)、`clear_before_typing`：bool (選用，預設為 True)	`{"name": "type_text_at", "args": {"y": 250, "x": 400, "text": "search query", "press_enter": false}}`
key_combination	按下鍵盤按鍵或組合鍵，例如「Ctrl+C」或「Enter」。可用於觸發動作 (例如使用「Enter」鍵提交表單) 或剪貼簿作業。	`keys`：str (例如「enter」、「control+c」。如需允許使用的金鑰完整清單，請參閱 API 參考資料)	`{"name": "key_combination", "args": {"keys": "Control+A"}}`
scroll_document	將整個網頁「向上」、「向下」、「向左」或「向右」捲動。	`direction`：字串 (「up」、「down」、「left」或「right」)	`{"name": "scroll_document", "args": {"direction": "down"}}`
scroll_at	在指定方向上，將特定元素或區域捲動特定幅度，座標為 (x, y)。座標和震級 (預設為 800) 是以 1000x1000 格線為準。	`y`：int (0-999)、`x`：int (0-999)、`direction`：str ("up"、"down"、"left"、"right")、`magnitude`：int (0-999，選用，預設為 800)	`{"name": "scroll_at", "args": {"y": 500, "x": 500, "direction": "down", "magnitude": 400}}`
drag_and_drop	從起始座標 (x, y) 拖曳元素，並在目的地座標 (destination_x, destination_y) 放開。所有座標都是以 1000x1000 的格線為準。	`y`：int (0-999)、`x`：int (0-999)、`destination_y`：int (0-999)、`destination_x`：int (0-999)	`{"name": "drag_and_drop", "args": {"y": 100, "x": 100, "destination_y": 500, "destination_x": 500}}`

安全與安全性

本節說明「電腦使用」模型和工具採取的防護措施，可提升使用者控制權和安全性。此外，本文也說明如何運用最佳做法，降低這項工具可能帶來的潛在新風險。

確認安全決定

視動作而定，電腦使用模型和工具的回應可能包含來自內部安全系統的 safety_decision。這項決定會驗證工具建議的安全措施。

{
  "content": {
    "parts": [
      {
        "text": "I have evaluated step 2. It seems Google detected unusual traffic and is asking me to verify I'm not a robot. I need to click the 'I'm not a robot' checkbox located near the top left (y=98, x=95)."
      },
      {
        "function_call": {
          "name": "click_at",
          "args": {
            "x": 60,
            "y": 100,
            "safety_decision": {
              "explanation": "I have encountered a CAPTCHA challenge that requires interaction. I need you to complete the challenge by clicking the 'I'm not a robot' checkbox and any subsequent verification steps.",
              "decision": "require_confirmation"
            }
          }
        }
      }
    ]
  }
}

如果 safety_decision 為 require_confirmation，您「必須」先請使用者確認，再繼續執行動作。

下列程式碼範例會在執行動作前，提示使用者確認。如果使用者未確認動作，迴圈就會終止。如果使用者確認動作，系統就會執行動作，並將 safety_acknowledgement 欄位標示為 True。

import termcolor

def get_safety_confirmation(safety_decision):
    """Prompt user for confirmation when safety check is triggered."""
    termcolor.cprint("Safety service requires explicit confirmation!", color="red")
    print(safety_decision["explanation"])

    decision = ""
    while decision.lower() not in ("y", "n", "ye", "yes", "no"):
        decision = input("Do you wish to proceed? [Y]es/[N]o\n")

    if decision.lower() in ("n", "no"):
        return "TERMINATE"
    return "CONTINUE"

def execute_function_calls(response, page, screen_width: int, screen_height: int):

    # ... Extract function calls from response ...

    for function_call in function_calls:
        extra_fr_fields = {}

        # Check for safety decision
        if 'safety_decision' in function_call.args:
            decision = get_safety_confirmation(function_call.args['safety_decision'])
            if decision == "TERMINATE":
                print("Terminating agent loop")
                break
            extra_fr_fields["safety_acknowledgement"] = "true" # Safety acknowledgement

        # ... Execute function call and append to results ...

如果使用者確認，您必須在 FunctionResponse 中加入安全確認聲明。

function_response_parts.append(
    FunctionResponse(
        name=name,
        response={"url": current_url,
                  **extra_fr_fields},  # Include safety acknowledgement
        parts=[
            types.FunctionResponsePart(
                inline_data=types.FunctionResponseBlob(
                    mime_type="image/png", data=screenshot
                )
             )
           ]
         )
       )

安全性最佳做法

電腦使用模型和工具是新穎的工具，會帶來開發人員應留意的全新風險：

不可信的內容和詐騙：模型會盡力達成使用者的目標，但可能會依據不可信的資訊來源和畫面上的指示執行操作。舉例來說，如果使用者的目標是購買 Pixel 手機，而模型遇到「完成問卷調查即可免費獲得 Pixel」的詐騙訊息，模型可能會完成問卷調查。
偶爾會發生非預期動作：模型可能會誤解使用者的目標或網頁內容，導致採取錯誤動作，例如點選錯誤的按鈕或填寫錯誤的表單。這可能會導致工作失敗或資料外洩。
違反政策：無論是蓄意或無意，API 的功能都可能用於違反 Google 政策 (《生成式 AI 使用限制政策》和《Gemini API 附加服務條款》) 的活動。包括可能干擾系統完整性、危害安全性、略過 CAPTCHA 等安全措施、控制醫療器材等行為。

為因應這些風險，您可以採取下列安全措施和最佳做法：

人機迴圈 (HITL)：

實作使用者確認：如果安全回應指出 require_confirmation，您必須先實作使用者確認，才能執行作業。
提供自訂安全指示：除了內建的使用者確認檢查外，開發人員也可以選擇新增自訂系統指示，強制執行自己的安全政策，禁止模型執行特定動作，或要求使用者確認後，模型才能執行特定高風險的不可逆動作。以下是與模型互動時可加入的自訂安全系統指令範例。

按一下即可查看建立連結的範例

## **RULE 1: Seek User Confirmation (USER_CONFIRMATION)**

This is your first and most important check. If the next required action falls
into any of the following categories, you MUST stop immediately, and seek the
user's explicit permission.

**Procedure for Seeking Confirmation:**  * **For Consequential Actions:**
Perform all preparatory steps (e.g., navigating, filling out forms, typing a
message). You will ask for confirmation **AFTER** all necessary information is
entered on the screen, but **BEFORE** you perform the final, irreversible action
(e.g., before clicking "Send", "Submit", "Confirm Purchase", "Share").  * **For
Prohibited Actions:** If the action is strictly forbidden (e.g., accepting legal
terms, solving a CAPTCHA), you must first inform the user about the required
action and ask for their confirmation to proceed.

**USER_CONFIRMATION Categories:**

*   **Consent and Agreements:** You are FORBIDDEN from accepting, selecting, or
    agreeing to any of the following on the user's behalf. You must ask th e
    user to confirm before performing these actions.
    *   Terms of Service
    *   Privacy Policies
    *   Cookie consent banners
    *   End User License Agreements (EULAs)
    *   Any other legally significant contracts or agreements.
*   **Robot Detection:** You MUST NEVER attempt to solve or bypass the
    following. You must ask the user to confirm before performing these actions.
*   CAPTCHAs (of any kind)
    *   Any other anti-robot or human-verification mechanisms, even if you are
        capable.
*   **Financial Transactions:**
    *   Completing any purchase.
    *   Managing or moving money (e.g., transfers, payments).
    *   Purchasing regulated goods or participating in gambling.
*   **Sending Communications:**
    *   Sending emails.
    *   Sending messages on any platform (e.g., social media, chat apps).
    *   Posting content on social media or forums.
*   **Accessing or Modifying Sensitive Information:**
    *   Health, financial, or government records (e.g., medical history, tax
        forms, passport status).
    *   Revealing or modifying sensitive personal identifiers (e.g., SSN, bank
        account number, credit card number).
*   **User Data Management:**
    *   Accessing, downloading, or saving files from the web.
    *   Sharing or sending files/data to any third party.
    *   Transferring user data between systems.
*   **Browser Data Usage:**
    *   Accessing or managing Chrome browsing history, bookmarks, autofill data,
        or saved passwords.
*   **Security and Identity:**
    *   Logging into any user account.
    *   Any action that involves misrepresentation or impersonation (e.g.,
        creating a fan account, posting as someone else).
*   **Insurmountable Obstacles:** If you are technically unable to interact with
    a user interface element or are stuck in a loop you cannot resolve, ask the
    user to take over.
---

## **RULE 2: Default Behavior (ACTUATE)**

If an action does **NOT** fall under the conditions for `USER_CONFIRMATION`,
your default behavior is to **Actuate**.

**Actuation Means:**  You MUST proactively perform all necessary steps to move
the user's request forward. Continue to actuate until you either complete the
non-consequential task or encounter a condition defined in Rule 1.

*   **Example 1:** If asked to send money, you will navigate to the payment
    portal, enter the recipient's details, and enter the amount. You will then
    **STOP** as per Rule 1 and ask for confirmation before clicking the final
    "Send" button.
*   **Example 2:** If asked to post a message, you will navigate to the site,
    open the post composition window, and write the full message. You will then
    **STOP** as per Rule 1 and ask for confirmation before clicking the final
    "Post" button.

    After the user has confirmed, remember to get the user's latest screen
    before continuing to perform actions.

# Final Response Guidelines:
Write final response to the user in these cases:
- User confirmation
- When the task is complete or you have enough information to respond to the user

安全執行環境：在採用沙箱機制的安全環境中執行代理程式，以限制其潛在影響 (例如採用沙箱機制的虛擬機器 (VM)、容器 (例如 Docker) 或權限受限的專用瀏覽器設定檔)。
輸入內容清除：清除提示中所有使用者產生的文字，降低意外指令或提示注入的風險。這層安全防護措施很有幫助，但無法取代安全執行環境。
允許清單和封鎖清單：導入篩選機制，控管模型可前往的位置和可執行的動作。禁止存取的網站封鎖清單是不錯的起點，而限制更嚴格的允許清單則更加安全。
可觀測性和記錄：維護詳細記錄，以利偵錯、稽核和事件回應。客戶應記錄提示、螢幕截圖、模型建議的動作 (function_call)、安全防護回應，以及客戶最終執行的所有動作。

定價

電腦使用模型和工具的定價與 Gemini 2.5 Pro 相同，並使用相同的 SKU。如要拆分電腦使用模型和工具費用，請使用自訂中繼資料標籤。如要進一步瞭解如何使用自訂中繼資料標籤監控費用，請參閱「自訂中繼資料標籤」一文。

電腦使用模型和工具 透過集合功能整理內容 你可以依據偏好儲存及分類內容。

電腦使用模型和工具的運作方式

啟用「電腦使用」模型和工具

Python

傳送要求

Python

接收回覆

執行收到的動作

擷取新狀態

建構代理程式迴圈

Python

電腦用途模型和行動裝置用途工具

支援的動作

安全與安全性

確認安全決定

安全性最佳做法

按一下即可查看建立連結的範例

定價

電腦使用模型和工具