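You can use the Gen AI Evaluation Service to evaluate an agent's ability to complete the tasks and goals of a specific use case.
This page describes how to create and deploy a basic agent, and then evaluate it with the Gen AI Evaluation Service:
- Develop an agent: Define an agent with basic tool functions.
- Deploy an agent: Deploy the agent to the Vertex AI Agent Engine runtime.
- Run agent inference: Define an evaluation dataset and run agent inference to generate responses.
- Create an evaluation run: Create an evaluation run to perform the evaluation.
- View evaluation results: View the evaluation results through the evaluation run.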
Before you begin
- Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
- In the Google Cloud console, on the project selector page, select or create a Google Cloud project.
  Roles required to select or create a project
  - Select a project: Selecting a project doesn't require a specific IAM role; you can select any project that you've been granted a role on.
  - Create a project: To create a project, you need the Project Creator role (roles/resourcemanager.projectCreator), which contains the resourcemanager.projects.create permission. Learn how to grant roles.
- Verify that billing is enabled for your Google Cloud project.
Install the Vertex AI SDK for Python:

%pip install google-cloud-aiplatform[adk,agent_engines]
%pip install --upgrade --force-reinstall -q google-cloud-aiplatform[evaluation]

Set up your credentials. If you are running this tutorial in Colaboratory, run the following:

from google.colab import auth
auth.authenticate_user()

For other environments, see "Authenticate to Vertex AI".

Initialize the GenAI client in the Vertex AI SDK:

import vertexai
from vertexai import Client
from google.genai import types as genai_types

GCS_DEST = "gs://BUCKET_NAME/output-path"

vertexai.init(
    project=PROJECT_ID,
    location=LOCATION,
)

client = Client(
    project=PROJECT_ID,
    location=LOCATION,
    http_options=genai_types.HttpOptions(api_version="v1beta1"),
)

Replace the following:
BUCKET_NAME: The Cloud Storage bucket name. To learn more about creating buckets, see "Create buckets".
PROJECT_ID: Your project ID.
LOCATION: Your selected region.
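The placeholders above must be replaced before the client is created. As a minimal sketch, a small check like the following can catch unreplaced placeholders early. The helper name `check_settings` and its detection rule are hypothetical and not part of the Vertex AI SDK; the sample values are illustrative only.

```python
def check_settings(project_id: str, location: str, gcs_dest: str) -> list[str]:
    """Return a list of problems found in the settings (empty if none)."""
    problems = []
    # Placeholder tokens used in this page's samples.
    placeholders = {"PROJECT_ID", "LOCATION", "BUCKET_NAME"}
    if not project_id or project_id in placeholders:
        problems.append("project_id is empty or still a placeholder")
    if not location or location in placeholders:
        problems.append("location is empty or still a placeholder")
    if not gcs_dest.startswith("gs://"):
        problems.append("gcs_dest must start with gs://")
    elif "BUCKET_NAME" in gcs_dest:
        problems.append("gcs_dest still contains BUCKET_NAME")
    return problems


# Illustrative values only; substitute your own project, region, and bucket.
print(check_settings("my-project", "us-central1", "gs://my-bucket/output-path"))
print(check_settings("PROJECT_ID", "us-central1", "gs://BUCKET_NAME/output-path"))
```

Running a check like this before `vertexai.init()` surfaces configuration mistakes locally instead of as a failed API call.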
Use run_inference() to generate model responses for your dataset:

Prepare your dataset as a Pandas DataFrame. The prompts should be relevant to your agent. Session inputs are required for traces. For more information, see "Session: Tracking Individual Conversations".

import pandas as pd
from vertexai import types

session_inputs = types.evals.SessionInput(
    user_id="user_123",
    state={},
)
agent_prompts = [
    "Search for 'noise-cancelling headphones'.",
    "Show me the details for product 'B08H8H8H8H'.",
    "Add one pair of 'B08H8H8H8H' to my shopping cart.",
    "Find 'wireless earbuds' and then add the first result to my cart.",
    "I need a new laptop for work, can you find one with at least 16GB of RAM?",
]
agent_dataset = pd.DataFrame({
    "prompt": agent_prompts,
    "session_inputs": [session_inputs] * len(agent_prompts),
})

Use run_inference() to generate the model responses:

agent_dataset_with_inference = client.evals.run_inference(
    agent=agent_engine_resource_name,
    src=agent_dataset,
)

Call .show() on the EvaluationDataset object to visualize the inference results and inspect the model outputs alongside the original prompts and references:

agent_dataset_with_inference.show()

The resulting view shows the evaluation dataset, with each prompt and its corresponding generated intermediate_events and responses.
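One detail of the dataset code above is that `[session_inputs] * len(agent_prompts)` repeats a reference to a single object rather than creating one object per row. That is fine when every prompt shares the same user and an empty state, but it may matter if per-prompt state is ever needed. The sketch below illustrates the difference with a hypothetical stand-in dataclass, `FakeSessionInput`, used in place of `types.evals.SessionInput` so that it runs without the SDK. How the SDK itself copies or serializes session inputs is not covered here.

```python
from dataclasses import dataclass, field


@dataclass
class FakeSessionInput:
    """Hypothetical stand-in for types.evals.SessionInput (illustration only)."""
    user_id: str
    state: dict = field(default_factory=dict)


prompts = [
    "Search for 'noise-cancelling headphones'.",
    "Show me the details for product 'B08H8H8H8H'.",
]

# List multiplication: every row points at the same object.
shared = [FakeSessionInput(user_id="user_123")] * len(prompts)

# List comprehension: every row gets its own object.
separate = [FakeSessionInput(user_id="user_123") for _ in prompts]

shared[0].state["cart"] = ["B08H8H8H8H"]
separate[0].state["cart"] = ["B08H8H8H8H"]

print(shared[1].state)    # the change made through row 0 is visible here too
print(separate[1].state)  # row 1 is unaffected
```

If each prompt should start from a different session state, building the list with a comprehension avoids accidental sharing.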
Retrieve agent_info with the built-in helper function:

agent_info = types.evals.AgentInfo.load_from_agent(
    my_agent,
    agent_engine_resource_name
)

Evaluate the model responses with the agent-specific adaptive rubric-based metrics (FINAL_RESPONSE_QUALITY, TOOL_USE_QUALITY, and HALLUCINATION); the call below also requests SAFETY:

evaluation_run = client.evals.create_evaluation_run(
    dataset=agent_dataset_with_inference,
    agent_info=agent_info,
    metrics=[
        types.RubricMetric.FINAL_RESPONSE_QUALITY,
        types.RubricMetric.TOOL_USE_QUALITY,
        types.RubricMetric.HALLUCINATION,
        types.RubricMetric.SAFETY,
    ],
    dest=GCS_DEST,
)

- Develop an agent.
- Deploy an agent.
- Use an agent.
- Learn more about the Gen AI Evaluation Service.
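Develop an agent
Develop an Agent Development Kit (ADK) agent by defining its model, instruction, and set of tools. To learn more about developing an agent, see "Develop an Agent Development Kit agent".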
from google.adk import Agent

# Define Agent Tools
def search_products(query: str):
    """Searches for products based on a query."""
    # Mock response for demonstration
    if "headphones" in query.lower():
        return {"products": [{"name": "Wireless Headphones", "id": "B08H8H8H8H"}]}
    else:
        return {"products": []}

def get_product_details(product_id: str):
    """Gets the details for a given product ID."""
    if product_id == "B08H8H8H8H":
        return {"details": "Noise-cancelling, 20-hour battery life."}
    else:
        return {"error": "Product not found."}

def add_to_cart(product_id: str, quantity: int):
    """Adds a specified quantity of a product to the cart."""
    return {"status": f"Added {quantity} of {product_id} to cart."}

# Define Agent
my_agent = Agent(
    model="gemini-2.5-flash",
    name='ecommerce_agent',
    instruction='You are an ecommerce expert',
    tools=[search_products, get_product_details, add_to_cart],
)
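Because the tools above are plain Python functions, their mock behavior can be checked locally before the agent is deployed. The sketch below repeats the three tool functions so that it runs on its own and adds a few illustrative calls; the calls are not part of the original sample.

```python
def search_products(query: str):
    """Searches for products based on a query."""
    # Mock response for demonstration
    if "headphones" in query.lower():
        return {"products": [{"name": "Wireless Headphones", "id": "B08H8H8H8H"}]}
    else:
        return {"products": []}


def get_product_details(product_id: str):
    """Gets the details for a given product ID."""
    if product_id == "B08H8H8H8H":
        return {"details": "Noise-cancelling, 20-hour battery life."}
    else:
        return {"error": "Product not found."}


def add_to_cart(product_id: str, quantity: int):
    """Adds a specified quantity of a product to the cart."""
    return {"status": f"Added {quantity} of {product_id} to cart."}


# A query containing "headphones" matches the mock catalog.
hit = search_products("noise-cancelling headphones")
# Any other query, such as "wireless earbuds", returns an empty result.
miss = search_products("wireless earbuds")

print(hit)
print(miss)
print(get_product_details("B08H8H8H8H"))
print(add_to_cart("B08H8H8H8H", 1))
```

Note that with this mock, a prompt about "wireless earbuds" yields no products, so an evaluation prompt that asks the agent to add the first earbuds result to the cart exercises the agent's handling of an empty search result.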
Deploy an agent
Deploy your agent to the Vertex AI Agent Engine runtime. This operation can take up to 10 minutes. Then retrieve the resource name from the deployed agent.
def deploy_adk_agent(root_agent):
    """Deploy agent to agent engine.

    Args:
        root_agent: The ADK agent to deploy.
    """
    app = vertexai.agent_engines.AdkApp(
        agent=root_agent,
    )
    remote_app = client.agent_engines.create(
        agent=app,
        config={
            # Replace BUCKET_NAME with your Cloud Storage bucket name.
            "staging_bucket": "gs://BUCKET_NAME",
            "requirements": ['google-cloud-aiplatform[adk,agent_engines]'],
            "env_vars": {"GOOGLE_CLOUD_AGENT_ENGINE_ENABLE_TELEMETRY": "true"}
        }
    )
    return remote_app

agent_engine = deploy_adk_agent(my_agent)
agent_engine_resource_name = agent_engine.api_resource.name
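The resource name retrieved above is later passed to `run_inference()`. As a hedged sketch, assuming the name follows the pattern `projects/PROJECT/locations/LOCATION/reasoningEngines/ID` (an assumption here; check the value your own deployment returns), a small hypothetical helper can validate it before use. The helper `parse_agent_engine_name` and the sample name are made up for illustration.

```python
import re

# Assumed resource name layout; verify against your deployment's actual value.
_NAME_PATTERN = re.compile(
    r"^projects/(?P<project>[^/]+)"
    r"/locations/(?P<location>[^/]+)"
    r"/reasoningEngines/(?P<engine_id>[^/]+)$"
)


def parse_agent_engine_name(name: str) -> dict:
    """Split a resource name into its parts, or raise ValueError if it does not match."""
    match = _NAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"Unexpected Agent Engine resource name: {name!r}")
    return match.groupdict()


# Made-up example value.
parts = parse_agent_engine_name(
    "projects/123456/locations/us-central1/reasoningEngines/987654"
)
print(parts["location"])
```

A check like this can help confirm that the deployed agent lives in the same region that the client was initialized with.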
To see the list of agents deployed to Vertex AI Agent Engine, see "Manage deployed agents".
Generate responses
Run the agent evaluation
Run create_evaluation_run() to evaluate the agent's responses.
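View the agent evaluation results
You can view the evaluation results by using the Vertex AI SDK.
Retrieve the evaluation run, then call .show() on it to visualize the evaluation results, including summary metrics and detailed results: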
evaluation_run = client.evals.get_evaluation_run(
    name=evaluation_run.name,
    include_evaluation_items=True
)
evaluation_run.show()
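The following image shows an evaluation report with summary metrics, agent information, and detailed results for each prompt-response pair. The detailed results also include traces that show the agent's interactions. To learn more about traces, see "Trace an agent".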

What's next
Try the following notebooks: