Evaluate generative AI agents using the GenAI Client in the Agent Platform SDK

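After you build and evaluate a generative AI model, you might use that model to build an agent, such as a chatbot. With the Gen AI evaluation service, you can measure how well the agent completes tasks and achieves goals for your use case.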

This page shows you how to create and deploy a basic agent, and then evaluate it with the Gen AI evaluation service.

Before you begin

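  1. Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how the products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.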

    In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

    Roles required to select or create a project

    • Select a project: Selecting a project doesn't require a specific IAM role—you can select any project that you've been granted a role on.
    • Create a project: To create a project, you need the Project Creator role (roles/resourcemanager.projectCreator), which contains the resourcemanager.projects.create permission. Learn how to grant roles.

    Go to project selector

    Verify that billing is enabled for your Google Cloud project.

  2. Install the Agent Platform SDK for Python:

    %pip install google-cloud-aiplatform[adk,agent_engines]
    %pip install --upgrade --force-reinstall -q google-cloud-aiplatform[evaluation]
    
  3. Set up your credentials. If you are running this tutorial in Colaboratory, run the following:

    from google.colab import auth
    auth.authenticate_user()
    

    For other environments, see "Authenticate to Agent Platform".

  4. Initialize the GenAI client in the Agent Platform SDK:

    import vertexai
    from vertexai import Client
    from google.genai import types as genai_types
    
    GCS_DEST = "gs://BUCKET_NAME/output-path"
    vertexai.init(
        project=PROJECT_ID,
        location=LOCATION,
    )
    
    client = Client(
        project=PROJECT_ID,
        location=LOCATION,
        http_options=genai_types.HttpOptions(api_version="v1beta1"),
    )
    

    Replace the following:

    • BUCKET_NAME: The name of your Cloud Storage bucket. To learn how to create a bucket, see "Create a bucket".

    • PROJECT_ID: Your project ID.

    • LOCATION: Your selected region.
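
One way to keep these placeholders in a single place is to read them from the environment and assemble the Cloud Storage URI with a small helper. The sketch below is an illustration, not part of the SDK: the helper name `gcs_uri` is hypothetical, and the environment variable names `GOOGLE_CLOUD_PROJECT` and `GOOGLE_CLOUD_LOCATION` are assumptions that your environment may not set.

```python
import os


def gcs_uri(bucket_name: str, path: str = "") -> str:
    """Build a gs:// URI from a bucket name and an optional object path.

    Hypothetical helper for illustration. It only formats a string and
    does not check that the bucket exists.
    """
    bucket = bucket_name.strip().removeprefix("gs://").strip("/")
    if not bucket:
        raise ValueError("bucket_name must not be empty")
    suffix = path.strip("/")
    return f"gs://{bucket}/{suffix}" if suffix else f"gs://{bucket}"


# Assumed environment variable names; fall back to the placeholders above.
PROJECT_ID = os.environ.get("GOOGLE_CLOUD_PROJECT", "PROJECT_ID")
LOCATION = os.environ.get("GOOGLE_CLOUD_LOCATION", "LOCATION")
GCS_DEST = gcs_uri("BUCKET_NAME", "output-path")
```

Building the URI in one function means the same bucket name can be reused later for any other setting that expects a `gs://` path, without retyping the prefix.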

Develop an agent

Develop an Agent Development Kit (ADK) agent by defining its model, instruction, and set of tools. To learn more about developing agents, see "Develop an Agent Development Kit agent".

from google.adk import Agent

# Define Agent Tools
def search_products(query: str):
    """Searches for products based on a query."""
    # Mock response for demonstration
    if "headphones" in query.lower():
        return {"products": [{"name": "Wireless Headphones", "id": "B08H8H8H8H"}]}
    else:
        return {"products": []}

def get_product_details(product_id: str):
    """Gets the details for a given product ID."""
    if product_id == "B08H8H8H8H":
        return {"details": "Noise-cancelling, 20-hour battery life."}
    else:
        return {"error": "Product not found."}

def add_to_cart(product_id: str, quantity: int):
    """Adds a specified quantity of a product to the cart."""
    return {"status": f"Added {quantity} of {product_id} to cart."}

# Define Agent
my_agent = Agent(
    model="gemini-2.5-flash",
    name='ecommerce_agent',
    instruction='You are an ecommerce expert',
    tools=[search_products, get_product_details, add_to_cart],
)
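
Because the tools are ordinary Python functions, they can be exercised locally before the agent is deployed. The sketch below repeats the mock tools from above so that it runs on its own, then chains them for a "find and add to cart" request. The `find_and_add` helper is hypothetical and only illustrates the tool sequence; it is not how the model plans its tool calls.

```python
def search_products(query: str) -> dict:
    """Searches for products based on a query (same mock as above)."""
    if "headphones" in query.lower():
        return {"products": [{"name": "Wireless Headphones", "id": "B08H8H8H8H"}]}
    return {"products": []}


def get_product_details(product_id: str) -> dict:
    """Gets the details for a given product ID (same mock as above)."""
    if product_id == "B08H8H8H8H":
        return {"details": "Noise-cancelling, 20-hour battery life."}
    return {"error": "Product not found."}


def add_to_cart(product_id: str, quantity: int) -> dict:
    """Adds a specified quantity of a product to the cart (same mock as above)."""
    return {"status": f"Added {quantity} of {product_id} to cart."}


def find_and_add(query: str, quantity: int = 1) -> dict:
    """Hypothetical helper: search, then add the first hit to the cart."""
    products = search_products(query)["products"]
    if not products:
        return {"status": "No products found."}
    return add_to_cart(products[0]["id"], quantity)


print(find_and_add("noise-cancelling headphones"))
# {'status': 'Added 1 of B08H8H8H8H to cart.'}
print(find_and_add("wireless earbuds"))
# {'status': 'No products found.'}
```

Note that the mock search only matches queries containing "headphones", so prompts about earbuds or laptops return an empty result. That makes them useful for checking how the agent handles missing products.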

Deploy the agent

Deploy the agent to Agent Platform Runtime. This can take up to 10 minutes. Then retrieve the resource name from the deployed agent.

def deploy_adk_agent(root_agent):
  """Deploy agent to agent engine.
  Args:
    root_agent: The ADK agent to deploy.
  """
  app = vertexai.agent_engines.AdkApp(
      agent=root_agent,
  )
  remote_app = client.agent_engines.create(
      agent=app,
      config={
          "staging_bucket": "gs://BUCKET_NAME",
          "requirements": ['google-cloud-aiplatform[adk,agent_engines]'],
          "env_vars": {"GOOGLE_CLOUD_AGENT_ENGINE_ENABLE_TELEMETRY": "true"}
      }
  )
  return remote_app

agent_engine = deploy_adk_agent(my_agent)
agent_engine_resource_name = agent_engine.api_resource.name
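
It can be useful to sanity-check the resource name before passing it to later steps. The sketch below assumes the name follows the pattern `projects/.../locations/.../reasoningEngines/...`; that pattern is an assumption here, so verify it against the value your deployment actually returns. The function name and the example value are made up for illustration.

```python
import re

# Assumed resource-name pattern; verify against the value your deployment returns.
_RESOURCE_NAME = re.compile(
    r"^projects/(?P<project>[^/]+)"
    r"/locations/(?P<location>[^/]+)"
    r"/reasoningEngines/(?P<engine_id>[^/]+)$"
)


def parse_agent_resource_name(name: str) -> dict:
    """Split a resource name into its parts, or raise ValueError if it does not match."""
    match = _RESOURCE_NAME.match(name)
    if match is None:
        raise ValueError(f"Unexpected resource name: {name!r}")
    return match.groupdict()


# Made-up example value for illustration only.
example = "projects/my-project/locations/us-central1/reasoningEngines/1234567890"
print(parse_agent_resource_name(example))
# {'project': 'my-project', 'location': 'us-central1', 'engine_id': '1234567890'}
```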

To get a list of the agents deployed to Agent Platform, see "Manage deployed agents".

Generate responses

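  1. Prepare the dataset that run_inference() will use to generate model responses:

    Prepare your dataset as a Pandas DataFrame. The prompts should be relevant to your agent. Session inputs are required for traces. For more information, see "Session: Tracking individual conversations".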

    import pandas as pd
    from vertexai import types
    
    session_inputs = types.evals.SessionInput(
        user_id="user_123",
        state={},
    )
    agent_prompts = [
        "Search for 'noise-cancelling headphones'.",
        "Show me the details for product 'B08H8H8H8H'.",
        "Add one pair of 'B08H8H8H8H' to my shopping cart.",
        "Find 'wireless earbuds' and then add the first result to my cart.",
        "I need a new laptop for work, can you find one with at least 16GB of RAM?",
    ]
    agent_dataset = pd.DataFrame({
        "prompt": agent_prompts,
        "session_inputs": [session_inputs] * len(agent_prompts),
    })
    
  2. Generate model responses using run_inference():

    agent_dataset_with_inference = client.evals.run_inference(
        agent=agent_engine_resource_name,
        src=agent_dataset,
    )
    
  3. Visualize the inference results by calling .show() on the EvaluationDataset object to inspect the model outputs alongside your original prompts and references:

    agent_dataset_with_inference.show()
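
Since traces require session inputs and every row needs a usable prompt, a quick check of the dataset rows before calling run_inference() can catch problems early. The following is a minimal sketch using plain dictionaries as stand-ins for the DataFrame rows; the `validate_rows` helper and the stand-in session dictionary are hypothetical, not part of the SDK.

```python
def validate_rows(rows: list[dict]) -> list[str]:
    """Return a list of problems found in the dataset rows (empty if none)."""
    problems = []
    seen_prompts = set()
    for index, row in enumerate(rows):
        prompt = row.get("prompt")
        if not isinstance(prompt, str) or not prompt.strip():
            problems.append(f"row {index}: prompt is missing or empty")
        elif prompt in seen_prompts:
            problems.append(f"row {index}: duplicate prompt")
        else:
            seen_prompts.add(prompt)
        if row.get("session_inputs") is None:
            problems.append(f"row {index}: session_inputs is missing")
    return problems


# Stand-in for types.evals.SessionInput, used only so the sketch runs on its own.
session = {"user_id": "user_123", "state": {}}
rows = [
    {"prompt": "Search for 'noise-cancelling headphones'.", "session_inputs": session},
    {"prompt": "   ", "session_inputs": session},
    {"prompt": "Show me the details for product 'B08H8H8H8H'.", "session_inputs": None},
]
for problem in validate_rows(rows):
    print(problem)
# row 1: prompt is missing or empty
# row 2: session_inputs is missing
```

With a real DataFrame, the rows can be obtained with `agent_dataset.to_dict("records")` before passing them to the check.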
    

Run the agent evaluation

Run create_evaluation_run() to evaluate the agent responses.

  1. Retrieve agent_info using the built-in helper function:

    agent_info = types.evals.AgentInfo.load_from_agent(
        my_agent,
        agent_engine_resource_name
    )
    
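  2. Evaluate the model responses using agent-specific adaptive rubric-based metrics (FINAL_RESPONSE_QUALITY, TOOL_USE_QUALITY, and HALLUCINATION). The following example also includes the SAFETY metric: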

    evaluation_run = client.evals.create_evaluation_run(
        dataset=agent_dataset_with_inference,
        agent_info=agent_info,
        metrics=[
            types.RubricMetric.FINAL_RESPONSE_QUALITY,
            types.RubricMetric.TOOL_USE_QUALITY,
            types.RubricMetric.HALLUCINATION,
            types.RubricMetric.SAFETY,
        ],
        dest=GCS_DEST,
    )
    

View the agent evaluation results

You can view the evaluation results using the Agent Platform SDK.

Retrieve the evaluation run and call .show() to visualize the evaluation results, including summary metrics and detailed results:

evaluation_run = client.evals.get_evaluation_run(
    name=evaluation_run.name,
    include_evaluation_items=True
)

evaluation_run.show()
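
If the run is still in progress when you retrieve it, you may want to poll until it finishes before displaying results. The sketch below is a generic polling helper, not an SDK feature: `wait_until` is a hypothetical name, and the state strings in the demo are made up. This page does not show which status field the evaluation run object exposes, so that part is left to the caller.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def wait_until(
    fetch: Callable[[], T],
    is_done: Callable[[T], bool],
    timeout_s: float = 600.0,
    interval_s: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fetch() repeatedly until is_done(result) is true or the timeout passes."""
    deadline = time.monotonic() + timeout_s
    while True:
        result = fetch()
        if is_done(result):
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Not finished after {timeout_s} seconds")
        sleep(interval_s)


# Demo with a fake sequence of states (made-up names) and no real sleeping.
fake_states = iter(["RUNNING", "RUNNING", "SUCCEEDED"])
final_state = wait_until(
    fetch=lambda: next(fake_states),
    is_done=lambda state: state == "SUCCEEDED",
    sleep=lambda seconds: None,
)
print(final_state)  # SUCCEEDED
```

In practice, `fetch` could wrap the `client.evals.get_evaluation_run(...)` call shown above, and `is_done` would inspect whichever status attribute the returned object provides.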

The detailed results also show the traces of the agent's interactions. To learn more about traces, see "Trace an agent".

What's next

Try the following agent evaluation notebooks: