Evaluate generative AI agents using the Gen AI client in the Agent Platform SDK

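After you build and evaluate a generative AI model, you can use that model to build an agent, such as a chatbot. The Gen AI evaluation service lets you measure how well the agent completes the tasks and goals of your use case.

This page shows you how to create and deploy a basic agent, and then evaluate it with the Gen AI evaluation service.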

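Develop an agent: Define an agent with basic tool functions.
Deploy the agent: Deploy the agent to Agent Platform Runtime.
Run agent inference: Define an evaluation dataset and run agent inference to generate responses.
Create an evaluation run: Create an evaluation run to perform the evaluation.
View evaluation results: View the evaluation results through the evaluation run.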

Before you begin

Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how Google products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
In the Google Cloud console, on the project selector page, select or create a Google Cloud project.
Roles required to select or create a project
- Select a project: Selecting a project doesn't require a specific IAM role—you can select any project that you've been granted a role on.
- Create a project: To create a project, you need the Project Creator role (roles/resourcemanager.projectCreator), which contains the resourcemanager.projects.create permission. Learn how to grant roles.
Note: If you don't plan to keep the resources that you create in this procedure, create a project instead of selecting an existing project. After you finish these steps, you can delete the project, removing all resources associated with the project.

Go to project selector

Verify that billing is enabled for your Google Cloud project.

Install the Agent Platform SDK for Python.

```
%pip install google-cloud-aiplatform[adk,agent_engines]
%pip install --upgrade --force-reinstall -q google-cloud-aiplatform[evaluation]
```

Set up your credentials. If you are running this tutorial in Colaboratory, run the following:
```
from google.colab import auth
auth.authenticate_user()
```
For other environments, see Authenticate to Agent Platform.

Initialize the Gen AI client in the Agent Platform SDK.

```
import vertexai
from vertexai import Client
from google.genai import types as genai_types

# Cloud Storage location where evaluation results are written.
GCS_DEST = "gs://BUCKET_NAME/output-path"

vertexai.init(
    project=PROJECT_ID,
    location=LOCATION,
)

client = Client(
    project=PROJECT_ID,
    location=LOCATION,
    http_options=genai_types.HttpOptions(api_version="v1beta1"),
)
```

Replace the following:

BUCKET_NAME: The name of your Cloud Storage bucket. To learn more about creating buckets, see Create a bucket.
PROJECT_ID: Your project ID.
LOCATION: Your selected region.
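
Before initializing the client, it can help to confirm that the placeholders were actually replaced. The following is a minimal sketch: `check_config` is a hypothetical helper that is not part of the SDK, and it only checks the shape of the values, not whether the project or bucket exists.

```python
PLACEHOLDERS = {"PROJECT_ID", "LOCATION", "BUCKET_NAME"}


def check_config(project_id: str, location: str, gcs_dest: str) -> None:
    """Raise ValueError if a value is empty or still a placeholder.

    Hypothetical helper for illustration; not part of the Agent Platform SDK.
    """
    for label, value in (("project", project_id), ("location", location)):
        if not value or value in PLACEHOLDERS:
            raise ValueError(f"{label} is not set: {value!r}")
    if not gcs_dest.startswith("gs://"):
        raise ValueError(f"GCS destination must start with gs://: {gcs_dest!r}")
    # The bucket name is the first path component after the scheme.
    bucket = gcs_dest[len("gs://"):].split("/", 1)[0]
    if not bucket or bucket in PLACEHOLDERS:
        raise ValueError(f"bucket name is not set: {gcs_dest!r}")


# Example values are made up for illustration.
check_config("my-project", "us-central1", "gs://my-bucket/output-path")
```

Running a check like this first turns a forgotten placeholder into an immediate local error instead of a failed API call later in the tutorial.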

Develop an agent

Develop an Agent Development Kit (ADK) agent by defining the model, the instruction, and a set of tools. For more information about developing agents, see Develop an Agent Development Kit agent.

```
from google.adk import Agent

# Define Agent Tools
def search_products(query: str):
    """Searches for products based on a query."""
    # Mock response for demonstration
    if "headphones" in query.lower():
        return {"products": [{"name": "Wireless Headphones", "id": "B08H8H8H8H"}]}
    else:
        return {"products": []}

def get_product_details(product_id: str):
    """Gets the details for a given product ID."""
    if product_id == "B08H8H8H8H":
        return {"details": "Noise-cancelling, 20-hour battery life."}
    else:
        return {"error": "Product not found."}

def add_to_cart(product_id: str, quantity: int):
    """Adds a specified quantity of a product to the cart."""
    return {"status": f"Added {quantity} of {product_id} to cart."}

# Define Agent
my_agent = Agent(
    model="gemini-2.5-flash",
    name='ecommerce_agent',
    instruction='You are an ecommerce expert',
    tools=[search_products, get_product_details, add_to_cart],
)
```
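
Because the tools are plain Python functions, they can be exercised locally before the agent is deployed. The sketch below repeats the mock tools from the example above so that it runs on its own, then chains them in the order an agent might plausibly call them for a "find and add to cart" request. `find_and_add` is a hypothetical helper, and the call order is an assumption about agent behavior, not something the SDK guarantees.

```python
def search_products(query: str):
    """Mock search with the same behavior as the tool defined above."""
    if "headphones" in query.lower():
        return {"products": [{"name": "Wireless Headphones", "id": "B08H8H8H8H"}]}
    return {"products": []}


def get_product_details(product_id: str):
    """Mock detail lookup with the same behavior as the tool defined above."""
    if product_id == "B08H8H8H8H":
        return {"details": "Noise-cancelling, 20-hour battery life."}
    return {"error": "Product not found."}


def add_to_cart(product_id: str, quantity: int):
    """Mock cart update with the same behavior as the tool defined above."""
    return {"status": f"Added {quantity} of {product_id} to cart."}


def find_and_add(query: str, quantity: int = 1):
    """Hypothetical helper: search, look up the first hit, add it to the cart."""
    products = search_products(query)["products"]
    if not products:
        return {"status": "No products found."}
    first_id = products[0]["id"]
    details = get_product_details(first_id)
    if "error" in details:
        return {"status": details["error"]}
    return add_to_cart(first_id, quantity)


print(find_and_add("noise-cancelling headphones"))
print(find_and_add("wireless earbuds"))
```

The mock search only matches queries that contain "headphones", so the second call finds nothing. A request for earbuds or a laptop therefore has no matching product, and a well-behaved agent should say so rather than invent a result.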

Deploy the agent

Deploy the agent to Agent Platform Runtime. Deployment can take up to 10 minutes. Then get the resource name from the deployed agent.

```
def deploy_adk_agent(root_agent):
  """Deploy agent to agent engine.
  Args:
    root_agent: The ADK agent to deploy.
  Returns:
    The deployed remote agent.
  """
  app = vertexai.agent_engines.AdkApp(
      agent=root_agent,
  )
  remote_app = client.agent_engines.create(
      agent=app,
      config={
          # The bucket URI must be a string literal.
          "staging_bucket": "gs://BUCKET_NAME",
          "requirements": ['google-cloud-aiplatform[adk,agent_engines]'],
          "env_vars": {"GOOGLE_CLOUD_AGENT_ENGINE_ENABLE_TELEMETRY": "true"}
      }
  )
  return remote_app

agent_engine = deploy_adk_agent(my_agent)
agent_engine_resource_name = agent_engine.api_resource.name
```

To list the agents that are deployed to Agent Platform, see Manage deployed agents.
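
The resource name is what `run_inference()` later uses to reach the deployed agent, so it can be worth checking its shape before moving on. This sketch assumes the name follows the pattern `projects/PROJECT/locations/LOCATION/reasoningEngines/ID`. Both that pattern and the helper `parse_agent_resource_name` are assumptions for illustration and should be checked against the name your deployment actually returns.

```python
import re

# Assumed pattern; verify against the name returned by your deployment.
_RESOURCE_NAME = re.compile(
    r"^projects/(?P<project>[^/]+)"
    r"/locations/(?P<location>[^/]+)"
    r"/reasoningEngines/(?P<engine_id>[^/]+)$"
)


def parse_agent_resource_name(name: str) -> dict:
    """Split a resource name into its parts, or raise ValueError."""
    match = _RESOURCE_NAME.match(name)
    if match is None:
        raise ValueError(f"Unexpected resource name: {name!r}")
    return match.groupdict()


# Made-up example value.
parts = parse_agent_resource_name(
    "projects/123456/locations/us-central1/reasoningEngines/987654"
)
print(parts["location"])
```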

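Generate responses

Use run_inference() to generate the agent's responses for your dataset.

Prepare your dataset as a Pandas DataFrame. The prompts should be specific to your agent. Session inputs are required for traces. For more information, see Session: Tracking individual conversations.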

```
import pandas as pd
from vertexai import types

session_inputs = types.evals.SessionInput(
    user_id="user_123",
    state={},
)
agent_prompts = [
    "Search for 'noise-cancelling headphones'.",
    "Show me the details for product 'B08H8H8H8H'.",
    "Add one pair of 'B08H8H8H8H' to my shopping cart.",
    "Find 'wireless earbuds' and then add the first result to my cart.",
    "I need a new laptop for work, can you find one with at least 16GB of RAM?",
]
agent_dataset = pd.DataFrame({
    "prompt": agent_prompts,
    "session_inputs": [session_inputs] * len(agent_prompts),
})
```
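
Each row pairs one prompt with a session input, so an empty or duplicated prompt becomes an empty or repeated evaluation item. The following is a minimal sketch of a pre-check on the prompt list that uses only the standard library. `clean_prompts` is a hypothetical helper, not an SDK function.

```python
def clean_prompts(prompts):
    """Strip whitespace, drop empty prompts, and drop exact duplicates.

    The order of first occurrence is preserved.
    """
    seen = set()
    cleaned = []
    for prompt in prompts:
        text = prompt.strip()
        if text and text not in seen:
            seen.add(text)
            cleaned.append(text)
    return cleaned


# Made-up input with one padded duplicate and one empty entry.
raw_prompts = [
    "Search for 'noise-cancelling headphones'.",
    "  Search for 'noise-cancelling headphones'.  ",
    "",
    "Show me the details for product 'B08H8H8H8H'.",
]
checked_prompts = clean_prompts(raw_prompts)
print(len(checked_prompts))
```

A cleaned list like `checked_prompts` could then be passed as the "prompt" column when building the DataFrame.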

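Generate the responses by calling run_inference() with the resource name of the deployed agent.

```
agent_dataset_with_inference = client.evals.run_inference(
    agent=agent_engine_resource_name,
    src=agent_dataset,
)
```

Visualize the inference results by calling .show() on the EvaluationDataset object. This lets you inspect the agent's outputs alongside the original prompts and references.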
```
agent_dataset_with_inference.show()
```

Run the agent evaluation

Run create_evaluation_run() to evaluate the agent's responses.

Retrieve agent_info using the built-in helper function.

```
agent_info = types.evals.AgentInfo.load_from_agent(
    my_agent,
    agent_engine_resource_name
)
```

Evaluate the responses using the agent-specific adaptive rubric-based metrics (FINAL_RESPONSE_QUALITY, TOOL_USE_QUALITY, and HALLUCINATION). The example also includes the SAFETY metric.

```
evaluation_run = client.evals.create_evaluation_run(
    dataset=agent_dataset_with_inference,
    agent_info=agent_info,
    metrics=[
        types.RubricMetric.FINAL_RESPONSE_QUALITY,
        types.RubricMetric.TOOL_USE_QUALITY,
        types.RubricMetric.HALLUCINATION,
        types.RubricMetric.SAFETY,
    ],
    dest=GCS_DEST,
)
```

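View the agent evaluation results

You can view the evaluation results using the Agent Platform SDK.

Retrieve the evaluation run, and then visualize the results by calling .show(), which displays summary metrics and detailed results.

```
evaluation_run = client.evals.get_evaluation_run(
    name=evaluation_run.name,
    include_evaluation_items=True
)

evaluation_run.show()
```

The detailed results also include traces that show the agent's interactions. For more information about traces, see Trace an agent.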

What's next

Try the agent evaluation notebooks.