
Tutorial: Perform evaluation using the Gen AI client in the Agent Platform SDK

This page shows how to use the Gen AI client in the Agent Platform SDK to evaluate generative AI models and applications across a range of use cases.

Before you begin

Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how Google products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
In the Google Cloud console, on the project selector page, select or create a Google Cloud project.
Roles required to select or create a project
- Select a project: Selecting a project doesn't require a specific IAM role—you can select any project that you've been granted a role on.
- Create a project: To create a project, you need the Project Creator role (roles/resourcemanager.projectCreator), which contains the resourcemanager.projects.create permission. Learn how to grant roles.
Note: If you don't plan to keep the resources that you create in this procedure, create a project instead of selecting an existing project. After you finish these steps, you can delete the project, removing all resources associated with the project.

Verify that billing is enabled for your Google Cloud project.

Install the Agent Platform SDK:

```
!pip install "google-cloud-aiplatform[evaluation]"
```

In a terminal rather than a notebook, omit the leading `!`.

Set up your credentials. If you're running this tutorial in Colaboratory, run the following:
```
from google.colab import auth
auth.authenticate_user()
```
For other environments, see Authenticate to Agent Platform.

Initialize the Gen AI client

To initialize the Gen AI client, run the following:

```
from vertexai import Client

client = Client(project="YOUR_PROJECT_ID", location="YOUR_LOCATION")
```

Where:

YOUR_PROJECT_ID: your Google Cloud project ID.
YOUR_LOCATION: your Cloud region (for example, us-central1).
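If you'd rather not hard-code YOUR_PROJECT_ID and YOUR_LOCATION, you can read them from environment variables. This is a minimal sketch under stated assumptions: the variable names GOOGLE_CLOUD_PROJECT and GOOGLE_CLOUD_LOCATION, the us-central1 fallback, and the helper name resolve_client_config are illustrative choices, not part of the Agent Platform SDK.

```python
import os


def resolve_client_config(env=None, default_location="us-central1"):
    """Return (project, location) to pass to vertexai.Client.

    Hypothetical helper: reads GOOGLE_CLOUD_PROJECT and GOOGLE_CLOUD_LOCATION
    from the given mapping, or from os.environ when no mapping is passed.
    """
    env = os.environ if env is None else env
    project = env.get("GOOGLE_CLOUD_PROJECT")
    if not project:
        # Fail early with a clear message instead of on the first API call.
        raise ValueError("Set GOOGLE_CLOUD_PROJECT to your Google Cloud project ID.")
    # Fall back to an assumed default region when none is configured.
    location = env.get("GOOGLE_CLOUD_LOCATION") or default_location
    return project, location
```

You could then write `project, location = resolve_client_config()` followed by `client = Client(project=project, location=location)`. Accepting an optional mapping keeps the helper testable without modifying the real environment.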

Generate responses

Use run_inference() to generate model responses for your dataset.

Prepare your dataset as a pandas DataFrame:

```
import pandas as pd

eval_df = pd.DataFrame({
  "prompt": [
      "Explain software 'technical debt' using a concise analogy of planting a garden.",
      "Write a Python function to find the nth Fibonacci number using recursion with memoization, but without using any imports.",
      "Write a four-line poem about a lonely robot, where every line must be a question and the word 'and' cannot be used.",
      "A drawer has 10 red socks and 10 blue socks. In complete darkness, what is the minimum number of socks you must pull out to guarantee you have a matching pair?",
      "An AI discovers a cure for a major disease, but the cure is based on private data it analyzed without consent. Should the cure be released? Justify your answer."
  ]
})
```
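Because run_inference() generates a model response for each prompt, a quick local check of the prompt column can catch empty or duplicated prompts before you spend model calls on them. This is a minimal sketch; check_prompts is a hypothetical helper that inspects a plain list of prompts and never calls the SDK.

```python
def check_prompts(prompts):
    """Return a list of problems found in the prompts (empty if none).

    Hypothetical pre-flight check: flags non-string or blank prompts and
    prompts that repeat an earlier one (ignoring surrounding whitespace).
    """
    problems = []
    seen = set()
    for i, prompt in enumerate(prompts):
        if not isinstance(prompt, str) or not prompt.strip():
            problems.append(f"row {i}: prompt is empty or not a string")
            continue
        key = prompt.strip()
        if key in seen:
            problems.append(f"row {i}: duplicate prompt")
        seen.add(key)
    return problems
```

For the DataFrame above, you could run `problems = check_prompts(eval_df["prompt"].tolist())` and stop before inference if `problems` is not empty.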

Generate the model responses with run_inference():

```
# Generate a response from the specified model for each prompt in eval_df.
eval_dataset = client.evals.run_inference(
  model="gemini-2.5-flash",
  src=eval_df,
)
```

Visualize the inference results by calling .show() on the EvaluationDataset object, which lets you inspect the model's outputs alongside the original prompts and any references:
```
eval_dataset.show()
```

The following image shows the evaluation dataset, containing the prompts and their corresponding generated responses.

[Image: A table showing the evaluation dataset with prompt and response columns.]

Run the evaluation

Run evaluate() to evaluate the model responses.

Evaluate the model responses using the default GENERAL_QUALITY adaptive rubric-based metric:
```
eval_result = client.evals.evaluate(dataset=eval_dataset)
```
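Conceptually, an adaptive rubric-based metric such as GENERAL_QUALITY generates a set of pass/fail rubric checks tailored to each prompt and judges each response against them. The sketch below illustrates only that aggregation idea, under the assumption that a response's score is the fraction of its rubric checks that pass. The actual scoring is performed by the evaluation service, and the verdict lists here are hypothetical inputs, not the SDK's result format.

```python
from statistics import mean


def rubric_pass_rate(verdicts):
    """Fraction of rubric checks one response passed.

    `verdicts` is a hypothetical list of booleans, one per rubric item.
    """
    if not verdicts:
        raise ValueError("no rubric verdicts to score")
    # True counts as 1 and False as 0, so the sum is the number of passes.
    return sum(verdicts) / len(verdicts)


def summarize(per_prompt_verdicts):
    """Return per-prompt pass rates and their mean across the dataset."""
    rates = [rubric_pass_rate(v) for v in per_prompt_verdicts]
    return rates, mean(rates)
```

Scoring each prompt against its own checks is what makes the rubric "adaptive": a coding prompt and a poetry prompt are not held to the same fixed criteria.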
Visualize the evaluation results by calling .show() on the EvaluationResult object, which displays the summary metrics and detailed results:
```
eval_result.show()
```

The following image shows the evaluation report, which displays summary metrics along with detailed results for each prompt-response pair.

[Image: An evaluation report showing summary metrics along with detailed results for each prompt-response pair.]

Clean up

This tutorial doesn't create any Gemini Enterprise Agent Platform resources, so there is nothing to clean up.

What's next

Define your evaluation metrics.