Google uses AI technology to translate content into your preferred language. AI translations can contain errors.

教學課程：使用 Python SDK 執行評估

本頁說明如何使用 Agent Platform SDK for Python，透過 Gen AI Evaluation Service 執行以模型為基礎的評估。

事前準備

登入 Google Cloud 帳戶。如果您是 Google Cloud新手，歡迎建立帳戶，親自評估產品在實際工作環境中的成效。新客戶還能獲得價值 $300 美元的免費抵免額，可用於執行、測試及部署工作負載。
In the Google Cloud console, on the project selector page, select or create a Google Cloud project.
Roles required to select or create a project
- Select a project: Selecting a project doesn't require a specific IAM role—you can select any project that you've been granted a role on.
- Create a project: To create a project, you need the Project Creator role (roles/resourcemanager.projectCreator), which contains the resourcemanager.projects.create permission. Learn how to grant roles.
Note: If you don't plan to keep the resources that you create in this procedure, create a project instead of selecting an existing project. After you finish these steps, you can delete the project, removing all resources associated with the project.

Go to project selector

Verify that billing is enabled for your Google Cloud project.
In the Google Cloud console, on the project selector page, select or create a Google Cloud project.
Roles required to select or create a project
- Select a project: Selecting a project doesn't require a specific IAM role—you can select any project that you've been granted a role on.
- Create a project: To create a project, you need the Project Creator role (roles/resourcemanager.projectCreator), which contains the resourcemanager.projects.create permission. Learn how to grant roles.
Note: If you don't plan to keep the resources that you create in this procedure, create a project instead of selecting an existing project. After you finish these steps, you can delete the project, removing all resources associated with the project.

Go to project selector

Verify that billing is enabled for your Google Cloud project.
安裝 Agent Platform SDK for Python，並加入 Gen AI Evaluation Service 依附元件：
```
!pip install google-cloud-aiplatform[evaluation]
```
設定憑證。如果您在 Colaboratory 中執行本快速入門導覽課程，請執行下列操作：
```
from google.colab import auth
auth.authenticate_user()
```
如為其他環境，請參閱「向 Agent Platform 進行驗證」。

匯入程式庫

匯入程式庫，並設定專案和位置。

import pandas as pd

import vertexai
from vertexai.evaluation import EvalTask, PointwiseMetric, PointwiseMetricPromptTemplate
from google.cloud import aiplatform

PROJECT_ID = "PROJECT_ID"
LOCATION = "LOCATION"
EXPERIMENT_NAME = "EXPERIMENT_NAME"

vertexai.init(
    project=PROJECT_ID,
    location=LOCATION,
)

請注意，EXPERIMENT_NAME 最多只能包含 127 個字元，且只能使用小寫英數字元和連字號。

根據條件設定評估指標

下列指標定義會根據兩項條件 (Fluency 和 Entertaining)，評估大型語言模型生成的文字品質。程式碼會使用這兩項條件定義名為 custom_text_quality 的指標：

custom_text_quality = PointwiseMetric(
    metric="custom_text_quality",
    metric_prompt_template=PointwiseMetricPromptTemplate(
        criteria={
            "fluency": (
                "Sentences flow smoothly and are easy to read, avoiding awkward"
                " phrasing or run-on sentences. Ideas and sentences connect"
                " logically, using transitions effectively where needed."
            ),
            "entertaining": (
                "Short, amusing text that incorporates emojis, exclamations and"
                " questions to convey quick and spontaneous communication and"
                " diversion."
            ),
        },
        rating_rubric={
            "1": "The response performs well on both criteria.",
            "0": "The response is somewhat aligned with both criteria",
            "-1": "The response falls short on both criteria",
        },
    ),
)

準備資料集

新增下列程式碼來準備資料集：

responses = [
    # An example of good custom_text_quality
    "Life is a rollercoaster, full of ups and downs, but it's the thrill that keeps us coming back for more!",
    # An example of medium custom_text_quality
    "The weather is nice today, not too hot, not too cold.",
    # An example of poor custom_text_quality
    "The weather is, you know, whatever.",
]

eval_dataset = pd.DataFrame({
    "response" : responses,
})

使用資料集執行評估

執行評估作業：

eval_task = EvalTask(
    dataset=eval_dataset,
    metrics=[custom_text_quality],
    experiment=EXPERIMENT_NAME
)

pointwise_result = eval_task.evaluate()

在 metrics_table Pandas DataFrame 中查看每項回覆的評估結果：

pointwise_result.metrics_table

清除所用資源

為了避免系統向您的 Google Cloud 帳戶收取本頁面所用資源的費用，請按照下列步驟操作。

刪除評估作業建立的 ExperimentRun：

aiplatform.ExperimentRun(
    run_name=pointwise_result.metadata["experiment_run"],
    experiment=pointwise_result.metadata["experiment"],
).delete()

後續步驟

定義評估指標。
準備評估資料集。