Tutorial: Perform an evaluation using the console

Learn how to use the Google Cloud console to get started with the Gen AI Evaluation Service.

Before you begin

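  1. Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.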
  2. In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

    Roles required to select or create a project

    • Select a project: Selecting a project doesn't require a specific IAM role—you can select any project that you've been granted a role on.
    • Create a project: To create a project, you need the Project Creator role (roles/resourcemanager.projectCreator), which contains the resourcemanager.projects.create permission. Learn how to grant roles.

    Go to project selector

  3. Verify that billing is enabled for your Google Cloud project.

  4. Make sure that you have the following role or roles on the project: Storage Admin

    Check for the roles

    1. In the Google Cloud console, go to the IAM page.

      Go to IAM
    2. Select the project.
    3. In the Principal column, find all rows that identify you or a group that you're included in. To learn which groups you're included in, contact your administrator.

    4. For all rows that specify or include you, check the Role column to see whether the list of roles includes the required roles.

    Grant the roles

    1. In the Google Cloud console, go to the IAM page.

      Go to IAM
    2. Select the project.
    3. Click Grant access.
    4. In the New principals field, enter your user identifier. This is typically the email address for a Google Account.

    5. Click Select a role, then search for the role.
    6. To grant additional roles, click Add another role and add each additional role.
    7. Click Save.

Evaluate a model

To evaluate a model, do the following:

  1. In the Google Cloud console, go to the Gen AI Evaluation page.

    Go to Evaluation

  2. Click New evaluation to open the evaluation page.

  3. Select a source from which to load the dataset for evaluation:

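    • To upload a local CSV or JSONL file, select Upload file. The dataset must contain either prompts or records to use in a prompt template, and can optionally include model responses. The file can contain at most 200 rows.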

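    • To generate prompts from a prompt template, select Generate data. When you create the dataset, the Gen AI Evaluation Service generates values for the variables defined in the prompt template and fills them in. For more information about writing prompt templates, see Use prompt templates.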

      1. 提示模板 字段中输入包含变量的提示模板。

      2. To add a description for each variable or to specify the number of samples to generate, expand Define variables and sample size.

      3. Click Generate dataset to generate the prompts.

  4. Generate and evaluate responses to the prompts:

    1. 评估候选对象 部分中,点击添加评估 候选对象 ,或者,如果候选对象已存在,请点击 修改 以定义要评估的提示和 回答。例如,您可以指定来自上传文件或生成数据的提示或回答。

    2. To compare multiple candidates, click Add comparison candidate.

    3. 指标 部分中,至少添加一个指标来对候选对象回答的质量进行评分。如需详细了解指标类型,请参阅 Gen AI Evaluation Service 概览页面上的 评估指标 部分。

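    For some adaptive rubric metrics, you can control the rubrics generated for each prompt by expanding Advanced and providing custom instructions (for example, Evaluate the dataset on cultural sensitivity).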

    1. 名称和存储配置 部分中,为评估指定名称,并指定用于存储评估结果的 Cloud Storage 存储桶。
  5. Click Evaluate.
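If you choose Upload file in step 3, you prepare the dataset yourself. The following is a minimal local sketch, not part of the Gen AI Evaluation Service, that fills a `str.format`-style prompt template once per record, optionally attaches a model response, enforces the 200-row limit, and serializes the result as JSONL. The column names `prompt` and `response` and the `{variable}` placeholder syntax are assumptions made for illustration; check which column names the console expects before uploading. The helper names `template_variables`, `build_records`, and `to_jsonl` are hypothetical.

```python
import json
import string
from typing import Iterable, Mapping, Optional

# Row limit for an uploaded dataset, as stated in the console instructions.
MAX_ROWS = 200


def template_variables(template: str) -> set[str]:
    """Return the {placeholder} names used in a str.format-style template."""
    return {field for _, field, _, _ in string.Formatter().parse(template) if field}


def build_records(
    template: str,
    rows: Iterable[Mapping[str, str]],
    responses: Optional[list[str]] = None,
) -> list[dict[str, str]]:
    """Fill the template once per row and optionally attach a model response."""
    rows = list(rows)
    if len(rows) > MAX_ROWS:
        raise ValueError(f"{len(rows)} rows exceeds the {MAX_ROWS}-row limit")
    if responses is not None and len(responses) != len(rows):
        raise ValueError("responses must have exactly one entry per row")

    required = template_variables(template)
    records = []
    for i, row in enumerate(rows):
        missing = required - set(row)
        if missing:
            raise ValueError(f"row {i} is missing variables: {sorted(missing)}")
        # "prompt" and "response" are assumed column names (hypothetical).
        record = {"prompt": template.format(**row)}
        if responses is not None:
            record["response"] = responses[i]
        records.append(record)
    return records


def to_jsonl(records: list[dict[str, str]]) -> str:
    """Serialize records as JSON Lines: one JSON object per line."""
    return "".join(json.dumps(r, ensure_ascii=False) + "\n" for r in records)
```

For example, `to_jsonl(build_records("Summarize: {text}", [{"text": "..."}]))` yields one line per prompt, which you could write to a local `.jsonl` file with `pathlib.Path.write_text`. Validating locally catches missing template variables and oversized files before the upload step rather than after it.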

View evaluation results

To view the evaluation results, do the following:

  1. In the Google Cloud console, go to the Gen AI Evaluation page.

    Go to Evaluation

  2. Click the name of the evaluation.

    For each prompt in the evaluation dataset, the page displays the responses and their evaluation results.

Evaluate partner models

You can use the Gen AI Evaluation Service to evaluate the following partner models:

  • Anthropic
  • Llama

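Partner models are provided through Vertex AI Model Garden. You must enable a partner model in Model Garden before you can select it for evaluation. To evaluate a partner model, select it from the model selection menu while setting up the evaluation.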

Pricing

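The cost of evaluating third-party models depends on any charges incurred for model inference in Vertex AI Model Garden. For details, see the Generative AI on Vertex AI pricing page.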

What's next