Tutorial: Perform evaluation using the console

Learn how to get started with the Gen AI evaluation service using the Google Cloud console.

Before you begin

  1. Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
  2. In the Google Cloud console, go to the project selector page.

    Go to project selector

  3. Select or create a Google Cloud project.

    Roles required to select or create a project

    • Select a project: Selecting a project doesn't require a specific IAM role—you can select any project that you've been granted a role on.
    • Create a project: To create a project, you need the Project Creator role (roles/resourcemanager.projectCreator), which contains the resourcemanager.projects.create permission. Learn how to grant roles.
  4. Verify that billing is enabled for your Google Cloud project.

  5. Make sure that you have the following role on the project: Storage Admin (roles/storage.admin)

    Check for the roles

    1. In the Google Cloud console, go to the IAM page.

      Go to IAM
    2. Select the project.
    3. In the Principal column, find all rows that identify you or a group that you're included in. To learn which groups you're included in, contact your administrator.

    4. For all rows that specify or include you, check the Role column to see whether the list of roles includes the required roles.

    Grant the roles

    1. In the Google Cloud console, go to the IAM page.

      Go to IAM
    2. Select the project.
    3. Click Grant access.
    4. In the New principals field, enter your user identifier. This is typically the email address for a Google Account.

    5. Click Select a role, then search for the role.
    6. To grant additional roles, click Add another role and add each additional role.
    7. Click Save.
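
The role check described above can also be scripted. The following is a minimal sketch, assuming you have exported the project's IAM policy as JSON (for example with `gcloud projects get-iam-policy PROJECT_ID --format=json`) and loaded it into a dictionary. The principals and email addresses are hypothetical, and the helper names are illustrative only. The sketch only inspects direct bindings on the project; it does not account for roles inherited from a folder or organization, or for broader roles that also include Cloud Storage permissions.

```python
# Role ID for the Storage Admin role named in this tutorial.
REQUIRED_ROLE = "roles/storage.admin"


def roles_for_principals(policy, principals):
    """Return the roles that the policy grants to any of the given principals."""
    wanted = set(principals)
    granted = set()
    for binding in policy.get("bindings", []):
        # A binding applies if it lists you or a group that you belong to.
        if wanted & set(binding.get("members", [])):
            granted.add(binding["role"])
    return granted


def has_required_role(policy, principals, role=REQUIRED_ROLE):
    """Check whether any of the given principals holds the role directly."""
    return role in roles_for_principals(policy, principals)


# Hypothetical policy in the shape of an IAM policy JSON document.
policy = {
    "bindings": [
        {
            "role": "roles/storage.admin",
            "members": ["user:alex@example.com", "group:ml-team@example.com"],
        },
        {"role": "roles/viewer", "members": ["user:sam@example.com"]},
    ]
}

# Pass your own identifier plus any groups you are included in.
print(has_required_role(policy, ["user:alex@example.com"]))
print(has_required_role(policy, ["user:sam@example.com"]))
print(has_required_role(policy, ["user:sam@example.com", "group:ml-team@example.com"]))
```

Listing your groups alongside your user identifier mirrors the console check, which asks you to look at every row that identifies you or a group that you're included in.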

Evaluate your model

To evaluate your model:

  1. In the Google Cloud console, go to the Gen AI Evaluation page.

    Go to Evaluation

  2. Click New evaluation to open the evaluation page.

  3. Select a source to load a dataset for evaluation:

    • To upload a local CSV or JSONL file, select Upload file. The dataset must contain either prompts or records to use in a prompt template and, optionally, model responses. The dataset can contain at most 200 rows.

    • To generate prompts from a prompt template, select Generate data. The Gen AI evaluation service generates and populates the variables that you defined in your prompt template when creating your dataset. For more information about authoring prompt templates, see Use prompt templates.

      1. Enter your prompt template with your variables in the Prompt template field.

      2. To add a description for each of your variables or to specify the number of samples to generate, expand Define variables and sample size.

      3. Click Generate dataset to generate prompts.

  4. Generate and evaluate responses based on your prompts:

    1. In the Evaluation candidates section, click Add evaluation candidate, or if a candidate already exists, click Edit to define the prompts and responses to evaluate. For example, you can specify prompts or responses from your uploaded file or from generated data.

    2. To compare multiple candidates, click Add comparison candidate.

    3. In the Metrics section, add at least one metric to score the quality of your candidate's responses. For more information about the metric types, see the Evaluation metrics section on the Gen AI evaluation service overview page.

       For some adaptive rubrics, you can steer the rubrics that are generated from each prompt by expanding Advanced and providing custom instructions, such as "Evaluate the dataset on cultural sensitivity."

    4. In the Name and storage configuration section, specify a name for your evaluation and a Cloud Storage bucket where your evaluation results are stored.
  5. Click Evaluate.
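
If you prepare the upload file yourself, it can help to build and check it locally before selecting Upload file. The following is a minimal sketch, assuming a JSONL file with one JSON object per line and a column named `prompt`. The column name, the `{review}` placeholder syntax, the sample records, and the helper names are all assumptions for illustration; check the dataset and prompt template documentation for the exact format that the service expects. The only value taken from this tutorial is the 200-row limit.

```python
import json

MAX_ROWS = 200  # Upload limit stated in this tutorial.


def render_prompts(template, records):
    """Fill the template once per record (assumes {name}-style placeholders)."""
    return [template.format(**record) for record in records]


def to_jsonl(rows, max_rows=MAX_ROWS):
    """Serialize rows as JSONL text, rejecting datasets over the row limit."""
    if len(rows) > max_rows:
        raise ValueError(f"Dataset has {len(rows)} rows; the maximum is {max_rows}.")
    return "".join(json.dumps(row, ensure_ascii=False) + "\n" for row in rows)


# Hypothetical template and records.
template = "Summarize the following review in one sentence: {review}"
records = [
    {"review": "The battery lasts two days, but the screen scratches easily."},
    {"review": "Setup took five minutes and the app is easy to use."},
]

# "prompt" is an assumed column name, not one confirmed by this tutorial.
rows = [{"prompt": prompt} for prompt in render_prompts(template, records)]
jsonl_text = to_jsonl(rows)

# Write the text to a local file that you can then upload in the console.
with open("eval_dataset.jsonl", "w", encoding="utf-8") as f:
    f.write(jsonl_text)
```

Checking the row count before uploading surfaces an oversized dataset locally instead of during evaluation setup.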

View your evaluation results

To view an evaluation result:

  1. In the Google Cloud console, go to the Gen AI Evaluation page.

    Go to Evaluation

  2. Click the evaluation name.

    For each prompt in your evaluation dataset, the response is shown along with the evaluation results.

Evaluate partner models

You can use the Gen AI evaluation service to evaluate the following partner models:

  • Anthropic
  • Llama

Partner models are supported through Vertex AI Model Garden. You must enable a partner model in Model Garden before selecting it for evaluation. To evaluate a partner model, select it in the model selection menu during evaluation setup.

Pricing

Pricing for evaluating third-party models is based on any charges incurred for model inference in Vertex AI Model Garden. See the Pricing page for Generative AI on Vertex AI.
