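Tutorial: Perform evaluation using the GenAI Client in Agent Platform SDK

This page shows you how to evaluate your generative AI models and applications across a range of use cases using the GenAI Client in Agent Platform SDK.

Before you begin

Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how Google products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.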
In the Google Cloud console, on the project selector page, select or create a Google Cloud project.
Roles required to select or create a project
- Select a project: Selecting a project doesn't require a specific IAM role—you can select any project that you've been granted a role on.
- Create a project: To create a project, you need the Project Creator role (roles/resourcemanager.projectCreator), which contains the resourcemanager.projects.create permission. Learn how to grant roles.
Note: If you don't plan to keep the resources that you create in this procedure, create a project instead of selecting an existing project. After you finish these steps, you can delete the project, removing all resources associated with the project.

Go to project selector

Verify that billing is enabled for your Google Cloud project.

Install the Agent Platform SDK:

```
!pip install google-cloud-aiplatform[evaluation]
```

Set up your credentials. If you are running this tutorial in Colaboratory, run the following:
```
from google.colab import auth
auth.authenticate_user()
```
For other environments, see Authenticate to Agent Platform.

Initialize the GenAI Client

Run the following to initialize the GenAI Client:

```
from vertexai import Client

client = Client(project="YOUR_PROJECT_ID", location="YOUR_LOCATION")
```

Where:

YOUR_PROJECT_ID: Your Google Cloud project ID.
YOUR_LOCATION: Your cloud region, for example us-central1.
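
Hard-coding the project ID and region works for a quick test, but in a shared notebook it can be convenient to read them from the environment instead. The following is a minimal sketch under stated assumptions, not part of the SDK: the helper names `resolve_client_config` and `make_client` are hypothetical, and reading the values from the `GOOGLE_CLOUD_PROJECT` and `GOOGLE_CLOUD_LOCATION` environment variables is a choice made for this example.

```python
import os


def resolve_client_config(env=None, default_location="us-central1"):
    """Return keyword arguments for Client() from environment-style settings.

    Hypothetical helper for illustration; it is not part of the SDK.
    """
    env = os.environ if env is None else env
    # Assumption: the project ID is stored in GOOGLE_CLOUD_PROJECT.
    project = (env.get("GOOGLE_CLOUD_PROJECT") or "").strip()
    if not project:
        raise ValueError("Set GOOGLE_CLOUD_PROJECT to your Google Cloud project ID.")
    # Assumption: the region is stored in GOOGLE_CLOUD_LOCATION; fall back otherwise.
    location = (env.get("GOOGLE_CLOUD_LOCATION") or "").strip() or default_location
    return {"project": project, "location": location}


def make_client(env=None):
    """Build the GenAI Client from the resolved settings."""
    # Imported lazily so resolve_client_config() can be used without the SDK.
    from vertexai import Client

    return Client(**resolve_client_config(env))
```

In a notebook, `client = make_client()` would then take the place of the explicit `Client(project=..., location=...)` call. Failing early on a missing project ID tends to give a clearer error than a failed API request later on.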

Generate responses

Generate model responses for your dataset using run_inference():

Prepare your dataset as a Pandas DataFrame:

```
import pandas as pd

eval_df = pd.DataFrame({
    "prompt": [
        "Explain software 'technical debt' using a concise analogy of planting a garden.",
        "Write a Python function to find the nth Fibonacci number using recursion with memoization, but without using any imports.",
        "Write a four-line poem about a lonely robot, where every line must be a question and the word 'and' cannot be used.",
        "A drawer has 10 red socks and 10 blue socks. In complete darkness, what is the minimum number of socks you must pull out to guarantee you have a matching pair?",
        "An AI discovers a cure for a major disease, but the cure is based on private data it analyzed without consent. Should the cure be released? Justify your answer."
    ]
})
```
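
When prompts come from a file or a spreadsheet rather than a hand-written list, empty or repeated entries still cost an inference call and are counted again in any averaged score. A minimal sketch of cleaning the list before building the DataFrame follows; `clean_prompts` and `build_eval_rows` are hypothetical helpers written for this example, not SDK functions.

```python
def clean_prompts(prompts):
    """Strip whitespace and drop empty or repeated prompts, keeping first-seen order."""
    seen = set()
    cleaned = []
    for prompt in prompts:
        if not isinstance(prompt, str):
            raise TypeError(f"Prompt must be a string, got {type(prompt).__name__}")
        text = prompt.strip()
        if text and text not in seen:
            seen.add(text)
            cleaned.append(text)
    return cleaned


def build_eval_rows(prompts):
    """Return a column mapping suitable for pd.DataFrame(), with a 'prompt' column."""
    cleaned = clean_prompts(prompts)
    if not cleaned:
        raise ValueError("At least one non-empty prompt is required.")
    return {"prompt": cleaned}
```

With pandas imported as above, `eval_df = pd.DataFrame(build_eval_rows(raw_prompts))` would produce a DataFrame with the same `prompt` column used in this tutorial, where `raw_prompts` is your own list of strings.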

Generate model responses using run_inference():

```
eval_dataset = client.evals.run_inference(
    model="gemini-2.5-flash",
    src=eval_df,
)
```

Visualize your inference results by calling .show() on the EvaluationDataset object to inspect the model's outputs alongside your original prompts and references:
```
eval_dataset.show()
```

The following image shows the evaluation dataset with prompts and their corresponding generated responses:

A table showing the evaluation dataset with prompt and response columns

Run the evaluation

Run evaluate() to evaluate the model responses:

Evaluate the model responses using the default GENERAL_QUALITY adaptive rubric-based metric:
```
eval_result = client.evals.evaluate(dataset=eval_dataset)
```
Visualize your evaluation results by calling .show() on the EvaluationResult object to display summary metrics and detailed results:
```
eval_result.show()
```

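The following image shows an evaluation report with summary metrics and detailed results for each prompt-response pair:

An evaluation report displaying summary metrics along with detailed results for each prompt-response pair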
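
The evaluate() call in this section relies on the default metric. If you prefer to make the metric choice explicit, a thin wrapper along the following lines is one option. This is a sketch under stated assumptions: `evaluate_responses` is a hypothetical helper, and it assumes that `client.evals.evaluate()` accepts a `metrics` list and that the default metric is exposed as `types.RubricMetric.GENERAL_QUALITY` in the `vertexai` package. Verify both against the SDK reference for your installed version.

```python
def evaluate_responses(client, dataset, metrics=None):
    """Call client.evals.evaluate(), passing `metrics` only when some are given."""
    kwargs = {"dataset": dataset}
    if metrics:
        # Assumption: evaluate() takes a `metrics` keyword holding a list of metrics.
        kwargs["metrics"] = list(metrics)
    return client.evals.evaluate(**kwargs)


# Example usage (requires the SDK and an initialized client; names are assumptions):
#   from vertexai import types
#   eval_result = evaluate_responses(
#       client, eval_dataset, metrics=[types.RubricMetric.GENERAL_QUALITY]
#   )
#   eval_result.show()
```

Calling the wrapper without `metrics` behaves like the plain `client.evals.evaluate(dataset=eval_dataset)` call shown earlier, so the default metric still applies.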

Clean up

This tutorial doesn't create any Gemini Enterprise Agent Platform resources.

What's next

Define your evaluation metrics