Evaluate agents by using the generative AI client in the Vertex AI SDK

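You can use the Gen AI Evaluation Service to evaluate how well an agent completes the tasks and goals of a specific use case.

This page shows how to create and deploy a basic agent and then evaluate it with the Gen AI Evaluation Service:

Develop the agent: Define an agent with basic tool functions.
Deploy the agent: Deploy the agent to Vertex AI Agent Engine Runtime.
Run agent inference: Define an evaluation dataset and run agent inference to generate responses.
Create an evaluation run: Create an evaluation run to evaluate the responses.
View the evaluation results: View the evaluation results through the evaluation run.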

Before you begin

Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
In the Google Cloud console, on the project selector page, select or create a Google Cloud project.
Roles required to select or create a project
- Select a project: Selecting a project doesn't require a specific IAM role—you can select any project that you've been granted a role on.
- Create a project: To create a project, you need the Project Creator role (roles/resourcemanager.projectCreator), which contains the resourcemanager.projects.create permission. Learn how to grant roles.
Note: If you don't plan to keep the resources that you create in this procedure, create a project instead of selecting an existing project. After you finish these steps, you can delete the project, removing all resources associated with the project.

Verify that billing is enabled for your Google Cloud project.

Install the Vertex AI SDK for Python.

```
%pip install "google-cloud-aiplatform[adk,agent_engines]"
%pip install --upgrade --force-reinstall -q "google-cloud-aiplatform[evaluation]"
```

Set up your credentials. If you're running this tutorial in Colaboratory, run the following:
```
from google.colab import auth
auth.authenticate_user()
```
For other environments, see Authenticate to Vertex AI.

Initialize the generative AI client in the Vertex AI SDK.

```
import vertexai
from vertexai import Client
from google.genai import types as genai_types

# Replace these placeholder values with your own.
PROJECT_ID = "PROJECT_ID"
LOCATION = "LOCATION"
GCS_DEST = "gs://BUCKET_NAME/output-path"

vertexai.init(
    project=PROJECT_ID,
    location=LOCATION,
)

client = Client(
    project=PROJECT_ID,
    location=LOCATION,
    http_options=genai_types.HttpOptions(api_version="v1beta1"),
)
```

Replace the following:

BUCKET_NAME: the name of your Cloud Storage bucket. For more information about creating a bucket, see Create buckets.
PROJECT_ID: your project ID.
LOCATION: the region you selected.
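Both Cloud Storage URIs on this page come from the same bucket: GCS_DEST (where evaluation output is written) and the staging bucket used when deploying the agent. As a minimal sketch, the hypothetical helper below (not part of the Vertex AI SDK) derives both URIs from a single bucket name and applies a simplified bucket-name check, so that a typo fails early instead of during deployment.

```python
import re

# Simplified bucket-name check (assumption: covers only the basic rules):
# 3-63 characters of lowercase letters, digits, dots, dashes, or underscores,
# starting and ending with a letter or digit.
_BUCKET_RE = re.compile(r"^[a-z0-9][a-z0-9._-]{1,61}[a-z0-9]$")


def storage_uris(bucket_name: str, output_path: str = "output-path") -> dict:
    """Hypothetical helper: returns the staging bucket URI and the evaluation output URI."""
    if not _BUCKET_RE.match(bucket_name):
        raise ValueError(f"Invalid bucket name: {bucket_name!r}")
    staging_bucket = f"gs://{bucket_name}"
    return {
        "staging_bucket": staging_bucket,
        # Strip stray slashes so the joined URI has exactly one separator.
        "gcs_dest": f"{staging_bucket}/{output_path.strip('/')}",
    }


uris = storage_uris("my-eval-bucket")
print(uris["staging_bucket"])  # gs://my-eval-bucket
print(uris["gcs_dest"])        # gs://my-eval-bucket/output-path
```

The regular expression does not cover every Cloud Storage naming rule (for example, longer dotted names); consult the bucket naming documentation before relying on it.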

Develop the agent

Develop an Agent Development Kit (ADK) agent by defining a model, instructions, and a set of tools. For more information about developing an agent, see Develop an Agent Development Kit agent.

```
from google.adk import Agent

# Define Agent Tools
def search_products(query: str):
    """Searches for products based on a query."""
    # Mock response for demonstration
    if "headphones" in query.lower():
        return {"products": [{"name": "Wireless Headphones", "id": "B08H8H8H8H"}]}
    else:
        return {"products": []}

def get_product_details(product_id: str):
    """Gets the details for a given product ID."""
    if product_id == "B08H8H8H8H":
        return {"details": "Noise-cancelling, 20-hour battery life."}
    else:
        return {"error": "Product not found."}

def add_to_cart(product_id: str, quantity: int):
    """Adds a specified quantity of a product to the cart."""
    return {"status": f"Added {quantity} of {product_id} to cart."}

# Define Agent
my_agent = Agent(
    model="gemini-2.5-flash",
    name='ecommerce_agent',
    instruction='You are an ecommerce expert',
    tools=[search_products, get_product_details, add_to_cart],
)
```
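Because the agent's tools are plain Python functions, you can exercise them locally before deploying, without ADK or Google Cloud access. The sketch below copies the mock tools defined above and chains them the way the agent might for a multi-step request. It also shows that the mock search returns no products for any query that does not mention "headphones", so dataset prompts such as the "wireless earbuds" or laptop requests test how the agent handles a search with no matches.

```python
# Copies of the mock tools defined above, called directly as plain functions.
def search_products(query: str):
    """Searches for products based on a query."""
    if "headphones" in query.lower():
        return {"products": [{"name": "Wireless Headphones", "id": "B08H8H8H8H"}]}
    return {"products": []}


def get_product_details(product_id: str):
    """Gets the details for a given product ID."""
    if product_id == "B08H8H8H8H":
        return {"details": "Noise-cancelling, 20-hour battery life."}
    return {"error": "Product not found."}


def add_to_cart(product_id: str, quantity: int):
    """Adds a specified quantity of a product to the cart."""
    return {"status": f"Added {quantity} of {product_id} to cart."}


# Chain the tools: search, inspect the first result, then add it to the cart.
results = search_products("noise-cancelling headphones")["products"]
first_id = results[0]["id"]
print(get_product_details(first_id))  # {'details': 'Noise-cancelling, 20-hour battery life.'}
print(add_to_cart(first_id, 1))       # {'status': 'Added 1 of B08H8H8H8H to cart.'}

# Queries the mock does not recognize return an empty product list.
print(search_products("wireless earbuds"))  # {'products': []}
```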

Deploy the agent

Deploy the agent to Vertex AI Agent Engine Runtime. Deployment can take up to 10 minutes. After deployment, retrieve the resource name of the deployed agent.

```
from vertexai import agent_engines

def deploy_adk_agent(root_agent):
    """Deploys an ADK agent to Vertex AI Agent Engine.

    Args:
      root_agent: The ADK agent to deploy.

    Returns:
      The deployed Agent Engine instance.
    """
    app = agent_engines.AdkApp(
        agent=root_agent,
    )
    remote_app = client.agent_engines.create(
        agent=app,
        config={
            # Cloud Storage bucket used to stage the agent's artifacts.
            "staging_bucket": "gs://BUCKET_NAME",
            "requirements": ["google-cloud-aiplatform[adk,agent_engines]"],
            # Turns on Agent Engine telemetry for the deployed agent.
            "env_vars": {"GOOGLE_CLOUD_AGENT_ENGINE_ENABLE_TELEMETRY": "true"},
        },
    )
    return remote_app

agent_engine = deploy_adk_agent(my_agent)
agent_engine_resource_name = agent_engine.api_resource.name
```
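The resource name identifies the deployed agent in later calls such as run_inference(). As a sketch, assuming the usual Agent Engine format projects/PROJECT/locations/LOCATION/reasoningEngines/RESOURCE_ID, the hypothetical helper below splits agent_engine_resource_name into its components, for example to confirm that the agent was deployed to the region set in LOCATION. The sample name is made up.

```python
import re

# Assumed Agent Engine resource name format:
#   projects/PROJECT/locations/LOCATION/reasoningEngines/RESOURCE_ID
_RESOURCE_RE = re.compile(
    r"^projects/(?P<project>[^/]+)/locations/(?P<location>[^/]+)"
    r"/reasoningEngines/(?P<engine_id>[^/]+)$"
)


def parse_agent_engine_name(name: str) -> dict:
    """Hypothetical helper: splits an Agent Engine resource name into components."""
    match = _RESOURCE_RE.match(name)
    if match is None:
        raise ValueError(f"Unexpected resource name: {name!r}")
    return match.groupdict()


parts = parse_agent_engine_name(
    "projects/123456789/locations/us-central1/reasoningEngines/987654321"
)
print(parts)
# {'project': '123456789', 'location': 'us-central1', 'engine_id': '987654321'}
```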

To get a list of the agents deployed to Vertex AI Agent Engine, see Manage deployed agents.

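Generate responses

Use run_inference() to generate model responses for your dataset.

Prepare your dataset as a pandas DataFrame. The prompts should be specific to your agent. Session inputs are required for traces. For more information, see Session: Tracking individual conversations.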

```
import pandas as pd
from vertexai import types

session_inputs = types.evals.SessionInput(
    user_id="user_123",
    state={},
)
agent_prompts = [
    "Search for 'noise-cancelling headphones'.",
    "Show me the details for product 'B08H8H8H8H'.",
    "Add one pair of 'B08H8H8H8H' to my shopping cart.",
    "Find 'wireless earbuds' and then add the first result to my cart.",
    "I need a new laptop for work, can you find one with at least 16GB of RAM?",
]
agent_dataset = pd.DataFrame({
    "prompt": agent_prompts,
    "session_inputs": [session_inputs] * len(agent_prompts),
})
```
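In the dataset above, [session_inputs] * len(agent_prompts) places the same SessionInput object in every row. That is fine while state stays empty and unmodified, but if you want different initial state per prompt, build one object per row. The sketch below illustrates the difference with a hypothetical dataclass standing in for types.evals.SessionInput, so it runs without the SDK.

```python
from dataclasses import dataclass, field


# Hypothetical stand-in for types.evals.SessionInput (illustration only).
@dataclass
class FakeSessionInput:
    user_id: str
    state: dict = field(default_factory=dict)


prompts = ["prompt one", "prompt two", "prompt three"]

# List multiplication: every row references the SAME object.
shared = [FakeSessionInput(user_id="user_123")] * len(prompts)
shared[0].state["cart"] = ["B08H8H8H8H"]
print(shared[2].state)  # {'cart': ['B08H8H8H8H']}  (every row sees the change)

# List comprehension: one independent object per row.
independent = [FakeSessionInput(user_id="user_123") for _ in prompts]
independent[0].state["cart"] = ["B08H8H8H8H"]
print(independent[2].state)  # {}
```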

Generate the model responses by calling run_inference() with the deployed agent's resource name and the dataset.

```
agent_dataset_with_inference = client.evals.run_inference(
    agent=agent_engine_resource_name,
    src=agent_dataset,
)
```

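Call .show() on the EvaluationDataset object to visualize the inference results and inspect the agent's outputs alongside the original prompts (and references, if your dataset includes them).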
```
agent_dataset_with_inference.show()
```
The following image shows the evaluation dataset with the prompts and their corresponding generated intermediate_events and responses.

Run the agent evaluation

Evaluate the agent responses by calling create_evaluation_run().

Retrieve agent_info by using the built-in helper function.

```
agent_info = types.evals.AgentInfo.load_from_agent(
    my_agent,
    agent_engine_resource_name
)
```

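Evaluate the model responses with the agent-specific adaptive rubric-based metrics FINAL_RESPONSE_QUALITY, TOOL_USE_QUALITY, and HALLUCINATION. The following example also includes the SAFETY metric.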

```
evaluation_run = client.evals.create_evaluation_run(
    dataset=agent_dataset_with_inference,
    agent_info=agent_info,
    metrics=[
        types.RubricMetric.FINAL_RESPONSE_QUALITY,
        types.RubricMetric.TOOL_USE_QUALITY,
        types.RubricMetric.HALLUCINATION,
        types.RubricMetric.SAFETY,
    ],
    dest=GCS_DEST,
)
```
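Adaptive rubric-based metrics generate a set of pass/fail rubrics for each prompt and check the response against each one. The sketch below illustrates one plausible way to turn such verdicts into numbers, assuming that an item's score is the fraction of rubrics that pass and that a summary value is the mean across items; the service's actual scoring and report fields may differ. RubricVerdict, pass_rate, and the sample verdicts are hypothetical and are not service output.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class RubricVerdict:
    """Hypothetical record: one generated rubric and its pass/fail verdict."""
    rubric: str
    passed: bool


def pass_rate(verdicts: list) -> float:
    """Scores one evaluation item as the fraction of rubrics that passed."""
    if not verdicts:
        raise ValueError("No rubrics to score")
    return sum(v.passed for v in verdicts) / len(verdicts)


# Illustrative verdicts for a single prompt (made up for this example).
item = [
    RubricVerdict("Calls search_products before add_to_cart", True),
    RubricVerdict("Passes the product ID returned by the search", True),
    RubricVerdict("Confirms that the item was added to the cart", False),
    RubricVerdict("Does not invent product details", True),
]
print(pass_rate(item))  # 0.75

# Assumed summary: the mean item score across the dataset.
print(mean([pass_rate(item), 1.0]))  # 0.875
```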

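View the agent evaluation results

You can view the evaluation results by using the Vertex AI SDK.

Retrieve the evaluation run, then call .show() to visualize the results, including the summary metrics and detailed results.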

```
evaluation_run = client.evals.get_evaluation_run(
    name=evaluation_run.name,
    include_evaluation_items=True
)

evaluation_run.show()
```

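The following image shows an evaluation report with summary metrics, agent information, and detailed results for each prompt-response pair. The detailed results also include traces that show the agent interactions. For more information about traces, see Agent traces.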

Image: Agent evaluation results

What's next

Try the following notebooks: