Before you begin
To ensure you can successfully simulate and evaluate agent behavior, complete the following:
- Create an agent and ensure it has at least one version.
- If you plan to use the Python SDK for simulation, initialize the Agent Platform SDK client as described in Evaluate your agents.
Simulation lets you build a comprehensive evaluation suite from scratch, even without existing production data. The process uses LLMs to automatically generate test cases and then role-play as a user to stress-test your agent's multi-turn conversational logic.
The two-step simulation workflow
Testing a new agent typically follows a two-step process:
- Generate Scenarios: Create a dataset of "test specs" based on your agent's instructions and tool definitions.
- Simulate Sessions: Execute those specs by having a simulated user interact with your agent to produce behavior traces. A trace is a factual, immutable record of the agent's behavior, including model inputs, responses, and tool calls.
Step 1: Generate scenarios
In this stage, the system creates eval cases. An eval case is a specification that defines an agent's task. Each case consists of two elements:
- Starting Prompt: The first message a user sends to the agent.
- Conversation Plan: A hidden "instruction" for the simulated user, describing their goals and how they should react if the agent asks certain questions.
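For illustration, one eval case can be thought of as a simple mapping of these two elements. The field names below are assumptions for the sake of the sketch, not the SDK's actual schema:

```python
# Hypothetical shape of a single generated eval case. The keys
# "starting_prompt" and "conversation_plan" are illustrative only.
eval_case = {
    # The first message the simulated user sends to the agent.
    "starting_prompt": "Hi, I need to book a flight to Denver next week.",
    # Hidden instructions that steer the simulated user during the session.
    "conversation_plan": (
        "You want to book a flight but will change your mind once the "
        "agent quotes a price. If asked for travel dates, answer "
        "'next Friday'; if asked for a seat preference, say 'window'."
    ),
}
```

A scenario generator produces a dataset of many such cases; the starting prompt opens the session, while the conversation plan stays hidden from the agent under test.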
Generate scenarios in the console
1. In the Google Cloud console, navigate to the Agent Platform > Agents > Evaluation page.
2. Click New evaluation and select Simulate sessions.
3. Enter a Generation instruction to guide the scenarios (for example, "Generate scenarios where the user tries to book a flight but then changes their mind").
4. Review the generated table. You can edit the prompts or manually add your own test cases.
Step 2: Run user simulation
Once your scenarios are defined, the User Simulator acts as the user to drive the conversation forward.
Configure user personas
To ensure diverse testing, you can assign specific Personas to the simulated user. This determines their tone, expertise, and traits:
- NOVICE: Asks basic questions and may need extra guidance.
- EXPERT: Uses technical jargon and expects efficient tool usage.
- IMPATIENT_USER: Becomes frustrated if the agent asks redundant questions or takes too many turns.
Simulation settings
When running the simulation, you can configure the following limits:
- Max Turns: The maximum number of back-and-forth exchanges allowed before the session ends (default is 5).
- Sampling Percentage: For large datasets, you can choose to simulate only a subset of cases.
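Putting the persona and limits together, a simulation configuration might look like the following sketch. The `max_turn` and `user_persona` keys follow the SDK example in this guide; `sampling_percentage` is an assumed key for the sampling option:

```python
# Illustrative simulation settings mirroring the options described above.
# "max_turn" and "user_persona" match the SDK example in this guide;
# "sampling_percentage" is an assumed key, shown here for illustration.
simulation_config = {
    "user_simulator_config": {
        "max_turn": 5,             # end each session after 5 exchanges
        "user_persona": "NOVICE",  # NOVICE, EXPERT, or IMPATIENT_USER
    },
    "sampling_percentage": 50,     # assumed: simulate half of the cases
}
```

Raising `max_turn` gives the simulated user more room to pursue its conversation plan, at the cost of longer and more expensive runs.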
SDK example: Programmatic simulation
You can also bootstrap your evaluation suite using the Agent Platform SDK:
```python
# 1. Generate scenarios from agent info
eval_dataset = client.evals.generate_conversation_scenarios(
    agent_info=my_agent_info,
    config={
        "count": 5,
        "generation_instruction": "User wants a refund for a late flight.",
    },
)

# 2. Simulate multi-turn interactions
traces = client.evals.run_inference(
    agent=my_agent,
    src=eval_dataset,
    config={
        "user_simulator_config": {
            "max_turn": 5,
            "user_persona": "IMPATIENT_USER",
        },
    },
)
```