Continuous evaluation with online monitors

Online monitoring lets you continuously assess the quality of your agents in production. This proactive approach helps you detect quality drift, a measurable decrease in agent performance over time caused by changes in user behavior or external data. By configuring online monitors, you can asynchronously score live traces with both predefined and custom metrics, keeping your agent reliable and aligned with your performance standards.

Before you begin

To enable online monitoring for your agents, ensure the following requirements are met:

  • Deploy your agent as described in Deploy an agent.
  • Ensure Cloud Trace is enabled for your project.
  • (Optional) If you plan to create monitors programmatically, see the Evaluate your agents page for Agent Platform SDK initialization instructions.

Telemetry requirements

Online monitoring requires your agent to export specific OpenTelemetry signals to provide the necessary context for evaluation:

  1. Invoke agent span: Must include the following attributes:

    • gen_ai.agent.name: The identifier for the agent.
    • gen_ai.agent.description: A brief description of the agent's purpose.
    • gen_ai.conversation.id: A unique identifier for the specific conversation session.
  2. Inference events: The gen_ai.client.inference.operation.details event must capture:

    • gen_ai.input.messages: The prompts sent to the agent.
    • gen_ai.output.messages: The responses generated by the agent.
    • gen_ai.system_instructions: The underlying system prompts.
    • gen_ai.tool.definitions: Metadata about any tools available to the agent.
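To make the requirements above concrete, the expected span attributes and event fields can be pictured as plain mappings. The keys below are the attribute names listed above; the values are hypothetical placeholders, and `has_required_keys` is an illustrative helper, not part of any SDK.

```python
# Sketch of the telemetry an online monitor expects. The keys come from
# the OpenTelemetry gen_ai attributes listed above; the values are
# hypothetical placeholders.

invoke_agent_span_attributes = {
    "gen_ai.agent.name": "support-agent",                  # agent identifier
    "gen_ai.agent.description": "Answers billing questions.",
    "gen_ai.conversation.id": "conv-1234",                 # per-session ID
}

inference_event_attributes = {
    "gen_ai.input.messages": [{"role": "user", "content": "Hi"}],
    "gen_ai.output.messages": [{"role": "assistant", "content": "Hello!"}],
    "gen_ai.system_instructions": "You are a helpful billing assistant.",
    "gen_ai.tool.definitions": [{"name": "lookup_invoice"}],
}

def has_required_keys(attrs: dict, required: set) -> bool:
    """Return True if every required attribute key is present."""
    return required.issubset(attrs)
```

A quick check like `has_required_keys(invoke_agent_span_attributes, {"gen_ai.agent.name", "gen_ai.conversation.id"})` can help confirm your instrumentation emits what the monitor needs before you enable it.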

If you are using the Agent Development Kit, you must enable these telemetry capabilities by setting the following environment variables:

OTEL_SEMCONV_STABILITY_OPT_IN='gen_ai_latest_experimental'
OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT='EVENT_ONLY'
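If you prefer to configure these from application code rather than the shell, you can set them through the standard library before the instrumentation initializes. This is a minimal sketch; the variable names and values are exactly those shown above, and the only assumption is that this code runs before the ADK's OpenTelemetry setup reads them.

```python
import os

# Opt in to the latest experimental gen_ai semantic conventions and
# event-based message capture. These must be set before the ADK /
# OpenTelemetry instrumentation is initialized.
os.environ["OTEL_SEMCONV_STABILITY_OPT_IN"] = "gen_ai_latest_experimental"
os.environ["OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT"] = "EVENT_ONLY"
```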

Recording media in Cloud Storage

If your agent uses multimodal data, such as images or large documents, we recommend recording the inputs and outputs in a Cloud Storage bucket instead of embedding them directly in trace spans. Configure the following environment variables to enable this:

OTEL_INSTRUMENTATION_GENAI_UPLOAD_FORMAT='jsonl'
OTEL_INSTRUMENTATION_GENAI_COMPLETION_HOOK='upload'
OTEL_INSTRUMENTATION_GENAI_UPLOAD_BASE_PATH='gs://STORAGE_BUCKET_NAME/PATH'

For more information, see Collect multimodal prompts and responses.

How online monitors work

Online monitors run an evaluation loop on a schedule, typically every 10 minutes. Each run follows these steps:

  1. Query: Samples data from Cloud Trace and Cloud Logging based on your filters.
  2. Evaluate: Runs configured metrics using the Gemini Enterprise Agent Platform Evaluation Service.
  3. Report: Writes results back to Cloud Logging and exports numeric scores to Cloud Monitoring.
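The query, evaluate, report loop can be sketched in Python. Every function here is a hypothetical stub standing in for the managed services (Cloud Trace, the Evaluation Service, Cloud Logging/Monitoring); the sketch exists only to make the data flow and sampling behavior concrete.

```python
import random

def query_traces(matches_filter, sampling_pct, max_samples, traces):
    """Step 1: sample traces that match the monitor's filters (stub)."""
    matched = [t for t in traces if matches_filter(t)]
    k = min(round(len(matched) * sampling_pct / 100), max_samples)
    return random.sample(matched, k)

def evaluate(trace, metrics):
    """Step 2: score one trace (stub for the Evaluation Service)."""
    return {m: 1.0 for m in metrics}  # placeholder scores

def report(results):
    """Step 3: write scores back (stub for Logging/Monitoring export)."""
    return results

def run_monitor_once(traces, matches_filter, metrics, sampling_pct, max_samples):
    sampled = query_traces(matches_filter, sampling_pct, max_samples, traces)
    return report({t["id"]: evaluate(t, metrics) for t in sampled})

# Hypothetical run: keep traces longer than 2 seconds, sample 50%, cap at 3.
traces = [{"id": i, "duration_s": i} for i in range(10)]
results = run_monitor_once(
    traces,
    matches_filter=lambda t: t["duration_s"] > 2,
    metrics=["safety"],
    sampling_pct=50,
    max_samples=3,
)
```

The cap in `query_traces` mirrors the Max samples per run setting described below: the sampling percentage selects a share of matching traffic, and the cap bounds evaluation cost per run.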

Create an online monitor

  1. In the Google Cloud console, navigate to the Agent Platform > Agents > Evaluation page.

    Go to Evaluation

  2. Select the Online monitors tab and click New monitor.

  3. In the Filter traces section, specify the following:

    • Agent engine: Select the agent you want to monitor from the dropdown.
    • Filter criteria: Choose whether to evaluate All traces for the agent or apply specific Filter criteria.
  4. Define Filter Criteria (if using filtered traces):

    • Initial Inspection: Select a timeframe (for example, Last 1 day) to preview the production traces that your filter matches.
    • Filters: Enter criteria to target specific traffic. You can filter by properties such as Duration (for example, Duration > 2) or Token usage.
  5. Configure Metrics: Add the metrics you want to track continuously, such as Safety.

  6. Set Sampling:

    • Sampling percentage: Define what percentage of your live traffic should be evaluated.
    • Max samples per run: Set a cap to manage evaluation costs.
  7. Click Create.
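The two sampling controls in step 6 combine into a per-run evaluation budget: a percentage of matching traffic, capped at a fixed maximum. A minimal sketch of that arithmetic (the function and parameter names are illustrative, not part of any API):

```python
def samples_per_run(matching_traces: int, sampling_pct: float, max_samples: int) -> int:
    """Traces evaluated in one run: a percentage of traffic, capped."""
    return min(int(matching_traces * sampling_pct / 100), max_samples)

# e.g. 10% of 2,000 matching traces, capped at 100 per run -> 100 evaluated
```

Raising the percentage gives more representative scores; the cap keeps cost predictable during traffic spikes.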

Manage monitors

Once you create a monitor, you can manage it from the Online monitors list:

  • Enable or Disable: Click More options and select Enable or Disable to pause and resume evaluation without deleting the monitor's configuration.
  • Duplicate: Create a new monitor with pre-filled settings from an existing one.
  • View Traces: Click the View traces link in the Sampled traces column for a monitor to navigate directly to the filtered traces in the agent's Traces tab.

View results in the observability dashboard

To see your evaluation metrics alongside other performance signals:

  1. In the Google Cloud console, navigate to the Agent Platform > Agents page.
  2. In the left navigation menu, select Deployments.
  3. Select your agent.

    Go to Deployments

  4. In the Dashboard view, select the Evaluation subsection to view time-series charts for your configured metrics, such as response quality, safety, and hallucination rates.

Troubleshoot online monitors

If your online monitor is active but no results appear in your dashboard:

  1. Verify Telemetry: Ensure your agent is correctly exporting the required OpenTelemetry spans and events. Check Cloud Trace to confirm that live traces contain the gen_ai.* attributes.
  2. Check Filters: Review your monitor's filter criteria. Use the Initial Inspection feature to confirm that your filters match your production traffic.
  3. Check Internal Logs: Online monitors write diagnostic information to Cloud Logging. You can find these logs by searching for your monitor ID in the Logs Explorer:

    resource.type="aiplatform.googleapis.com/OnlineEvaluator"
    resource.labels.online_evaluator_id="YOUR_MONITOR_ID"