View machine learning runs with ML Diagnostics

A machine learning run is a single, complete execution of a machine learning script or pipeline. With ML Diagnostics, you can view machine learning runs that you create with either the CLI or the SDK in the Google Cloud console.

To view all your machine learning runs in Cluster Director:

  1. In the Google Cloud console, go to the Cluster Director page.
  2. Click the Diagnostics tab.


To view all your machine learning runs in Google Kubernetes Engine:

  1. In the Google Cloud console, go to the Kubernetes page.
  2. In the navigation menu, click AI/ML.
  3. Click the Diagnostics tab.


In both Cluster Director and GKE, the Diagnostics tab shows the following information:

  • Run summaries: A table that lists summary information for all your machine learning runs.
  • Run details: Details for each run, including its configurations and run information.
  • Time series charts for metrics: Charts for all metrics, including model, performance, and system metrics. You can also view these metrics in Logs Explorer: metrics recorded with the metrics.record() method are written as log entries, which you can filter or use to create log-based metrics.
  • Profiling information: A Profiles tab that lists all profile sessions for a particular run, with links to the XProf viewer. This includes both programmatic and on-demand profile captures.
  • On-demand profiling from the Google Cloud console: In the Profiles tab, you can capture an on-demand profile session directly from the Google Cloud console. Click Capture new profile session, specify how long to capture the profile session, and select the hosts to profile. The table is automatically populated with the hosts that are running the workload, so no manual entry is required. When the specified capture duration ends, the profile session automatically appears in the Sessions table.