View machine learning runs with ML Diagnostics
A machine learning run is a single, complete execution of a machine learning script or pipeline. With ML Diagnostics, you can view machine learning runs in the Google Cloud console, whether they were created with the CLI or the SDK.
To view all your machine learning runs in Cluster Director:
- In the Google Cloud console, go to the Cluster Director page.
- Click the Diagnostics tab.
To view all your machine learning runs in Google Kubernetes Engine:
- In the Google Cloud console, go to the Google Kubernetes Engine page.
- In the navigation menu, click AI/ML.
- Click the Diagnostics tab.
In both Cluster Director and Google Kubernetes Engine (GKE), the Diagnostics tab shows the following information:
- Run summaries: A list view table with summary information for all your machine learning runs.
- Run details: A detailed view of each run, including its configs and run information.
- Time series charts for metrics: All metrics, including model metrics, performance metrics, and system metrics. You can also view these metrics with Logs Explorer. Metrics recorded with the metrics.record() method are written as log entries and can be filtered or used to create log-based metrics.
- Profiling information: A Profiles tab with all profile sessions for a particular run, with links to the XProf viewer. This includes both programmatic and on-demand profile captures. You can also capture an on-demand profile session directly from the user interface.
- On-demand profiling from the Google Cloud console: Within the
Profiles tab, you can capture an on-demand profile session directly from
the Google Cloud console. Click the Capture new profile session
button, specify how long to capture, and select the hosts to profile.
The hosts running the workload are populated in the table automatically,
so no manual entry is required. After the specified capture duration
elapses, the profile session appears automatically in the Sessions table.
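Because recorded metrics are written as log entries, you can narrow them down in Logs Explorer with a query. The following is a minimal sketch of a helper that assembles such a query string in the Cloud Logging query language. The helper name build_metric_log_filter and the jsonPayload.run_name and jsonPayload.metric_name fields are hypothetical placeholders, not the documented ML Diagnostics log schema. Inspect one of your own log entries in Logs Explorer and substitute the real field names.

```python
from datetime import datetime, timezone


def _quote(value):
    """Wrap a value in double quotes, escaping backslashes and quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def build_metric_log_filter(run_name, metric_name=None, since=None):
    """Assemble a Cloud Logging query string for one run's metric entries.

    ASSUMPTION: the jsonPayload field names used here are hypothetical.
    Replace them with the fields that appear in your own log entries.
    """
    clauses = [f"jsonPayload.run_name={_quote(run_name)}"]  # hypothetical field
    if metric_name is not None:
        # hypothetical field
        clauses.append(f"jsonPayload.metric_name={_quote(metric_name)}")
    if since is not None:
        # `since` must be a timezone-aware datetime. The query language
        # compares the built-in timestamp field against an RFC 3339 string.
        stamp = since.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
        clauses.append(f"timestamp>={_quote(stamp)}")
    # Comparisons are combined with an explicit AND.
    return " AND ".join(clauses)


if __name__ == "__main__":
    print(
        build_metric_log_filter(
            "my-run", "loss", datetime(2025, 1, 1, tzinfo=timezone.utc)
        )
    )
```

You can paste the resulting string into the Logs Explorer query pane. The same string should also work as the filter when you define a log-based metric, or as the filter_ argument to list_entries in the google-cloud-logging Python client.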