This document explains how to monitor the activity and health of your cluster in Cluster Director by using Cloud Logging audit logs.
Cluster Director integrates with Logging to provide detailed logs of your cluster operations and the Slurm environment. You can use these logs to monitor cluster operations, monitor the behavior of the Slurm scheduler, or troubleshoot issues. To learn more about using audit logs in Google Cloud, see Logging overview.
Before you begin
When you access and use the Google Cloud console, you don't need to authenticate. You can automatically use Google Cloud services and APIs.
Required roles
To get the permissions that you need to view audit logs, ask your administrator to grant you the Logs Viewer (roles/logging.viewer) IAM role on the project.
For more information about granting roles, see Manage access to projects, folders, and organizations.
You might also be able to get the required permissions through custom roles or other predefined roles.
View audit logs
To monitor the health and activity of your cluster, use one of the following log options:
To quickly track cluster creation, update, or delete operations, view pre-configured audit logs.
To view detailed system activity in your cluster and Slurm environment, view custom component logs.
View pre-configured audit logs
To view pre-configured audit logs for Cluster Director by using the Google Cloud console, complete the following steps:
In the Google Cloud console, go to the Cluster Director page.
In the navigation menu, click Clusters. The Clusters page appears.
In the Clusters table, in the Name column, click the name of the cluster that you want to view the details of. A page that gives the details of the cluster appears, and the Details tab is selected.
In the Compute section, in the Logging row, click View logs. The Logs Explorer page appears, and filters are automatically applied to display pre-configured audit logs for your cluster.
View custom component logs
To view logs for specific components of your cluster, such as controller or compute nodes, use filtering queries in the Logs Explorer in the Google Cloud console. To do so, complete the following steps:
In the Google Cloud console, go to the Logs Explorer page.
If the Query pane isn't visible, turn on the Show query toggle.
In the Query pane, enter a query to filter your logs. Based on the logs that you want to view, you can use one or more of the following queries:
View cluster operations logs: to track cluster create, update, or delete operations that are sent to the Cluster Director API, use one of the following queries:
To view a list of all cluster creation operations in your project, use the following query:
    logName="projects/PROJECT_ID/logs/cloudaudit.googleapis.com%2Factivity"
    protoPayload.serviceName="hypercomputecluster.googleapis.com"
    protoPayload.methodName="google.cloud.hypercomputecluster.v1beta.HypercomputeCluster.CreateCluster"
To view a list of all cluster update operations in your project, use the following query:
    logName="projects/PROJECT_ID/logs/cloudaudit.googleapis.com%2Factivity"
    protoPayload.serviceName="hypercomputecluster.googleapis.com"
    protoPayload.methodName="google.cloud.hypercomputecluster.v1beta.HypercomputeCluster.UpdateCluster"
To view a list of cluster update operations for a specific cluster, use the following query:
    logName="projects/PROJECT_ID/logs/cloudaudit.googleapis.com%2Factivity"
    protoPayload.serviceName="hypercomputecluster.googleapis.com"
    protoPayload.methodName="google.cloud.hypercomputecluster.v1beta.HypercomputeCluster.UpdateCluster"
    protoPayload.resourceName="projects/PROJECT_ID/locations/REGION/clusters/CLUSTER_NAME"
To view a list of cluster delete operations in your project, use the following query:
    logName="projects/PROJECT_ID/logs/cloudaudit.googleapis.com%2Factivity"
    protoPayload.serviceName="hypercomputecluster.googleapis.com"
    protoPayload.methodName="google.cloud.hypercomputecluster.v1beta.HypercomputeCluster.DeleteCluster"
View cluster nodes logs: to view detailed system and Slurm activity from the cluster nodes, use one of the following queries:
To view the logs for a controller node, use the following query:
    labels.clusterName = "CLUSTER_NAME"
    labels.hostname = "CLUSTER_NAME-controller"
To view the logs for your nodeset nodes, use the following query:
    labels.clusterName = "CLUSTER_NAME"
    labels.hostname = "CLUSTER_NAME-nodeset"
Replace the following:
PROJECT_ID: the ID of your project.
REGION: the region where your cluster exists.
CLUSTER_NAME: the name of your cluster.
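If you run these cluster operations queries often, you might prefer to generate the filter text instead of editing the placeholders by hand. The following Python snippet is a minimal sketch of that idea. The function build_operation_filter and the short operation names "create", "update", and "delete" are hypothetical names chosen for this example and are not part of Cluster Director or any Google Cloud library. The snippet only assembles the query strings shown earlier, one comparison per line, which the Logging query language joins with an implicit AND.

```python
# Illustrative helper only: the function and constant names are assumptions
# made for this example, not part of any Google Cloud library.

SERVICE_NAME = "hypercomputecluster.googleapis.com"
METHOD_PREFIX = "google.cloud.hypercomputecluster.v1beta.HypercomputeCluster."

# Maps a short, hypothetical operation name to the audited API method.
OPERATIONS = {
    "create": "CreateCluster",
    "update": "UpdateCluster",
    "delete": "DeleteCluster",
}


def build_operation_filter(project_id, operation, region=None, cluster_name=None):
    """Return a Logs Explorer query for Cluster Director audit log entries.

    If both region and cluster_name are given, the query is narrowed to
    that single cluster by adding a protoPayload.resourceName comparison.
    """
    if operation not in OPERATIONS:
        raise ValueError(f"unknown operation: {operation!r}")
    if (region is None) != (cluster_name is None):
        raise ValueError("region and cluster_name must be given together")

    lines = [
        f'logName="projects/{project_id}/logs/cloudaudit.googleapis.com%2Factivity"',
        f'protoPayload.serviceName="{SERVICE_NAME}"',
        f'protoPayload.methodName="{METHOD_PREFIX}{OPERATIONS[operation]}"',
    ]
    if cluster_name is not None:
        lines.append(
            f'protoPayload.resourceName="projects/{project_id}'
            f'/locations/{region}/clusters/{cluster_name}"'
        )
    # One comparison per line; comparisons are combined with an implicit AND.
    return "\n".join(lines)
```

For example, build_operation_filter("my-project", "update", region="us-central1", cluster_name="my-cluster") returns the four-line query for update operations on a single cluster, which you can paste into the Query pane. The project, region, and cluster values here are placeholders.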
For more information about writing queries for filtering logs, see Write advanced queries by using the Logging query language.
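The cluster node queries can be generated in the same way. The following Python snippet is a rough sketch under the assumption that you want to reuse the node filters outside the console, for example with the gcloud logging read command. The names build_node_filter and build_gcloud_read_command are hypothetical, and the hostname patterns are copied from the queries in this document rather than derived from any API.

```python
# Illustrative helpers only: these names are assumptions made for this
# example and are not part of any Google Cloud library.

# Hostname suffixes taken from the node queries in this document.
NODE_SUFFIXES = ("controller", "nodeset")


def build_node_filter(cluster_name, node_kind):
    """Return a Logs Explorer query for controller or nodeset node logs."""
    if node_kind not in NODE_SUFFIXES:
        raise ValueError(f"unknown node kind: {node_kind!r}")
    return "\n".join(
        [
            f'labels.clusterName = "{cluster_name}"',
            f'labels.hostname = "{cluster_name}-{node_kind}"',
        ]
    )


def build_gcloud_read_command(project_id, log_filter, limit=20):
    """Return an argument list for `gcloud logging read` with the given filter.

    The list can be passed to subprocess.run; nothing is executed here.
    """
    return [
        "gcloud",
        "logging",
        "read",
        log_filter,
        f"--project={project_id}",
        f"--limit={limit}",
    ]
```

Returning an argument list instead of a single shell string avoids quoting problems, because the filter itself contains double quotes and line breaks. Running the resulting command requires an authenticated gcloud CLI and the same permissions that are described in the Required roles section.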