This document discusses tools and techniques you can use to monitor the health and performance of Managed Service for Apache Spark clusters and jobs.
Open source web interfaces
Many open source components on a Managed Service for Apache Spark cluster, such as Apache Hadoop and Apache Spark, provide web interfaces that you can use to monitor cluster resources and job performance. For example, you can use the YARN ResourceManager UI to view YARN application resource allocation on a Managed Service for Apache Spark cluster.
Persistent History Server
Open source web interfaces running on a cluster are available while the cluster is running, but they terminate when you delete the cluster. To view cluster and job data after a cluster is deleted, you can create a Persistent History Server (PHS).
Example: You encounter a job error or slowdown that you want to analyze. You stop or delete the job cluster, then view and analyze job history data using your PHS.
After you create a PHS, you enable it on a Managed Service for Apache Spark cluster or Managed Service for Apache Spark batch workload when you create the cluster or submit the batch workload. A PHS can access history data for jobs run on multiple clusters, letting you monitor jobs across a project instead of monitoring separate UIs running on different clusters.
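As a rough illustration of what "enabling" a PHS means, the sketch below builds the request-body fragments that point a new cluster or a batch workload at an existing PHS cluster. It assumes the REST field paths `config.auxiliaryServicesConfig.sparkHistoryServerConfig.dataprocCluster` (clusters) and `environmentConfig.peripheralsConfig.sparkHistoryServerConfig.dataprocCluster` (batches), plus the `projects/*/regions/*/clusters/*` naming; verify these against the API reference before use. The helper names and example values are hypothetical, and the batch fragment omits the workload definition itself.

```python
"""Hypothetical sketch: request-body fragments that attach a cluster or batch to a PHS."""


def phs_resource_name(project: str, region: str, phs_cluster: str) -> str:
    # Assumed full resource name format for the PHS cluster.
    return f"projects/{project}/regions/{region}/clusters/{phs_cluster}"


def cluster_body(project: str, region: str, cluster_name: str, phs_cluster: str) -> dict:
    # Fragment of a cluster create request; field paths are assumptions to verify.
    return {
        "projectId": project,
        "clusterName": cluster_name,
        "config": {
            "auxiliaryServicesConfig": {
                "sparkHistoryServerConfig": {
                    "dataprocCluster": phs_resource_name(project, region, phs_cluster)
                }
            }
        },
    }


def batch_body(project: str, region: str, phs_cluster: str) -> dict:
    # Fragment of a batch submit request; the workload (for example, a Spark job) is omitted.
    return {
        "environmentConfig": {
            "peripheralsConfig": {
                "sparkHistoryServerConfig": {
                    "dataprocCluster": phs_resource_name(project, region, phs_cluster)
                }
            }
        }
    }
```

Because both fragments reference the same PHS resource name, jobs from several clusters and batches can write history that one PHS serves.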
Managed Service for Apache Spark logs
Managed Service for Apache Spark collects the logs generated by Apache Hadoop, Spark, Hive, ZooKeeper, and other open source systems running on your clusters, and sends them to Logging. These logs are grouped by source, which lets you select and view the logs you are interested in; for example, YARN NodeManager and Spark executor logs generated on a cluster are labeled separately. See Managed Service for Apache Spark logs for more information on log contents and options.
Cloud Logging
Logging is a fully managed, real-time log management system. It stores logs ingested from Google Cloud services and provides tools to search, filter, and analyze logs at scale. Managed Service for Apache Spark clusters generate multiple logs, including Managed Service for Apache Spark service agent logs, cluster startup logs, and OSS component logs such as YARN NodeManager logs.
Logging is enabled by default on Managed Service for Apache Spark clusters and Managed Service for Apache Spark batch workloads. Logs are periodically exported to Logging, where they persist after the cluster is deleted or the workload is completed.
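Because logs persist in Logging after a cluster is deleted, you can still query them later. The sketch below builds a Logging query that narrows results to one cluster and a minimum severity. It assumes the monitored resource type `cloud_dataproc_cluster` and the resource labels `cluster_name` and `region`; treat these as assumptions to confirm in the Logs Explorer. The function name is hypothetical. In the Logging query language, separate lines are combined with an implicit AND.

```python
"""Hypothetical sketch: build a Logging query for one cluster's logs."""


def cluster_log_filter(cluster_name: str, region: str, min_severity: str = "WARNING") -> str:
    # Resource type and label names are assumptions; check them in the Logs Explorer.
    clauses = [
        'resource.type="cloud_dataproc_cluster"',
        f'resource.labels.cluster_name="{cluster_name}"',
        f'resource.labels.region="{region}"',
        f"severity>={min_severity}",
    ]
    # One clause per line; the query language ANDs them together.
    return "\n".join(clauses)


print(cluster_log_filter("my-cluster", "us-central1"))
```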
Managed Service for Apache Spark metrics
Managed Service for Apache Spark cluster and job metrics, prefixed with dataproc.googleapis.com/, consist of time-series data that provide insights into the performance of a cluster, such as CPU utilization or job status. Managed Service for Apache Spark custom metrics, prefixed with custom.googleapis.com/, include metrics emitted by open source systems running on the cluster, such as the YARN running applications metric. Gaining insight into Managed Service for Apache Spark metrics can help you configure your clusters efficiently. Setting up metric-based alerts can help you recognize and respond to problems quickly.
Managed Service for Apache Spark cluster and job metrics are collected by default at no charge. Collection of custom metrics is charged; you can enable it when you create a cluster. Collection of Spark metrics is enabled by default on Managed Service for Apache Spark batch workloads.
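The prefix rule above determines both where a metric comes from and whether its collection is charged. The sketch below encodes that rule directly from the two documented prefixes. The function names and the example metric type strings are hypothetical placeholders, not real metric names.

```python
"""Sketch: classify a metric type by its documented prefix."""

BUILT_IN_PREFIX = "dataproc.googleapis.com/"  # cluster and job metrics
CUSTOM_PREFIX = "custom.googleapis.com/"      # metrics from OSS components


def metric_family(metric_type: str) -> str:
    # Map a metric type string to the family described in this document.
    if metric_type.startswith(BUILT_IN_PREFIX):
        return "built-in"
    if metric_type.startswith(CUSTOM_PREFIX):
        return "custom"
    return "other"


def collection_is_charged(metric_type: str) -> bool:
    # Only custom metrics collection is charged, per the text above.
    return metric_family(metric_type) == "custom"


print(metric_family("dataproc.googleapis.com/example/metric"))   # hypothetical name
print(collection_is_charged("custom.googleapis.com/example/metric"))  # hypothetical name
```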
Cloud Monitoring
Monitoring uses cluster metadata and metrics, including HDFS, YARN, job, and operation metrics, to provide visibility into the health, performance, and availability of Managed Service for Apache Spark clusters and jobs. You can use Monitoring to explore metrics, add charts, build dashboards, and create alerts.
Metrics Explorer
You can use the Metrics Explorer to view Managed Service for Apache Spark metrics. Managed Service for Apache Spark cluster, job, and batch metrics are listed under the Cloud Managed Service for Apache Spark Cluster, Cloud Managed Service for Apache Spark Job, and Cloud Managed Service for Apache Spark Batch resources. Managed Service for Apache Spark custom metrics are listed under the VM Instances resource, Custom category.
Charts
You can use Metrics Explorer to create charts that visualize Managed Service for Apache Spark metrics.
Example: You create a chart to see the number of active YARN applications running on your clusters, and then add a filter to select the visualized metrics by cluster name or region.
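The chart in this example corresponds to a Cloud Monitoring time-series filter. The sketch below assembles such a filter, optionally restricted by cluster name and region. It assumes the metric type `dataproc.googleapis.com/cluster/yarn/apps`, the resource type `cloud_dataproc_cluster`, and the resource labels `cluster_name` and `region`; confirm the exact names in Metrics Explorer before relying on them. The function name is hypothetical.

```python
"""Hypothetical sketch: a Monitoring filter for an active YARN applications chart."""
from typing import Optional


def yarn_apps_filter(cluster_name: Optional[str] = None, region: Optional[str] = None) -> str:
    # Metric and resource names are assumptions; verify them in Metrics Explorer.
    clauses = [
        'metric.type = "dataproc.googleapis.com/cluster/yarn/apps"',
        'resource.type = "cloud_dataproc_cluster"',
    ]
    # Optional filters mirror the "by cluster name or region" step in the example.
    if cluster_name:
        clauses.append(f'resource.labels.cluster_name = "{cluster_name}"')
    if region:
        clauses.append(f'resource.labels.region = "{region}"')
    return " AND ".join(clauses)


print(yarn_apps_filter(region="us-central1"))
```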
Dashboards
You can build dashboards to monitor Managed Service for Apache Spark clusters and jobs using metrics from multiple projects and different Google Cloud products. You can build dashboards in the Google Cloud console from the Dashboards Overview page, or by creating a chart on the Metrics Explorer page and then saving it to a dashboard.
Alerts
You can create Managed Service for Apache Spark metric alerts to receive timely notice of cluster or job issues.
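Metric alerts are commonly configured as a threshold plus a duration: the condition fires only when a metric stays above the threshold for long enough. The sketch below is a simplified local model of that idea, not Cloud Monitoring's evaluation engine; the function name, the sample format of `(timestamp_seconds, value)` pairs, and the strict "above" comparison are all assumptions made for illustration.

```python
"""Simplified local model of a threshold-plus-duration alert condition (not the real evaluator)."""
from typing import List, Tuple


def condition_fires(samples: List[Tuple[float, float]], threshold: float, duration_s: float) -> bool:
    # Returns True if the most recent run of values above `threshold`
    # spans at least `duration_s` seconds, ending at the latest sample.
    if not samples:
        return False
    ordered = sorted(samples)  # sort by timestamp
    end_t = ordered[-1][0]
    run_start = None
    # Walk backward from the newest sample while the value stays above the threshold.
    for t, v in reversed(ordered):
        if v > threshold:
            run_start = t
        else:
            break
    return run_start is not None and end_t - run_start >= duration_s


samples = [(0, 1.0), (60, 5.0), (120, 6.0), (180, 7.0)]
print(condition_fires(samples, threshold=4.0, duration_s=120))  # True
```

Requiring a sustained breach rather than a single high sample reduces alerts caused by brief spikes.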
What's next
- Learn about AI-powered Investigations with Gemini Cloud Assist.
- Troubleshoot cluster creation issues.
- View cluster diagnostic data.