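This document describes Spark metrics. By default, Serverless for Apache Spark enables the collection of available Spark metrics, unless you use Spark metrics collection properties to disable or override the collection of one or more Spark metrics.
For additional properties that you can set when you submit a Serverless for Apache Spark batch workload, see Spark properties.
Spark metrics collection properties
You can use the properties listed in this section to disable or override the collection of one or more available Spark metrics.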
| Property | Description |
|---|---|
| spark.dataproc.driver.metrics | Use to disable or override Spark driver metrics. |
| spark.dataproc.executor.metrics | Use to disable or override Spark executor metrics. |
| spark.dataproc.system.metrics | Use to disable Spark system metrics. |
gcloud CLI examples:
Disable Spark driver metric collection:

    gcloud dataproc batches submit spark \
        --properties spark.dataproc.driver.metrics="" \
        --region=region \
        other args ...

Override the default Spark driver metric collection to collect only the BlockManager:disk.diskSpaceUsed_MB and DAGScheduler:stage.failedStages metrics. The ^~^ prefix sets ~ as the delimiter between properties, so the commas in the metric list stay part of the property value:

    gcloud dataproc batches submit spark \
        --properties=^~^spark.dataproc.driver.metrics="BlockManager:disk.diskSpaceUsed_MB,DAGScheduler:stage.failedStages" \
        --region=region \
        other args ...
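When the same commands are generated from a script, the property value and the delimiter prefix can be assembled programmatically. The following is a minimal sketch, not an official client: the helper names `metrics_property` and `submit_args` are hypothetical, the region `us-central1` is a placeholder, and the workload arguments (the "other args" in the commands above) still have to be supplied by the caller.

```python
import shlex

DELIM = "~"  # assumed delimiter; any character that does not occur in the values works


def metrics_property(name, metrics):
    """Return 'name=m1,m2'. An empty list gives 'name=' (collection disabled)."""
    return f"{name}={','.join(metrics)}"


def submit_args(region, properties, workload_args=()):
    """Build an argv list for `gcloud dataproc batches submit spark`."""
    for prop in properties:
        if DELIM in prop:
            raise ValueError(f"{prop!r} contains the delimiter {DELIM!r}")
    # ^~^ tells gcloud to split properties on '~' instead of ','
    flag = f"--properties=^{DELIM}^{DELIM.join(properties)}"
    return ["gcloud", "dataproc", "batches", "submit", "spark",
            flag, f"--region={region}", *workload_args]


# Override driver metrics so that only two of them are collected.
override = metrics_property(
    "spark.dataproc.driver.metrics",
    ["BlockManager:disk.diskSpaceUsed_MB", "DAGScheduler:stage.failedStages"],
)
print(shlex.join(submit_args("us-central1", [override])))

# Disable driver metrics entirely (empty value).
disable = metrics_property("spark.dataproc.driver.metrics", [])
print(shlex.join(submit_args("us-central1", [disable])))
```

Building an argv list rather than a single shell string avoids a second layer of shell quoting; `shlex.join` is used here only to print a copy-pasteable form.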
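Available Spark metrics
Serverless for Apache Spark collects the Spark metrics listed in this section, unless you use Spark metrics collection properties to disable or override their collection.
Metric names take the form custom.googleapis.com/METRIC_EXPLORER_NAME, where METRIC_EXPLORER_NAME is the value in the Metrics Explorer name column of the following tables.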
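The relationship between a metric name and its Metrics Explorer name in the tables in this section appears to be mechanical: the `:` and `.` separators become `/`, and a table-specific prefix is prepended. The sketch below illustrates that pattern under this assumption. The helper names are hypothetical, and the prefix must be read from the table row, because it cannot be derived from the metric name alone (for example, the ExecutorAllocationManager metrics are listed under `spark/driver`).

```python
def explorer_name(metric, prefix=""):
    """Convert 'Source:group.metric' into 'prefix/Source/group/metric'."""
    path = metric.replace(":", "/").replace(".", "/")
    return f"{prefix}/{path}" if prefix else path


def metric_type(explorer):
    """Apply the custom.googleapis.com/METRIC_EXPLORER_NAME form."""
    return f"custom.googleapis.com/{explorer}"


# Driver metric: the prefix 'spark/driver' is taken from the driver table.
name = explorer_name("BlockManager:disk.diskSpaceUsed_MB", "spark/driver")
print(name)
print(metric_type(name))

# Executor metric: 'executor' is already part of the metric name.
print(explorer_name("executor:bytesRead", "spark"))

# System metric: no prefix in the Metrics Explorer name.
print(explorer_name("agent:uptime"))
```

Treat the output as a convenience for looking up rows, and check it against the tables, which remain the reference for the exact names.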
Spark driver metrics
| Metric | Metrics Explorer name |
|---|---|
| BlockManager:disk.diskSpaceUsed_MB | spark/driver/BlockManager/disk/diskSpaceUsed_MB |
| BlockManager:memory.maxMem_MB | spark/driver/BlockManager/memory/maxMem_MB |
| BlockManager:memory.memUsed_MB | spark/driver/BlockManager/memory/memUsed_MB |
| DAGScheduler:job.activeJobs | spark/driver/DAGScheduler/job/activeJobs |
| DAGScheduler:job.allJobs | spark/driver/DAGScheduler/job/allJobs |
| DAGScheduler:messageProcessingTime | spark/driver/DAGScheduler/messageProcessingTime |
| DAGScheduler:stage.failedStages | spark/driver/DAGScheduler/stage/failedStages |
| DAGScheduler:stage.runningStages | spark/driver/DAGScheduler/stage/runningStages |
| DAGScheduler:stage.waitingStages | spark/driver/DAGScheduler/stage/waitingStages |
Spark executor metrics
| Metric | Metrics Explorer name |
|---|---|
| ExecutorAllocationManager:executors.numberExecutorsDecommissionUnfinished | spark/driver/ExecutorAllocationManager/executors/numberExecutorsDecommissionUnfinished |
| ExecutorAllocationManager:executors.numberExecutorsExitedUnexpectedly | spark/driver/ExecutorAllocationManager/executors/numberExecutorsExitedUnexpectedly |
| ExecutorAllocationManager:executors.numberExecutorsGracefullyDecommissioned | spark/driver/ExecutorAllocationManager/executors/numberExecutorsGracefullyDecommissioned |
| ExecutorAllocationManager:executors.numberExecutorsKilledByDriver | spark/driver/ExecutorAllocationManager/executors/numberExecutorsKilledByDriver |
| LiveListenerBus:queue.executorManagement.listenerProcessingTime | spark/driver/LiveListenerBus/queue/executorManagement/listenerProcessingTime |
| executor:bytesRead | spark/executor/bytesRead |
| executor:bytesWritten | spark/executor/bytesWritten |
| executor:cpuTime | spark/executor/cpuTime |
| executor:diskBytesSpilled | spark/executor/diskBytesSpilled |
| executor:jvmGCTime | spark/executor/jvmGCTime |
| executor:memoryBytesSpilled | spark/executor/memoryBytesSpilled |
| executor:recordsRead | spark/executor/recordsRead |
| executor:recordsWritten | spark/executor/recordsWritten |
| executor:runTime | spark/executor/runTime |
| executor:shuffleFetchWaitTime | spark/executor/shuffleFetchWaitTime |
| executor:shuffleRecordsRead | spark/executor/shuffleRecordsRead |
| executor:shuffleRecordsWritten | spark/executor/shuffleRecordsWritten |
| executor:shuffleRemoteBytesReadToDisk | spark/executor/shuffleRemoteBytesReadToDisk |
| executor:shuffleWriteTime | spark/executor/shuffleWriteTime |
| executor:succeededTasks | spark/executor/succeededTasks |
| ExecutorMetrics:MajorGCTime | spark/executor/ExecutorMetrics/MajorGCTime |
| ExecutorMetrics:MinorGCTime | spark/executor/ExecutorMetrics/MinorGCTime |
System metrics
| Metric | Metrics Explorer name |
|---|---|
| agent:uptime | agent/uptime |
| cpu:utilization | cpu/utilization |
| disk:bytes_used | disk/bytes_used |
| disk:percent_used | disk/percent_used |
| memory:bytes_used | memory/bytes_used |
| memory:percent_used | memory/percent_used |
| network:tcp_connections | network/tcp_connections |
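View Spark metrics
To view batch metrics, click a batch ID on the Dataproc Batches page in the Google Cloud console to open the Batch details page, which displays metric charts for the batch workload under the Monitoring tab.
For more information about how to view collected metrics, see Dataproc Cloud Monitoring.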
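Outside the Monitoring tab, a collected metric can also be selected in Cloud Monitoring by filtering on its metric type. The sketch below only builds the filter string and does not call any API. It assumes the custom.googleapis.com/METRIC_EXPLORER_NAME form given earlier applies to the metric in question, and the helper name `metric_filter` is hypothetical.

```python
def metric_filter(explorer_name):
    """Return a Cloud Monitoring filter that selects one metric type."""
    metric_type = f"custom.googleapis.com/{explorer_name}"
    return f'metric.type = "{metric_type}"'


# Filter for the failed-stages driver metric from the driver metrics table.
print(metric_filter("spark/driver/DAGScheduler/stage/failedStages"))
```

The resulting string can be pasted into a Metrics Explorer filter or passed to a Monitoring API client. Narrowing it to a single batch would need an additional resource filter, which is not covered here.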