Spark metrics

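This document describes Spark metrics. By default, Serverless for Apache Spark enables collection of the available Spark metrics, unless you use the Spark metrics collection properties to disable or override the collection of one or more of them.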

For additional properties that you can set when you submit a Serverless for Apache Spark batch workload, see Spark properties.

Spark metrics collection properties

You can use the properties listed in this section to disable or override the collection of one or more available Spark metrics.

Property                           Description
spark.dataproc.driver.metrics      Disables or overrides Spark driver metrics
spark.dataproc.executor.metrics    Disables or overrides Spark executor metrics
spark.dataproc.system.metrics      Disables Spark system metrics

gcloud CLI examples:

  • Disable Spark driver metrics collection:

    gcloud dataproc batches submit spark \
        --properties spark.dataproc.driver.metrics="" \
        --region=region \
        other args ...
    
  • Override the default Spark driver metrics collection to collect only the BlockManager:disk.diskSpaceUsed_MB and DAGScheduler:stage.failedStages metrics:

    gcloud dataproc batches submit spark \
        --properties=^~^spark.dataproc.driver.metrics="BlockManager:disk.diskSpaceUsed_MB,DAGScheduler:stage.failedStages" \
        --region=region \
        other args ...
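
The second example relies on the gcloud CLI alternate-delimiter syntax (`^~^`), which lets a property value contain commas. A minimal Python sketch of assembling such a flag for several properties follows. The helper name `build_properties_flag` is hypothetical, and using an empty value to disable executor metrics is an assumption carried over from the driver example above.

```python
def build_properties_flag(properties: dict[str, str], delimiter: str = "~") -> str:
    """Return a --properties flag using gcloud's ^DELIM^ alternate-delimiter syntax."""
    for key, value in properties.items():
        # The delimiter separates key=value pairs, so it must not occur inside one.
        if delimiter in key or delimiter in value:
            raise ValueError(f"delimiter {delimiter!r} appears in {key}={value}")
    pairs = delimiter.join(f"{key}={value}" for key, value in properties.items())
    return f"--properties=^{delimiter}^{pairs}"


# Keep two driver metrics and disable executor metrics (empty value, assumed to
# behave like the driver example above).
flag = build_properties_flag({
    "spark.dataproc.driver.metrics":
        "BlockManager:disk.diskSpaceUsed_MB,DAGScheduler:stage.failedStages",
    "spark.dataproc.executor.metrics": "",
})
print(flag)
```

If the resulting string is pasted into a shell rather than passed as a single argument from a program, it may need quoting.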
    

Available Spark metrics

Serverless for Apache Spark collects the Spark metrics listed in this section unless you use the Spark metrics collection properties to disable or override their collection.

custom.googleapis.com/METRIC_EXPLORER_NAME

Spark driver metrics

Metric    Metrics Explorer name
BlockManager:disk.diskSpaceUsed_MB spark/driver/BlockManager/disk/diskSpaceUsed_MB
BlockManager:memory.maxMem_MB spark/driver/BlockManager/memory/maxMem_MB
BlockManager:memory.memUsed_MB spark/driver/BlockManager/memory/memUsed_MB
DAGScheduler:job.activeJobs spark/driver/DAGScheduler/job/activeJobs
DAGScheduler:job.allJobs spark/driver/DAGScheduler/job/allJobs
DAGScheduler:messageProcessingTime spark/driver/DAGScheduler/messageProcessingTime
DAGScheduler:stage.failedStages spark/driver/DAGScheduler/stage/failedStages
DAGScheduler:stage.runningStages spark/driver/DAGScheduler/stage/runningStages
DAGScheduler:stage.waitingStages spark/driver/DAGScheduler/stage/waitingStages

Spark executor metrics

Metric    Metrics Explorer name
ExecutorAllocationManager:executors.numberExecutorsDecommissionUnfinished spark/driver/ExecutorAllocationManager/executors/numberExecutorsDecommissionUnfinished
ExecutorAllocationManager:executors.numberExecutorsExitedUnexpectedly spark/driver/ExecutorAllocationManager/executors/numberExecutorsExitedUnexpectedly
ExecutorAllocationManager:executors.numberExecutorsGracefullyDecommissioned spark/driver/ExecutorAllocationManager/executors/numberExecutorsGracefullyDecommissioned
ExecutorAllocationManager:executors.numberExecutorsKilledByDriver spark/driver/ExecutorAllocationManager/executors/numberExecutorsKilledByDriver
LiveListenerBus:queue.executorManagement.listenerProcessingTime spark/driver/LiveListenerBus/queue/executorManagement/listenerProcessingTime
executor:bytesRead spark/executor/bytesRead
executor:bytesWritten spark/executor/bytesWritten
executor:cpuTime spark/executor/cpuTime
executor:diskBytesSpilled spark/executor/diskBytesSpilled
executor:jvmGCTime spark/executor/jvmGCTime
executor:memoryBytesSpilled spark/executor/memoryBytesSpilled
executor:recordsRead spark/executor/recordsRead
executor:recordsWritten spark/executor/recordsWritten
executor:runTime spark/executor/runTime
executor:shuffleFetchWaitTime spark/executor/shuffleFetchWaitTime
executor:shuffleRecordsRead spark/executor/shuffleRecordsRead
executor:shuffleRecordsWritten spark/executor/shuffleRecordsWritten
executor:shuffleRemoteBytesReadToDisk spark/executor/shuffleRemoteBytesReadToDisk
executor:shuffleWriteTime spark/executor/shuffleWriteTime
executor:succeededTasks spark/executor/succeededTasks
ExecutorMetrics:MajorGCTime spark/executor/ExecutorMetrics/MajorGCTime
ExecutorMetrics:MinorGCTime spark/executor/ExecutorMetrics/MinorGCTime

System metrics

Metric    Metrics Explorer name
agent:uptime agent/uptime
cpu:utilization cpu/utilization
disk:bytes_used disk/bytes_used
disk:percent_used disk/percent_used
memory:bytes_used memory/bytes_used
memory:percent_used memory/percent_used
network:tcp_connections network/tcp_connections
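
The rows in the three tables above appear to follow a regular naming pattern. The sketch below derives the Metrics Explorer name from a metric name as it would be written in a metrics collection property. The rules are inferred from the table rows only, not from a published specification. The function names are hypothetical, and applying the custom.googleapis.com/ prefix to every table is an assumption.

```python
# Sources that the system metrics table lists without a "spark/" prefix.
SYSTEM_SOURCES = {"agent", "cpu", "disk", "memory", "network"}


def metrics_explorer_name(metric: str) -> str:
    """Map e.g. 'DAGScheduler:stage.failedStages' to its Metrics Explorer name."""
    source, sep, rest = metric.partition(":")
    if not sep or not rest:
        raise ValueError(f"expected SOURCE:NAME, got {metric!r}")
    path = rest.replace(".", "/")
    if source in SYSTEM_SOURCES:
        return f"{source}/{path}"
    if source == "executor":
        return f"spark/executor/{path}"
    if source == "ExecutorMetrics":
        return f"spark/executor/ExecutorMetrics/{path}"
    # All remaining sources in the tables are reported under spark/driver/.
    return f"spark/driver/{source}/{path}"


def metric_type(metric: str) -> str:
    """Prefix the Metrics Explorer name as in custom.googleapis.com/METRIC_EXPLORER_NAME."""
    return "custom.googleapis.com/" + metrics_explorer_name(metric)


print(metric_type("BlockManager:memory.memUsed_MB"))
```

Note that the ExecutorAllocationManager and LiveListenerBus rows are listed with the executor metrics but carry spark/driver/ names, which the fallback branch reproduces.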

View Spark metrics

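To view batch metrics, click a batch ID on the Dataproc Batches page in the Google Cloud console to open the Batch details page, which displays metric charts for the batch workload under the Monitoring tab.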

Figure 1. Spark metrics charts for a batch workload.

For more information about how to view the collected metrics, see Dataproc Cloud Monitoring.
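
Outside the console charts, a collected metric can also be selected in Cloud Monitoring with a filter on its metric type. The sketch below only builds such a filter string from a Metrics Explorer name. The helper name `monitoring_filter` is hypothetical, and the sketch assumes the custom.googleapis.com/ prefix shown earlier in this document. It does not call any API.

```python
def monitoring_filter(metrics_explorer_name: str) -> str:
    """Build a Cloud Monitoring filter that selects one metric type."""
    metric_type = f"custom.googleapis.com/{metrics_explorer_name}"
    return f'metric.type = "{metric_type}"'


print(monitoring_filter("spark/driver/DAGScheduler/stage/failedStages"))
```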