Managed Service for Apache Spark 指標

Cloud Monitoring 可讓您掌握雲端應用程式的效能、運作時間和整體健康狀態。Google Cloud Observability 會收集及擷取 Managed Service for Apache Spark 叢集的指標、事件和中繼資料,包括每個叢集的 HDFS、YARN、工作和作業指標,並透過資訊主頁和圖表產生洞察資料 (請參閱「Cloud Monitoring Managed Service for Apache Spark 指標」)。

收集 Managed Service for Apache Spark 資源指標

Cloud Monitoring 會收集下列 Managed Service for Apache Spark 資源的相關指標:

  • Cloud Dataproc 叢集
  • Cloud Dataproc 工作
  • Cloud Dataproc 批次
  • Cloud Dataproc 工作階段

Managed Service for Apache Spark 資源指標會以以下格式收集: dataproc.googleapis.com/RESOURCE/METRIC, 並包含多項 OSS 指標的集合。

查看 Managed Service for Apache Spark 資源指標

您可以在 Metrics Explorer 中選取並查看 Managed Service for Apache Spark 資源指標,方法是在 Filter by resource or metric name 方塊中輸入「dataproc」,然後選取「Cloud Dataproc」資源。

收集自訂指標

建立 Managed Service for Apache Spark 叢集時,您可以啟用從一或多個自訂指標來源收集指標。系統會從每個已啟用的指標來源收集一組標準指標,除非您指定要從指標來源收集的指標 (使用者指定的指標稱為指標「覆寫」)。

自訂 OSS 指標會以以下格式收集: custom.googleapis.com/OSS_COMPONENT/METRIC

自訂 OSS 指標範例:

custom.googleapis.com/spark/driver/DAGScheduler/job/allJobs
custom.googleapis.com/hiveserver2/memory/MaxNonHeapMemory

啟用自訂指標收集功能

您可以使用 gcloud CLI 或 Dataproc API,從一或多個指標來源啟用自訂指標的收集作業。

gcloud CLI

自訂指標收集

使用 gcloud dataproc clusters create --metric-sources 旗標,從一或多個指標來源啟用自訂指標的收集作業。

gcloud dataproc clusters create cluster-name \
    --metric-sources=METRIC_SOURCE(s) \
    ... other flags

注意:

覆寫指標收集作業

視需要新增 --metric-overrides--metric-overrides-file 標記,從一或多個指標來源收集一或多個自訂指標

  • 任何自訂指標和所有 Spark 指標都可以列為指標覆寫,供系統收集。覆寫指標值區分大小寫,且必須以 CamelCase 格式提供 (如適用)。

    範例:

    • sparkHistoryServer:JVM:Memory:NonHeapMemoryUsage.committed
    • hiveserver2:JVM:Memory:NonHeapMemoryUsage.used
    • yarn:ResourceManager:JvmMetrics:MemHeapMaxM

  • 系統只會從指定的指標來源收集覆寫的指標。舉例來說,如果一或多個spark:executive指標列為指標覆寫,系統就不會收集其他SPARK指標。從其他指標來源收集自訂指標的作業不會受到影響。舉例來說,如果同時啟用 SPARKYARN 指標來源,但只為 Spark 指標提供覆寫,系統就會收集已啟用的標準 YARN 指標集。
  • 必須啟用指定指標覆寫的來源。舉例來說,如果一或多個 spark:driver 指標做為指標覆寫提供,則必須啟用 spark 指標來源 (--metric-sources=spark)。

覆寫指標清單

gcloud dataproc clusters create cluster-name \
    --metric-sources=METRIC_SOURCE(s) \
    --metric-overrides=LIST_OF_METRIC_OVERRIDES \
    ... other flags

注意:

  • --metric-sources:啟用自訂指標收集功能時,這是必要屬性。 指定一或多個下列指標來源: sparkflinkhdfsyarnspark-history-serverhiveserver2hivemetastoremonitoring-agent-defaults。 指標來源名稱不區分大小寫,例如「yarn」或「YARN」都可以。
  • --metric-overrides:請提供指標清單,格式如下:

    METRIC_SOURCE:INSTANCE:GROUP:METRIC

    範例:--metric-overrides=sparkHistoryServer:JVM:Memory:NonHeapMemoryUsage.committed

    這個旗標是 --metric-overrides-file 旗標的替代方案,無法與 --metric-overrides-file 旗標搭配使用。

覆寫指標檔案

gcloud dataproc clusters create cluster-name \
    --metric-sources=METRIC-SOURCE(s) \
    --metric-overrides-file=METRIC_OVERRIDES_FILENAME \
    ... other flags

注意:

  • --metric-sources:啟用自訂指標收集功能時,這是必要屬性。 指定一或多個下列指標來源: sparkflinkhdfsyarnspark-history-serverhiveserver2hivemetastoremonitoring-agent-defaults。 指標來源名稱不區分大小寫,例如「yarn」或「YARN」都可以。
  • --metric-overrides-file:指定本機或 Cloud Storage 檔案 (gs://bucket/filename),其中包含一或多個指標,格式如下:

    METRIC_SOURCE:INSTANCE:GROUP:METRIC

    視情況使用駝峰式大小寫格式。

    範例:

    • --metric-overrides-file=gs://my-bucket/my-filename.txt
    • --metric-overrides-file=./local-directory/local-filename.txt

      這個標記是 --metric-overrides 標記的替代方案,無法與 --metric-overrides 標記搭配使用。

REST API

clusters.create 要求中納入 DataprocMetricConfig,即可啟用自訂指標收集功能。注意:除非安裝 Ops Agent,否則 2.2 版映像檔叢集不支援 monitoring-agent-defaults

查看自訂指標

您可以在「指標探索器」中選取並查看 Managed Service for Apache Spark 資源指標,方法是選取 VM Instance 資源,然後選取 Custom metrics

自訂指標

您可以啟用 Managed Service for Apache Spark,收集下表列出的自訂指標。

  • 如果 Managed Service for Apache Spark 在您啟用相關聯的指標來源時收集指標,「已啟用指標」欄會標示為「y」。

  • 如要收集指標來源列出的任何指標,以及所有 Spark 指標,請覆寫指標來源的標準啟用指標集 (請參閱「啟用自訂指標收集功能」)。

  • Managed Service for Apache Spark 會使用監控代理程式收集指標。啟用任何指標來源,即可收集代理程式指標。 這些指標不會向使用者收費,Managed Service for Apache Spark 會使用這些指標診斷指標收集問題。

Hadoop 指標

HDFS 指標

指標 Metrics Explorer 名稱 已啟用指標
hdfs:NameNode:FSNamesystem:CapacityTotalGB dfs/FSNamesystem/CapacityTotalGB y
hdfs:NameNode:FSNamesystem:CapacityUsedGB dfs/FSNamesystem/CapacityUsedGB y
hdfs:NameNode:FSNamesystem:CapacityRemainingGB dfs/FSNamesystem/CapacityRemainingGB y
hdfs:NameNode:FSNamesystem:FilesTotal dfs/FSNamesystem/FilesTotal y
hdfs:NameNode:FSNamesystem:MissingBlocks dfs/FSNamesystem/MissingBlocks n
hdfs:NameNode:FSNamesystem:ExpiredHeartbeats dfs/FSNamesystem/ExpiredHeartbeats n
hdfs:NameNode:FSNamesystem:TransactionsSinceLastCheckpoint dfs/FSNamesystem/TransactionsSinceLastCheckpoint n
hdfs:NameNode:FSNamesystem:TransactionsSinceLastLogRoll dfs/FSNamesystem/TransactionsSinceLastLogRoll n
hdfs:NameNode:FSNamesystem:LastWrittenTransactionId dfs/FSNamesystem/LastWrittenTransactionId n
hdfs:NameNode:FSNamesystem:CapacityTotal dfs/FSNamesystem/CapacityTotal n
hdfs:NameNode:FSNamesystem:CapacityUsed dfs/FSNamesystem/CapacityUsed n
hdfs:NameNode:FSNamesystem:CapacityRemaining dfs/FSNamesystem/CapacityRemaining n
hdfs:NameNode:FSNamesystem:CapacityUsedNonDFS dfs/FSNamesystem/CapacityUsedNonDFS n
hdfs:NameNode:FSNamesystem:TotalLoad dfs/FSNamesystem/TotalLoad n
hdfs:NameNode:FSNamesystem:SnapshottableDirectories dfs/FSNamesystem/SnapshottableDirectories n
hdfs:NameNode:FSNamesystem:Snapshots dfs/FSNamesystem/Snapshots n
hdfs:NameNode:FSNamesystem:BlocksTotal dfs/FSNamesystem/BlocksTotal n
hdfs:NameNode:FSNamesystem:PendingReplicationBlocks dfs/FSNamesystem/PendingReplicationBlocks n
hdfs:NameNode:FSNamesystem:UnderReplicatedBlocks dfs/FSNamesystem/UnderReplicatedBlocks n
hdfs:NameNode:FSNamesystem:CorruptBlocks dfs/FSNamesystem/CorruptBlocks n
hdfs:NameNode:FSNamesystem:ScheduledReplicationBlocks dfs/FSNamesystem/ScheduledReplicationBlocks n
hdfs:NameNode:FSNamesystem:PendingDeletionBlocks dfs/FSNamesystem/PendingDeletionBlocks n
hdfs:NameNode:FSNamesystem:ExcessBlocks dfs/FSNamesystem/ExcessBlocks n
hdfs:NameNode:FSNamesystem:PostponedMisreplicatedBlocks dfs/FSNamesystem/PostponedMisreplicatedBlocks n
hdfs:NameNode:FSNamesystem:PendingDataNodeMessageCourt dfs/FSNamesystem/PendingDataNodeMessageCourt n
hdfs:NameNode:FSNamesystem:MillisSinceLastLoadedEdits dfs/FSNamesystem/MillisSinceLastLoadedEdits n
hdfs:NameNode:FSNamesystem:BlockCapacity dfs/FSNamesystem/BlockCapacity n
hdfs:NameNode:FSNamesystem:StaleDataNodes dfs/FSNamesystem/StaleDataNodes n
hdfs:NameNode:FSNamesystem:TotalFiles dfs/FSNamesystem/TotalFiles n
hdfs:NameNode:JvmMetrics:MemHeapUsedM dfs/jvm/MemHeapUsedM n
hdfs:NameNode:JvmMetrics:MemHeapCommittedM dfs/jvm/MemHeapCommittedM n
hdfs:NameNode:JvmMetrics:MemHeapMaxM dfs/jvm/MemHeapMaxM n
hdfs:NameNode:JvmMetrics:MemMaxM dfs/jvm/MemMaxM n

YARN 指標

指標 Metrics Explorer 名稱 已啟用指標
yarn:ResourceManager:ClusterMetrics:NumActiveNMs yarn/ClusterMetrics/NumActiveNMs y
yarn:ResourceManager:ClusterMetrics:NumDecommissionedNMs yarn/ClusterMetrics/NumDecommissionedNMs n
yarn:ResourceManager:ClusterMetrics:NumLostNMs yarn/ClusterMetrics/NumLostNMs n
yarn:ResourceManager:ClusterMetrics:NumUnhealthyNMs yarn/ClusterMetrics/NumUnhealthyNMs n
yarn:ResourceManager:ClusterMetrics:NumRebootedNMs yarn/ClusterMetrics/NumRebootedNMs n
yarn:ResourceManager:QueueMetrics:running_0 yarn/QueueMetrics/running_0 y
yarn:ResourceManager:QueueMetrics:running_60 yarn/QueueMetrics/running_60 y
yarn:ResourceManager:QueueMetrics:running_300 yarn/QueueMetrics/running_300 y
yarn:ResourceManager:QueueMetrics:running_1440 yarn/QueueMetrics/running_1440 y
yarn:ResourceManager:QueueMetrics:AppsSubmitted yarn/QueueMetrics/AppsSubmitted y
yarn:ResourceManager:QueueMetrics:AvailableMB yarn/QueueMetrics/AvailableMB y
yarn:ResourceManager:QueueMetrics:PendingContainers yarn/QueueMetrics/PendingContainers y
yarn:ResourceManager:QueueMetrics:AppsRunning yarn/QueueMetrics/AppsRunning n
yarn:ResourceManager:QueueMetrics:AppsPending yarn/QueueMetrics/AppsPending n
yarn:ResourceManager:QueueMetrics:AppsCompleted yarn/QueueMetrics/AppsCompleted n
yarn:ResourceManager:QueueMetrics:AppsKilled yarn/QueueMetrics/AppsKilled n
yarn:ResourceManager:QueueMetrics:AppsFailed yarn/QueueMetrics/AppsFailed n
yarn:ResourceManager:QueueMetrics:AllocatedMB yarn/QueueMetrics/AllocatedMB n
yarn:ResourceManager:QueueMetrics:AllocatedVCores yarn/QueueMetrics/AllocatedVCores n
yarn:ResourceManager:QueueMetrics:AllocatedContainers yarn/QueueMetrics/AllocatedContainers n
yarn:ResourceManager:QueueMetrics:AggregateContainersAllocated yarn/QueueMetrics/AggregateContainersAllocated n
yarn:ResourceManager:QueueMetrics:AggregateContainersReleased yarn/QueueMetrics/AggregateContainersReleased n
yarn:ResourceManager:QueueMetrics:AvailableVCores yarn/QueueMetrics/AvailableVCores n
yarn:ResourceManager:QueueMetrics:PendingMB yarn/QueueMetrics/PendingMB n
yarn:ResourceManager:QueueMetrics:PendingVCores yarn/QueueMetrics/PendingVCores n
yarn:ResourceManager:QueueMetrics:ReservedMB yarn/QueueMetrics/ReservedMB n
yarn:ResourceManager:QueueMetrics:ReservedVCores yarn/QueueMetrics/ReservedVCores n
yarn:ResourceManager:QueueMetrics:ReservedContainers yarn/QueueMetrics/ReservedContainers n
yarn:ResourceManager:QueueMetrics:ActiveUsers yarn/QueueMetrics/ActiveUsers n
yarn:ResourceManager:QueueMetrics:ActiveApplications yarn/QueueMetrics/ActiveApplications n
yarn:ResourceManager:QueueMetrics:FairShareMB yarn/QueueMetrics/FairShareMB n
yarn:ResourceManager:QueueMetrics:FairShareVCores yarn/QueueMetrics/FairShareVCores n
yarn:ResourceManager:QueueMetrics:MinShareMB yarn/QueueMetrics/MinShareMB n
yarn:ResourceManager:QueueMetrics:MinShareVCores yarn/QueueMetrics/MinShareVCores n
yarn:ResourceManager:QueueMetrics:MaxShareMB yarn/QueueMetrics/MaxShareMB n
yarn:ResourceManager:QueueMetrics:MaxShareVCores yarn/QueueMetrics/MaxShareVCores n
yarn:ResourceManager:JvmMetrics:MemHeapUsedM yarn/jvm/MemHeapUsedM n
yarn:ResourceManager:JvmMetrics:MemHeapCommittedM yarn/jvm/MemHeapCommittedM n
yarn:ResourceManager:JvmMetrics:MemHeapMaxM yarn/jvm/MemHeapMaxM n
yarn:ResourceManager:JvmMetrics:MemMaxM yarn/jvm/MemMaxM n

Spark 指標

Spark 驅動程式指標

指標 Metrics Explorer 名稱 已啟用指標
spark:driver:BlockManager:disk.diskSpaceUsed_MB spark/driver/BlockManager/disk/diskSpaceUsed_MB y
spark:driver:BlockManager:memory.maxMem_MB spark/driver/BlockManager/memory/maxMem_MB y
spark:driver:BlockManager:memory.memUsed_MB spark/driver/BlockManager/memory/memUsed_MB y
spark:driver:DAGScheduler:job.allJobs spark/driver/DAGScheduler/job/allJobs y
spark:driver:DAGScheduler:stage.failedStages spark/driver/DAGScheduler/stage/failedStages y
spark:driver:DAGScheduler:stage.waitingStages spark/driver/DAGScheduler/stage/waitingStages y

Spark 執行器指標

指標 Metrics Explorer 名稱 已啟用指標
spark:executor:executor:bytesRead spark/executor/bytesRead y
spark:executor:executor:bytesWritten spark/executor/bytesWritten y
spark:executor:executor:cpuTime spark/executor/cpuTime y
spark:executor:executor:diskBytesSpilled spark/executor/diskBytesSpilled y
spark:executor:executor:recordsRead spark/executor/recordsRead y
spark:executor:executor:recordsWritten spark/executor/recordsWritten y
spark:executor:executor:runTime spark/executor/runTime y
spark:executor:executor:shuffleRecordsRead spark/executor/shuffleRecordsRead y
spark:executor:executor:shuffleRecordsWritten spark/executor/shuffleRecordsWritten y
指標 Metrics Explorer 名稱 已啟用指標
flink:jobmanager:numRegisteredTaskManagers flink/jobmanager/numRegisteredTaskManagers n
flink:jobmanager:numRunningJobs flink/jobmanager/numRunningJobs n
flink:jobmanager:Status.JVM.ClassLoader.ClassesLoaded flink/jobmanager/Status.JVM.ClassLoader.ClassesLoaded n
flink:jobmanager:Status.JVM.ClassLoader.ClassesUnloaded flink/jobmanager/Status.JVM.ClassLoader.ClassesUnloaded n
flink:jobmanager:Status.JVM.CPU.Load flink/jobmanager/Status.JVM.CPU.Load n
flink:jobmanager:Status.JVM.CPU.Time flink/jobmanager/Status.JVM.CPU.Time y
flink:jobmanager:Status.JVM.GarbageCollector.PSMarkSweep.Count flink/jobmanager/Status.JVM.GarbageCollector.PSMarkSweep.Count n
flink:jobmanager:Status.JVM.GarbageCollector.PSMarkSweep.Time flink/jobmanager/Status.JVM.GarbageCollector.PSMarkSweep.Time n
flink:jobmanager:Status.JVM.GarbageCollector.PSScavenge.Count flink/jobmanager/Status.JVM.GarbageCollector.PSScavenge.Count n
flink:jobmanager:Status.JVM.GarbageCollector.PSScavenge.Time flink/jobmanager/Status.JVM.GarbageCollector.PSScavenge.Time n
flink:jobmanager:Status.JVM.Memory.Direct.Count flink/jobmanager/Status.JVM.Memory.Direct.Count y
flink:jobmanager:Status.JVM.Memory.Direct.MemoryUsed flink/jobmanager/Status.JVM.Memory.Direct.MemoryUsed y
flink:jobmanager:Status.JVM.Memory.Direct.TotalCapacity flink/jobmanager/Status.JVM.Memory.Direct.TotalCapacity y
flink:jobmanager:Status.JVM.Memory.Heap.Committed flink/jobmanager/Status.JVM.Memory.Heap.Committed y
flink:jobmanager:Status.JVM.Memory.Heap.Max flink/jobmanager/Status.JVM.Memory.Heap.Max y
flink:jobmanager:Status.JVM.Memory.Heap.Used flink/jobmanager/Status.JVM.Memory.Heap.Used y
flink:jobmanager:Status.JVM.Memory.Mapped.Count flink/jobmanager/Status.JVM.Memory.Mapped.Count y
flink:jobmanager:Status.JVM.Memory.Mapped.MemoryUsed flink/jobmanager/Status.JVM.Memory.Mapped.MemoryUsed y
flink:jobmanager:Status.JVM.Memory.Mapped.TotalCapacity flink/jobmanager/Status.JVM.Memory.Mapped.TotalCapacity y
flink:jobmanager:Status.JVM.Memory.Metaspace.Committed flink/jobmanager/Status.JVM.Memory.Metaspace.Committed n
flink:jobmanager:Status.JVM.Memory.Metaspace.Max flink/jobmanager/Status.JVM.Memory.Metaspace.Max n
flink:jobmanager:Status.JVM.Memory.Metaspace.Used flink/jobmanager/Status.JVM.Memory.Metaspace.Used n
flink:jobmanager:Status.JVM.Memory.NonHeap.Committed flink/jobmanager/Status.JVM.Memory.NonHeap.Committed n
flink:jobmanager:Status.JVM.Memory.NonHeap.Max flink/jobmanager/Status.JVM.Memory.NonHeap.Max n
flink:jobmanager:Status.JVM.Memory.NonHeap.Used flink/jobmanager/Status.JVM.Memory.NonHeap.Used n
flink:jobmanager:Status.JVM.Threads.Count flink/jobmanager/Status.JVM.Threads.Count n
flink:jobmanager:taskSlotsAvailable flink/jobmanager/taskSlotsAvailable y
flink:jobmanager:taskSlotsTotal flink/jobmanager/taskSlotsTotal y
flink:operator:numRecordsIn flink/operator/numRecordsIn n
flink:operator:numRecordsInPerSecond.count flink/operator/numRecordsInPerSecond.count n
flink:operator:numRecordsInPerSecond.rate flink/operator/numRecordsInPerSecond.rate n
flink:operator:numRecordsOut flink/operator/numRecordsOut n
flink:operator:numRecordsOutPerSecond.count flink/operator/numRecordsOutPerSecond.count n
flink:operator:numRecordsOutPerSecond.rate flink/operator/numRecordsOutPerSecond.rate n
flink:operator:numSplitsProcessed flink/operator/numSplitsProcessed n
flink:task:buffers.inPoolUsage flink/task/buffers.inPoolUsage n
flink:task:buffers.inputExclusiveBuffersUsage flink/task/buffers.inputExclusiveBuffersUsage n
flink:task:buffers.inputFloatingBuffersUsage flink/task/buffers.inputFloatingBuffersUsage n
flink:task:buffers.inputQueueLength flink/task/buffers.inputQueueLength n
flink:task:buffers.outPoolUsage flink/task/buffers.outPoolUsage n
flink:task:buffers.outputQueueLength flink/task/buffers.outputQueueLength n
flink:task:idleTimeMsPerSecond.count flink/task/idleTimeMsPerSecond.count n
flink:task:idleTimeMsPerSecond.rate flink/task/idleTimeMsPerSecond.rate n
flink:task:numBuffersInLocal flink/task/numBuffersInLocal n
flink:task:numBuffersInLocalPerSecond.count flink/task/numBuffersInLocalPerSecond.count n
flink:task:numBuffersInLocalPerSecond.rate flink/task/numBuffersInLocalPerSecond.rate n
flink:task:numBuffersInRemote flink/task/numBuffersInRemote n
flink:task:numBuffersInRemotePerSecond.count flink/task/numBuffersInRemotePerSecond.count n
flink:task:numBuffersInRemotePerSecond.rate flink/task/numBuffersInRemotePerSecond.rate n
flink:task:numBuffersOut flink/task/numBuffersOut n
flink:task:numBuffersOutPerSecond.count flink/task/numBuffersOutPerSecond.count n
flink:task:numBuffersOutPerSecond.rate flink/task/numBuffersOutPerSecond.rate n
flink:task:numBytesIn flink/task/numBytesIn n
flink:task:numBytesInLocal flink/task/numBytesInLocal n
flink:task:numBytesInLocalPerSecond.count flink/task/numBytesInLocalPerSecond.count n
flink:task:numBytesInLocalPerSecond.rate flink/task/numBytesInLocalPerSecond.rate n
flink:task:numBytesInPerSecond.count flink/task/numBytesInPerSecond.count n
flink:task:numBytesInPerSecond.rate flink/task/numBytesInPerSecond.rate n
flink:task:numBytesInRemote flink/task/numBytesInRemote n
flink:task:numBytesInRemotePerSecond.count flink/task/numBytesInRemotePerSecond.count n
flink:task:numBytesInRemotePerSecond.rate flink/task/numBytesInRemotePerSecond.rate n
flink:task:numBytesOut flink/task/numBytesOut n
flink:task:numBytesOutPerSecond.count flink/task/numBytesOutPerSecond.count n
flink:task:numBytesOutPerSecond.rate flink/task/numBytesOutPerSecond.rate n
flink:task:numRecordsIn flink/task/numRecordsIn n
flink:task:numRecordsInPerSecond.count flink/task/numRecordsInPerSecond.count n
flink:task:numRecordsInPerSecond.rate flink/task/numRecordsInPerSecond.rate n
flink:task:numRecordsOut flink/task/numRecordsOut n
flink:task:numRecordsOutPerSecond.count flink/task/numRecordsOutPerSecond.count n
flink:task:numRecordsOutPerSecond.rate flink/task/numRecordsOutPerSecond.rate n
flink:task:Shuffle.Netty.Input.Buffers.inPoolUsage flink/task/Shuffle.Netty.Input.Buffers.inPoolUsage n
flink:task:Shuffle.Netty.Input.Buffers.inputExclusiveBuffersUsage flink/task/Shuffle.Netty.Input.Buffers.inputExclusiveBuffersUsage n
flink:task:Shuffle.Netty.Input.Buffers.inputFloatingBuffersUsage flink/task/Shuffle.Netty.Input.Buffers.inputFloatingBuffersUsage n
flink:task:Shuffle.Netty.Input.Buffers.inputQueueLength flink/task/Shuffle.Netty.Input.Buffers.inputQueueLength n
flink:task:Shuffle.Netty.Input.numBuffersInLocal flink/task/Shuffle.Netty.Input.numBuffersInLocal n
flink:task:Shuffle.Netty.Input.numBuffersInLocalPerSecond.count flink/task/Shuffle.Netty.Input.numBuffersInLocalPerSecond.count n
flink:task:Shuffle.Netty.Input.numBuffersInLocalPerSecond.rate flink/task/Shuffle.Netty.Input.numBuffersInLocalPerSecond.rate n
flink:task:Shuffle.Netty.Input.numBuffersInRemote flink/task/Shuffle.Netty.Input.numBuffersInRemote n
flink:task:Shuffle.Netty.Input.numBuffersInRemotePerSecond.count flink/task/Shuffle.Netty.Input.numBuffersInRemotePerSecond.count n
flink:task:Shuffle.Netty.Input.numBuffersInRemotePerSecond.rate flink/task/Shuffle.Netty.Input.numBuffersInRemotePerSecond.rate n
flink:task:Shuffle.Netty.Input.numBytesInLocal flink/task/Shuffle.Netty.Input.numBytesInLocal n
flink:task:Shuffle.Netty.Input.numBytesInLocalPerSecond.count flink/task/Shuffle.Netty.Input.numBytesInLocalPerSecond.count n
flink:task:Shuffle.Netty.Input.numBytesInLocalPerSecond.rate flink/task/Shuffle.Netty.Input.numBytesInLocalPerSecond.rate n
flink:task:Shuffle.Netty.Input.numBytesInRemote flink/task/Shuffle.Netty.Input.numBytesInRemote n
flink:task:Shuffle.Netty.Input.numBytesInRemotePerSecond.count flink/task/Shuffle.Netty.Input.numBytesInRemotePerSecond.count n
flink:task:Shuffle.Netty.Input.numBytesInRemotePerSecond.rate flink/task/Shuffle.Netty.Input.numBytesInRemotePerSecond.rate n
flink:task:Shuffle.Netty.Output.Buffers.outPoolUsage flink/task/Shuffle.Netty.Output.Buffers.outPoolUsage n
flink:task:Shuffle.Netty.Output.Buffers.outputQueueLength flink/task/Shuffle.Netty.Output.Buffers.outputQueueLength n
flink:taskmanager:Status.flink.Memory.Managed.Total flink/taskmanager/Status.flink.Memory.Managed.Total n
flink:taskmanager:Status.flink.Memory.Managed.Used flink/taskmanager/Status.flink.Memory.Managed.Used n
flink:taskmanager:Status.JVM.ClassLoader.ClassesLoaded flink/taskmanager/Status.JVM.ClassLoader.ClassesLoaded n
flink:taskmanager:Status.JVM.ClassLoader.ClassesUnloaded flink/taskmanager/Status.JVM.ClassLoader.ClassesUnloaded n
flink:taskmanager:Status.JVM.CPU.Load flink/taskmanager/Status.JVM.CPU.Load n
flink:taskmanager:Status.JVM.CPU.Time flink/taskmanager/Status.JVM.CPU.Time y
flink:taskmanager:Status.JVM.GarbageCollector.PSMarkSweep.Count flink/taskmanager/Status.JVM.GarbageCollector.PSMarkSweep.Count n
flink:taskmanager:Status.JVM.GarbageCollector.PSMarkSweep.Time flink/taskmanager/Status.JVM.GarbageCollector.PSMarkSweep.Time n
flink:taskmanager:Status.JVM.GarbageCollector.PSScavenge.Count flink/taskmanager/Status.JVM.GarbageCollector.PSScavenge.Count n
flink:taskmanager:Status.JVM.GarbageCollector.PSScavenge.Time flink/taskmanager/Status.JVM.GarbageCollector.PSScavenge.Time n
flink:taskmanager:Status.JVM.Memory.Direct.Count flink/taskmanager/Status.JVM.Memory.Direct.Count y
flink:taskmanager:Status.JVM.Memory.Direct.MemoryUsed flink/taskmanager/Status.JVM.Memory.Direct.MemoryUsed y
flink:taskmanager:Status.JVM.Memory.Direct.TotalCapacity flink/taskmanager/Status.JVM.Memory.Direct.TotalCapacity y
flink:taskmanager:Status.JVM.Memory.Heap.Committed flink/taskmanager/Status.JVM.Memory.Heap.Committed y
flink:taskmanager:Status.JVM.Memory.Heap.Max flink/taskmanager/Status.JVM.Memory.Heap.Max y
flink:taskmanager:Status.JVM.Memory.Heap.Used flink/taskmanager/Status.JVM.Memory.Heap.Used y
flink:taskmanager:Status.JVM.Memory.Mapped.Count flink/taskmanager/Status.JVM.Memory.Mapped.Count y
flink:taskmanager:Status.JVM.Memory.Mapped.MemoryUsed flink/taskmanager/Status.JVM.Memory.Mapped.MemoryUsed y
flink:taskmanager:Status.JVM.Memory.Mapped.TotalCapacity flink/taskmanager/Status.JVM.Memory.Mapped.TotalCapacity y
flink:taskmanager:Status.JVM.Memory.Metaspace.Committed flink/taskmanager/Status.JVM.Memory.Metaspace.Committed n
flink:taskmanager:Status.JVM.Memory.Metaspace.Max flink/taskmanager/Status.JVM.Memory.Metaspace.Max n
flink:taskmanager:Status.JVM.Memory.Metaspace.Used flink/taskmanager/Status.JVM.Memory.Metaspace.Used n
flink:taskmanager:Status.JVM.Memory.NonHeap.Committed flink/taskmanager/Status.JVM.Memory.NonHeap.Committed n
flink:taskmanager:Status.JVM.Memory.NonHeap.Max flink/taskmanager/Status.JVM.Memory.NonHeap.Max n
flink:taskmanager:Status.JVM.Memory.NonHeap.Used flink/taskmanager/Status.JVM.Memory.NonHeap.Used n
flink:taskmanager:Status.JVM.Threads.Count flink/taskmanager/Status.JVM.Threads.Count n
flink:taskmanager:Status.Network.AvailableMemorySegments flink/taskmanager/Status.Network.AvailableMemorySegments n
flink:taskmanager:Status.Network.TotalMemorySegments flink/taskmanager/Status.Network.TotalMemorySegments n
flink:taskmanager:Status.Shuffle.Netty.AvailableMemory flink/taskmanager/Status.Shuffle.Netty.AvailableMemory n
flink:taskmanager:Status.Shuffle.Netty.AvailableMemorySegments flink/taskmanager/Status.Shuffle.Netty.AvailableMemorySegments n
flink:taskmanager:Status.Shuffle.Netty.TotalMemory flink/taskmanager/Status.Shuffle.Netty.TotalMemory n
flink:taskmanager:Status.Shuffle.Netty.TotalMemorySegments flink/taskmanager/Status.Shuffle.Netty.TotalMemorySegments n
flink:taskmanager:Status.Shuffle.Netty.UsedMemory flink/taskmanager/Status.Shuffle.Netty.UsedMemory n
flink:taskmanager:Status.Shuffle.Netty.UsedMemorySegments flink/taskmanager/Status.Shuffle.Netty.UsedMemorySegments n

Spark 記錄伺服器指標

Managed Service for Apache Spark 會收集下列 Spark 歷來服務 JVM 記憶體指標:

指標 Metrics Explorer 名稱 已啟用指標
sparkHistoryServer:JVM:Memory:HeapMemoryUsage.committed sparkHistoryServer/memory/CommittedHeapMemory y
sparkHistoryServer:JVM:Memory:HeapMemoryUsage.used sparkHistoryServer/memory/UsedHeapMemory y
sparkHistoryServer:JVM:Memory:HeapMemoryUsage.max sparkHistoryServer/memory/MaxHeapMemory y
sparkHistoryServer:JVM:Memory:NonHeapMemoryUsage.committed sparkHistoryServer/memory/CommittedNonHeapMemory y
sparkHistoryServer:JVM:Memory:NonHeapMemoryUsage.used sparkHistoryServer/memory/UsedNonHeapMemory y
sparkHistoryServer:JVM:Memory:NonHeapMemoryUsage.max sparkHistoryServer/memory/MaxNonHeapMemory y

HiveServer 2 指標

指標 Metrics Explorer 名稱 已啟用指標
hiveserver2:JVM:Memory:HeapMemoryUsage.committed hiveserver2/memory/CommittedHeapMemory y
hiveserver2:JVM:Memory:HeapMemoryUsage.used hiveserver2/memory/UsedHeapMemory y
hiveserver2:JVM:Memory:HeapMemoryUsage.max hiveserver2/memory/MaxHeapMemory y
hiveserver2:JVM:Memory:NonHeapMemoryUsage.committed hiveserver2/memory/CommittedNonHeapMemory y
hiveserver2:JVM:Memory:NonHeapMemoryUsage.used hiveserver2/memory/UsedNonHeapMemory y
hiveserver2:JVM:Memory:NonHeapMemoryUsage.max hiveserver2/memory/MaxNonHeapMemory y

Hive Metastore 指標

指標 Metrics Explorer 名稱 已啟用指標
hivemetastore:API:GetDatabase:Mean hivemetastore/get_database/mean y
hivemetastore:API:CreateDatabase:Mean hivemetastore/create_database/mean y
hivemetastore:API:DropDatabase:Mean hivemetastore/drop_database/mean y
hivemetastore:API:AlterDatabase:Mean hivemetastore/alter_database/mean y
hivemetastore:API:GetAllDatabases:Mean hivemetastore/get_all_databases/mean y
hivemetastore:API:CreateTable:Mean hivemetastore/create_table/mean y
hivemetastore:API:DropTable:Mean hivemetastore/drop_table/mean y
hivemetastore:API:AlterTable:Mean hivemetastore/alter_table/mean y
hivemetastore:API:GetTable:Mean hivemetastore/get_table/mean y
hivemetastore:API:GetAllTables:Mean hivemetastore/get_all_tables/mean y
hivemetastore:API:AddPartitionsReq:Mean hivemetastore/add_partitions_req/mean y
hivemetastore:API:DropPartition:Mean hivemetastore/drop_partition/mean y
hivemetastore:API:AlterPartition:Mean hivemetastore/alter_partition/mean y
hivemetastore:API:GetPartition:Mean hivemetastore/get_partition/mean y
hivemetastore:API:GetPartitionNames:Mean hivemetastore/get_partition_names/mean y
hivemetastore:API:GetPartitionsPs:Mean hivemetastore/get_partitions_ps/mean y
hivemetastore:API:GetPartitionsPsWithAuth:Mean hivemetastore/get_partitions_ps_with_auth/mean y

Hive Metastore 指標評估

統計測量 範例指標 範例指標名稱
最大值 hivemetastore:API:GetDatabase:Max hivemetastore/get_database/max
最小值 hivemetastore:API:GetDatabase:Min hivemetastore/get_database/min
平均值 hivemetastore:API:GetDatabase:Mean hivemetastore/get_database/mean
數量 hivemetastore:API:GetDatabase:Count hivemetastore/get_database/count
第 50 個百分位數 hivemetastore:API:GetDatabase:50thPercentile hivemetastore/get_database/median
第 75 個百分位數 hivemetastore:API:GetDatabase:75thPercentile hivemetastore/get_database/75th_percentile
第 95 個百分位數 hivemetastore:API:GetDatabase:95thPercentile hivemetastore/get_database/95th_percentile
98thPercentile hivemetastore:API:GetDatabase:98thPercentile hivemetastore/get_database/98th_percentile
第 99 個百分位數 hivemetastore:API:GetDatabase:99thPercentile hivemetastore/get_database/99th_percentile
999thPercentile hivemetastore:API:GetDatabase:999thPercentile hivemetastore/get_database/999th_percentile
標準差 hivemetastore:API:GetDatabase:StdDev hivemetastore/get_database/stddev
FifteenMinuteRate hivemetastore:API:GetDatabase:FifteenMinuteRate hivemetastore/get_database/15min_rate
FiveMinuteRate hivemetastore:API:GetDatabase:FiveMinuteRate hivemetastore/get_database/5min_rate
OneMinuteRate hivemetastore:API:GetDatabase:OneMinuteRate hivemetastore/get_database/1min_rate
MeanRate hivemetastore:API:GetDatabase:MeanRate hivemetastore/get_database/mean_rate

Managed Service for Apache Spark 監控代理程式指標

設定 --metric-sources=monitoring-agent-defaults 時,Managed Service for Apache Spark 會收集下列 Managed Service for Apache Spark 監控代理程式指標:這些指標會加上 agent.googleapis.com 前置字元。

CPU
agent.googleapis.com/cpu/load_15m
agent.googleapis.com/cpu/load_1m
agent.googleapis.com/cpu/load_5m
agent.googleapis.com/cpu/usage_time*
agent.googleapis.com/cpu/utilization*

磁碟
agent.googleapis.com/disk/bytes_used
agent.googleapis.com/disk/io_time
agent.googleapis.com/disk/merged_operations
agent.googleapis.com/disk/operation_count
agent.googleapis.com/disk/operation_time
agent.googleapis.com/disk/pending_operations
agent.googleapis.com/disk/percent_used
agent.googleapis.com/disk/read_bytes_count

交換
agent.googleapis.com/swap/bytes_used
agent.googleapis.com/swap/io
agent.googleapis.com/swap/percent_used

記憶體
agent.googleapis.com/memory/bytes_used
agent.googleapis.com/memory/percent_used

程序 - 部分屬性遵循獨特的配額政策
agent.googleapis.com/processes/count_by_state
agent.googleapis.com/processes/cpu_time
agent.googleapis.com/processes/disk/read_bytes_count
agent.googleapis.com/processes/disk/write_bytes_count
agent.googleapis.com/processes/fork_count
agent.googleapis.com/processes/rss_usage
agent.googleapis.com/processes/vm_usage

介面
agent.googleapis.com/interface/errors
agent.googleapis.com/interface/packets
agent.googleapis.com/interface/traffic

網路
agent.googleapis.com/network/tcp_connections

建構監控資訊主頁

您可以建構 Monitoring 資訊主頁,顯示所選 Managed Service for Apache Spark 指標的圖表。

  1. 在「Monitoring Dashboards Overview」頁面中,選取「+ CREATE DASHBOARD」。為資訊主頁命名,然後按一下右上選單中的「Add Chart」,開啟「Add Chart」視窗。選取「Cloud Managed Service for Apache Spark Cluster」做為資源類型。選取一或多個指標和圖表屬性,然後「Save」圖表。

  2. 您可以在資訊主頁中新增其他圖表。儲存資訊主頁後,標題會出現在 Monitoring 的「資訊主頁總覽」頁面中。 您可以在資訊主頁顯示頁面中查看、更新及刪除資訊主頁圖表。

後續步驟