View the topology and health status of All Capacity mode reservations
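You can retrieve topology and health status information for All Capacity mode capacity before or after you provision TPU nodes, by using the Google Cloud console or the Google Cloud CLI. You can also retrieve the physical location of a TPU VM instance by using the Compute Engine Instance API, or by running a curl command from the guest OS of the TPU VM. Topology and health information is available at the cluster, block, sub-block, host, and VM levels. You can use it to make topology-aware placement decisions for your workloads, target specific blocks or sub-blocks for deployment, and understand how close TPU VM instances are to one another.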
View capacity topology in the Google Cloud console
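To view reservation details by using the Google Cloud console, follow these steps:
- In the Google Cloud console, use the search bar to search for "Reservations", and then go to the Reservations page.
- Select the "On-demand reservations" tab, and then find your TPU All Capacity mode reservation. Your account team provides the reservation name.
- Select the reservation to open its details page.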
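For an All Capacity mode reservation, the operational mode is set to "All Capacity". The page shows a list of blocks, along with a summary of their utilization and health status.
Select a block from the list to open the block details page. The block's topology appears in the "Cluster location" section, which shows the cluster name, the hashed ID of the block, and the hashed IDs of its sub-blocks.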
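Cluster names aren't unique across Google Cloud organizations, which means that two different customers might see the same cluster name. Unlike cluster names, block and sub-block hashed IDs are unique across the projects in your organization.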
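You can select a sub-block to open the sub-block details page. This page shows only the physical hosts that have active TPU VM instances; unused physical hosts aren't shown.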
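View capacity topology by using the Google Cloud CLI
You can run the Google Cloud CLI list and describe commands on reservations, blocks, and sub-blocks to find topology and health information for your capacity.
Use the information that the commands in this section display to determine the topology hierarchy of the physical capacity in your reservation.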
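Describe a reservation
You can use the gcloud compute reservations describe command to get an overview of a reservation's capacity. The following command displays a summary of the reservation: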
```shell
gcloud compute reservations describe RESERVATION_NAME \
  --project=PROJECT_ID \
  --zone=ZONE
```
Replace the following:
- RESERVATION_NAME: the name of the reservation.
- PROJECT_ID: your project ID.
- ZONE: the zone where the reservation is located.
The output is similar to the following:
```yaml
advancedDeploymentControl:
  reservationOperationalMode: ALL_CAPACITY
aggregateReservation:
  inUseResources:
  - accelerator:
      acceleratorCount: 48
      acceleratorType: projects/example-project/zones/us-central1-c/acceleratorTypes/tpu7x
  reservedResources:
  - accelerator:
      acceleratorCount: 128
      acceleratorType: projects/example-project/zones/us-central1-c/acceleratorTypes/tpu7x
  vmFamily: VM_FAMILY_CLOUD_TPU_POD_SLICE_TPU7X
  workloadType: UNSPECIFIED
creationTimestamp: '2025-11-05T14:16:30.571-08:00'
deleteAtTime: '2026-11-06T08:00:00Z'
deploymentType: DENSE
enableEmergentMaintenance: false
id: '8873145979824927313'
kind: compute#reservation
linkedCommitments:
- https://www.googleapis.com/compute/v1/projects/example-project/regions/us-central1/commitments/example-cud
name: example-reservation
protectionTier: STANDARD
reservationSharingPolicy:
  serviceShareType: ALLOW_ALL
resourceStatus:
  healthInfo:
    degradedBlockCount: 0
    healthStatus: HEALTHY
    healthyBlockCount: 1
  reservationBlockCount: 1
  reservationMaintenance:
    schedulingType: GROUPED
selfLink: https://www.googleapis.com/compute/v1/projects/example-project/zones/us-central1-c/reservations/example-reservation
shareSettings:
  projectMap:
    '111111111111':
      projectId: '111111111111'
  shareType: SPECIFIC_PROJECTS
specificReservationRequired: true
status: READY
zone: https://www.googleapis.com/compute/v1/projects/example-project/zones/us-central1-c
```
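The following values in the output describe the reservation:
- advancedDeploymentControl.reservationOperationalMode: the capacity mode of the reservation.
- aggregateReservation.inUseResources.accelerator.acceleratorCount: the number of TPU chips in use.
- aggregateReservation.inUseResources.accelerator.acceleratorType: the TPU version.
- aggregateReservation.reservedResources.accelerator.acceleratorCount: the number of TPU chips in the reservation.
- deploymentType: the deployment type (always DENSE for TPUs).
- reservationSharingPolicy.serviceShareType: the service sharing type.
- resourceStatus.healthInfo.healthStatus: the overall health status of the capacity.
- resourceStatus.healthInfo.healthyBlockCount: the number of healthy blocks in the reservation.
- resourceStatus.reservationBlockCount: the number of blocks in the reservation.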
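To track utilization from a script instead of reading the YAML by eye, you could run the same command with gcloud's global `--format=json` flag and extract the fields listed above. The following Python sketch assumes that gcloud is installed and authenticated, and that the JSON output uses the same field names as the YAML sample; `describe_reservation` and `summarize_reservation` are hypothetical helper names, not part of any Google library.

```python
import json
import subprocess


def describe_reservation(name: str, project: str, zone: str) -> dict:
    """Run `gcloud compute reservations describe` and parse its JSON output."""
    completed = subprocess.run(
        ["gcloud", "compute", "reservations", "describe", name,
         f"--project={project}", f"--zone={zone}", "--format=json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(completed.stdout)


def _chip_total(resources: list) -> int:
    """Sum acceleratorCount over every entry in a resource list."""
    return sum(int(r["accelerator"]["acceleratorCount"]) for r in resources)


def summarize_reservation(reservation: dict) -> dict:
    """Extract the capacity and health fields described in this section."""
    aggregate = reservation.get("aggregateReservation", {})
    status = reservation.get("resourceStatus", {})
    health = status.get("healthInfo", {})
    reserved = _chip_total(aggregate.get("reservedResources", []))
    in_use = _chip_total(aggregate.get("inUseResources", []))
    return {
        "mode": reservation.get("advancedDeploymentControl", {}).get(
            "reservationOperationalMode"),
        "reserved_chips": reserved,
        "in_use_chips": in_use,
        "free_chips": reserved - in_use,
        "health": health.get("healthStatus"),
        "healthy_blocks": int(health.get("healthyBlockCount", 0)),
        "total_blocks": int(status.get("reservationBlockCount", 0)),
    }
```

For example, passing the sample output above to `summarize_reservation` would report 128 reserved chips, 48 in use, and 80 free, with 1 of 1 blocks healthy.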
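List all reservation blocks
You can use the gcloud compute reservations blocks list command to display capacity, topology, and health information for all blocks in a reservation.
Each block, sub-block, and host object has a hashed ID. The physical topology fields of a child object show the IDs of its parent objects. You can use these hashed IDs to build a hierarchical view of your capacity's topology.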
```shell
gcloud compute reservations blocks list RESERVATION_NAME \
  --project=PROJECT_ID \
  --zone=ZONE
```
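Replace the following:
- RESERVATION_NAME: the name of the reservation.
- PROJECT_ID: your project ID.
- ZONE: the zone where the reservation is located.
The command displays output similar to the following: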
```yaml
count: 32
creationTimestamp: '2025-11-05T15:00:15.223-08:00'
healthInfo:
  degradedSubBlockCount: 0
  healthStatus: HEALTHY
  healthySubBlockCount: 2
id: '2996501069483632657'
inUseCount: 12
kind: compute#reservationBlock
name: example-reservation-block-0001
physicalTopology:
  block: 9a0e671424e45fd480ca172ad7a4e25d
  cluster: example-cluster
reservationMaintenance:
  schedulingType: GROUPED
reservationSubBlockCount: 2
reservationSubBlockInUseCount: 1
selfLink: https://www.googleapis.com/compute/v1/projects/example-project/zones/us-central1-c/reservations/example-reservation/reservationBlocks/example-reservation-block-0001
selfLinkWithId: https://www.googleapis.com/compute/v1/projects/example-project/zones/us-central1-c/reservations/example-reservation/reservationBlocks/2996501069483632657
status: READY
zone: https://www.googleapis.com/compute/v1/projects/example-project/zones/us-central1-c
---
count: 128
creationTimestamp: '2025-08-19T18:23:32.825-07:00'
healthInfo:
  degradedSubBlockCount: 0
  healthStatus: HEALTHY
  healthySubBlockCount: 4
id: '9a0e671424e45fd480ca172ad7a4e25d'
inUseCount: 64
kind: compute#reservationBlock
name: example-reservation-block-0002
physicalTopology:
  block: 3feffcdeb6434d68bb818a836f75c1b8
  cluster: example-cluster
reservationMaintenance:
  schedulingType: GROUPED
reservationSubBlockCount: 2
reservationSubBlockInUseCount: 1
selfLink: https://www.googleapis.com/compute/v1/projects/example-project/zones/us-central1-c/reservations/example-reservation/reservationBlocks/example-reservation-block-0001
selfLinkWithId: https://www.googleapis.com/compute/v1/projects/example-project/zones/us-central1-c/reservations/example-reservation/reservationBlocks/2996501069483632657
status: READY
zone: https://www.googleapis.com/compute/v1/projects/example-project/zones/us-central1-c
```
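The following values in the output describe the blocks in the reservation:
- count: the number of physical hosts in the block.
- healthInfo.healthStatus: the overall health status of the block.
- healthInfo.healthySubBlockCount: the number of healthy sub-blocks in the block.
- id: the ID of the block.
- inUseCount: the number of physical hosts in use.
- kind: the kind of object described.
- name: the name of the block.
- physicalTopology.block: the hashed ID of the block.
- physicalTopology.cluster: the cluster that the block is in.
- reservationSubBlockCount: the number of sub-blocks in the block.
- reservationSubBlockInUseCount: the number of sub-blocks in use.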
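Because `count` and `inUseCount` are reported per block, you can compute how many physical hosts remain unused in each block, which can help when you choose a block to target. This Python sketch assumes JSON output (`--format=json`) with the field names shown above; the helper names and the `min_free_hosts` threshold are hypothetical.

```python
import json
import subprocess


def list_blocks(reservation: str, project: str, zone: str) -> list:
    """Run `gcloud compute reservations blocks list` and parse its JSON output."""
    completed = subprocess.run(
        ["gcloud", "compute", "reservations", "blocks", "list", reservation,
         f"--project={project}", f"--zone={zone}", "--format=json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(completed.stdout)


def blocks_with_free_hosts(blocks: list, min_free_hosts: int) -> list:
    """Return (name, unused_hosts) for HEALTHY blocks, most unused hosts first."""
    candidates = []
    for block in blocks:
        if block.get("healthInfo", {}).get("healthStatus") != "HEALTHY":
            continue  # Skip degraded blocks and blocks with no reported status.
        free_hosts = int(block.get("count", 0)) - int(block.get("inUseCount", 0))
        if free_hosts >= min_free_hosts:
            candidates.append((block["name"], free_hosts))
    return sorted(candidates, key=lambda item: item[1], reverse=True)
```

With the sample output above, `blocks_with_free_hosts(blocks, 30)` would return only `example-reservation-block-0002`, which has 64 unused hosts (128 minus 64 in use).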
Describe a reservation block
You can run the gcloud compute reservations blocks describe command on a specific block to display information about that block.
```shell
gcloud compute reservations blocks describe RESERVATION_NAME \
  --block-name=BLOCK_NAME \
  --project=PROJECT_ID \
  --zone=ZONE
```
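Replace the following:
- RESERVATION_NAME: the name of the reservation.
- BLOCK_NAME: the name of the reservation block.
- PROJECT_ID: your project ID.
- ZONE: the zone where the reservation is located.
The command displays output similar to the following: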
```yaml
resource:
  count: 32
  creationTimestamp: '2025-11-05T15:00:15.223-08:00'
  healthInfo:
    degradedSubBlockCount: 0
    healthStatus: HEALTHY
    healthySubBlockCount: 2
  id: '2996501069483632657'
  inUseCount: 12
  kind: compute#reservationBlock
  name: example-reservation-block-0001
  physicalTopology:
    block: 9a0e671424e45fd480ca172ad7a4e25d
    cluster: example-cluster
  reservationMaintenance:
    schedulingType: GROUPED
  reservationSubBlockCount: 2
  reservationSubBlockInUseCount: 1
  selfLink: https://www.googleapis.com/compute/v1/projects/example-project/zones/us-central1-c/reservations/example-reservation/reservationBlocks/example-reservation-block-0001
  selfLinkWithId: https://www.googleapis.com/compute/v1/projects/example-project/zones/us-central1-c/reservations/example-reservation/reservationBlocks/2996501069483632657
  status: READY
  zone: https://www.googleapis.com/compute/v1/projects/example-project/zones/us-central1-c
```
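The following values in the output describe the block:
- count: the number of hosts in the block.
- healthInfo.healthStatus: the overall health status of the block.
- healthInfo.healthySubBlockCount: the number of healthy sub-blocks in the block.
- id: the ID of the block.
- inUseCount: the number of hosts in use.
- kind: the kind of object described.
- name: the name of the block.
- physicalTopology.block: the hashed ID of the block.
- physicalTopology.cluster: the cluster that the block is in.
- reservationSubBlockCount: the number of sub-blocks in the block.
- reservationSubBlockInUseCount: the number of sub-blocks in use.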
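Unlike the list output, the describe output nests the block under a top-level `resource` key. The following Python sketch accepts either shape and derives the number of unused hosts and unused sub-blocks from the fields above. It assumes JSON output (`--format=json`) with the same field names as the YAML sample; the helper names are hypothetical.

```python
import json
import subprocess


def describe_block(reservation: str, block: str, project: str, zone: str) -> dict:
    """Run `gcloud compute reservations blocks describe` and parse its JSON output."""
    completed = subprocess.run(
        ["gcloud", "compute", "reservations", "blocks", "describe", reservation,
         f"--block-name={block}", f"--project={project}", f"--zone={zone}",
         "--format=json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(completed.stdout)


def block_summary(described: dict) -> dict:
    """Summarize one block; works with or without the `resource` wrapper."""
    block = described.get("resource", described)
    health = block.get("healthInfo", {})
    topology = block.get("physicalTopology", {})
    return {
        "name": block.get("name"),
        "cluster": topology.get("cluster"),
        "hashed_id": topology.get("block"),
        "unused_hosts": int(block.get("count", 0)) - int(block.get("inUseCount", 0)),
        "unused_sub_blocks": (int(block.get("reservationSubBlockCount", 0))
                              - int(block.get("reservationSubBlockInUseCount", 0))),
        "degraded_sub_blocks": int(health.get("degradedSubBlockCount", 0)),
    }
```

For the sample block above, `block_summary` would report 20 unused hosts (32 minus 12) and 1 unused sub-block (2 minus 1).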
List all sub-blocks of a block
You can list the sub-blocks in a block to display information about each sub-block:
```shell
gcloud compute reservations sub-blocks list RESERVATION_NAME \
  --block-name=BLOCK_NAME \
  --project=PROJECT_ID \
  --zone=ZONE
```
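Replace the following:
- RESERVATION_NAME: the name of the reservation.
- BLOCK_NAME: the name of the reservation block.
- PROJECT_ID: your project ID.
- ZONE: the zone where the reservation is located.
The command displays information similar to the following: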
```yaml
count: 16
creationTimestamp: '2025-11-05T15:00:16.738-08:00'
healthInfo:
  degradedHostCount: 0
  degradedInfraCount: 0
  healthStatus: HEALTHY
  healthyHostCount: 16
  healthyInfraCount: 1
id: '8309376980435233263'
inUseCount: 0
kind: compute#reservationSubBlock
name: example-reservation-block-0001-subblock-0001
physicalTopology:
  block: 9a0e671424e45fd480ca172ad7a4e25d
  cluster: example-cluster
  subBlock: a0122935eb54d02750b65eef2d4f0366
reservationSubBlockMaintenance:
  schedulingType: GROUPED
selfLink: https://www.googleapis.com/compute/v1/projects/example-project/zones/us-central1-c/reservations/example-reservation/reservationBlocks/example-reservation-block-0001/reservationSubBlocks/example-reservation-block-0001-subblock-0001
selfLinkWithId: https://www.googleapis.com/compute/v1/projects/example-project/zones/us-central1-c/reservations/example-reservation/reservationBlocks/example-reservation-block-0001/reservationSubBlocks/8309376980435233263
status: READY
zone: https://www.googleapis.com/compute/v1/projects/example-project/zones/us-central1-c
---
count: 16
creationTimestamp: '2025-11-05T15:00:16.736-08:00'
healthInfo:
  degradedHostCount: 0
  degradedInfraCount: 0
  healthStatus: HEALTHY
  healthyHostCount: 16
  healthyInfraCount: 1
id: '5629213080155482607'
inUseCount: 12
kind: compute#reservationSubBlock
name: example-reservation-block-0001-subblock-0002
physicalTopology:
  block: 9a0e671424e45fd480ca172ad7a4e25d
  cluster: example-cluster
  subBlock: 7aca49831e54d32970631524bc060d9c
reservationSubBlockMaintenance:
  schedulingType: GROUPED
selfLink: https://www.googleapis.com/compute/v1/projects/example-project/zones/us-central1-c/reservations/example-reservation/reservationBlocks/example-reservation-block-0001/reservationSubBlocks/example-reservation-block-0001-subblock-0002
selfLinkWithId: https://www.googleapis.com/compute/v1/projects/example-project/zones/us-central1-c/reservations/example-reservation/reservationBlocks/example-reservation-block-0001/reservationSubBlocks/5629213080155482607
status: READY
zone: https://www.googleapis.com/compute/v1/projects/example-project/zones/us-central1-c
```
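The following values in the output describe the sub-blocks in the reservation:
- count: the number of hosts in the sub-block.
- healthInfo.degradedInfraCount: the health of the optical circuit switch (OCS) of the Ironwood cube. A value of 1 indicates that the OCS of the Ironwood cube is degraded. This value doesn't apply to Trillium.
- healthInfo.healthStatus: the overall health status of the sub-block.
- healthInfo.healthyHostCount: the number of healthy hosts in the sub-block.
- id: the ID of the sub-block.
- inUseCount: the number of hosts in use.
- kind: the kind of object described.
- name: the name of the sub-block.
- physicalTopology.block: the hashed ID of the block that contains this sub-block.
- physicalTopology.cluster: the cluster that the block is in.
- physicalTopology.subBlock: the hashed ID of the sub-block.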
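Each sub-block's `physicalTopology` carries the hashed IDs of its cluster and block, so you can assemble the cluster, block, and sub-block hierarchy described earlier in this section. This Python sketch groups sub-blocks by those IDs and records the unused hosts (`count` minus `inUseCount`) in each one. It assumes JSON output (`--format=json`) that uses the `subBlock` key shown in the YAML sample; the helper names are hypothetical.

```python
import json
import subprocess
from collections import defaultdict


def list_sub_blocks(reservation: str, block: str, project: str, zone: str) -> list:
    """Run `gcloud compute reservations sub-blocks list` and parse its JSON output."""
    completed = subprocess.run(
        ["gcloud", "compute", "reservations", "sub-blocks", "list", reservation,
         f"--block-name={block}", f"--project={project}", f"--zone={zone}",
         "--format=json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(completed.stdout)


def build_topology(sub_blocks: list) -> dict:
    """Nest sub-blocks as cluster -> block hashed ID -> sub-block hashed ID -> unused hosts."""
    tree = defaultdict(lambda: defaultdict(dict))
    for sub_block in sub_blocks:
        topology = sub_block["physicalTopology"]
        unused = int(sub_block.get("count", 0)) - int(sub_block.get("inUseCount", 0))
        tree[topology["cluster"]][topology["block"]][topology["subBlock"]] = unused
    # Convert the nested defaultdicts to plain dicts so the result prints and compares cleanly.
    return {cluster: {block_id: dict(subs) for block_id, subs in blocks.items()}
            for cluster, blocks in tree.items()}
```

For the two sample sub-blocks above, the resulting tree has one cluster (`example-cluster`) and one block, whose two sub-blocks have 16 and 4 unused hosts respectively.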
Describe a reservation sub-block
You can use the gcloud compute reservations sub-blocks describe command to view information about a sub-block:
```shell
gcloud compute reservations sub-blocks describe RESERVATION_NAME \
  --block-name=BLOCK_NAME \
  --sub-block-name=SUB_BLOCK_NAME \
  --project=PROJECT_ID \
  --zone=ZONE
```
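Replace the following:
- RESERVATION_NAME: the name of the reservation.
- BLOCK_NAME: the name of the reservation block.
- SUB_BLOCK_NAME: the name of the reservation sub-block.
- PROJECT_ID: your project ID.
- ZONE: the zone where the reservation is located.
The command displays information similar to the following: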
```yaml
resource:
  count: 16
  creationTimestamp: '2025-11-05T15:00:16.736-08:00'
  healthInfo:
    degradedHostCount: 0
    degradedInfraCount: 0
    healthStatus: HEALTHY
    healthyHostCount: 16
    healthyInfraCount: 1
  id: '5629213080155482607'
  inUseCount: 12
  kind: compute#reservationSubBlock
  name: example-reservation-block-0001-subblock-0002
  physicalTopology:
    block: 9a0e671424e45fd480ca172ad7a4e25d
    cluster: example-cluster
    subBlock: 7aca49831e54d32970631524bc060d9c
  reservationSubBlockMaintenance:
    schedulingType: GROUPED
  selfLink: https://www.googleapis.com/compute/v1/projects/example-project/zones/us-central1-c/reservations/example-reservation/reservationBlocks/example-reservation-block-0001/reservationSubBlocks/example-reservation-block-0001-subblock-0002
  selfLinkWithId: https://www.googleapis.com/compute/v1/projects/example-project/zones/us-central1-c/reservations/example-reservation/reservationBlocks/example-reservation-block-0001/reservationSubBlocks/5629213080155482607
  status: READY
  zone: https://www.googleapis.com/compute/v1/projects/example-project/zones/us-central1-c
```
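The following values in the output describe the sub-block:
- count: the number of hosts in the sub-block.
- healthInfo.degradedInfraCount: the health of the optical circuit switch (OCS) of the Ironwood cube. A value of 1 indicates that the OCS of the Ironwood cube is degraded. This value doesn't apply to Trillium.
- healthInfo.healthStatus: the overall health status of the sub-block.
- healthInfo.healthyHostCount: the number of healthy hosts in the sub-block.
- id: the ID of the sub-block.
- inUseCount: the number of hosts in use.
- kind: the kind of object described.
- name: the name of the sub-block.
- physicalTopology.block: the hashed ID of the block that contains this sub-block.
- physicalTopology.cluster: the cluster that the block is in.
- physicalTopology.subBlock: the hashed ID of the sub-block.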
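Before you target a specific sub-block, you might want to check its health fields programmatically. The following Python sketch flags a sub-block whose overall status isn't `HEALTHY`, that reports degraded hosts, or whose Ironwood cube OCS is degraded. It assumes JSON output (`--format=json`) with the field names shown above, and treats a missing `degradedInfraCount` (for example on Trillium, where the value doesn't apply) as 0; the helper names are hypothetical.

```python
import json
import subprocess


def describe_sub_block(reservation: str, block: str, sub_block: str,
                       project: str, zone: str) -> dict:
    """Run `gcloud compute reservations sub-blocks describe` and parse its JSON output."""
    completed = subprocess.run(
        ["gcloud", "compute", "reservations", "sub-blocks", "describe", reservation,
         f"--block-name={block}", f"--sub-block-name={sub_block}",
         f"--project={project}", f"--zone={zone}", "--format=json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(completed.stdout)


def sub_block_issues(described: dict) -> list:
    """List health problems for one sub-block; an empty list means none were found."""
    sub_block = described.get("resource", described)
    health = sub_block.get("healthInfo", {})
    issues = []
    status = health.get("healthStatus")
    if status != "HEALTHY":
        issues.append(f"health status is {status}")
    degraded_hosts = int(health.get("degradedHostCount", 0))
    if degraded_hosts > 0:
        issues.append(f"{degraded_hosts} degraded host(s)")
    # Assumption: absent or 0 where the OCS value doesn't apply (Trillium).
    if int(health.get("degradedInfraCount", 0)) > 0:
        issues.append("Ironwood cube OCS is degraded")
    return issues
```

For the sample sub-block above, `sub_block_issues` would return an empty list.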
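Find the physical location of TPU VM instances
After you provision TPU nodes, you can retrieve the physical location of your TPU VM instances. Knowing how close TPU VM instances are to one another helps you optimize workload scheduling.
You can find the physical location of a TPU VM instance by using curl or the Google Cloud CLI: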
curl

Run the following command from the guest OS of the TPU VM:

```shell
curl -s -H "Metadata-Flavor: Google" \
  http://metadata.google.internal/computeMetadata/v1/instance/attributes/physical_host_topology
```
gcloud

```shell
gcloud compute instances describe VM_NAME \
  --format="table[box,title=VM-Position](resourceStatus.physical_host_topology:label=location)" \
  --zone=ZONE
```
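Replace the following:
- VM_NAME: the name of the TPU VM.
- ZONE: the zone where the TPU VM is located.
Both commands display the cluster, block, sub-block, and host of the TPU VM that you specify: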
```yaml
block: 3feffcdeb6434d68bb818a836f75c1b8
cluster: southamerica-west1-cluster-njga
subblock: cbee689cb721abdb0c7f80a4f2d0c1c7
host: 36b2d9731c1e1cf8594a759c8c4178f0
```
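To compare the placement of two TPU VMs, you can fetch each VM's location and find the finest topology level they share: two VMs in the same sub-block are closer than two VMs that only share a block. The following Python sketch reads the same metadata attribute as the curl command above. Because this page doesn't specify the exact format of the attribute value, `parse_topology` tries JSON first and falls back to `key: value` lines like the sample output; all helper names are hypothetical.

```python
import json
import urllib.request

# Same endpoint and header as the curl command; reachable only from inside the VM.
METADATA_URL = ("http://metadata.google.internal/computeMetadata/v1/"
                "instance/attributes/physical_host_topology")

# Topology levels from coarsest to finest.
LEVELS = ("cluster", "block", "subblock", "host")


def fetch_topology(timeout: float = 5.0) -> str:
    """Read the raw physical_host_topology attribute from the metadata server."""
    request = urllib.request.Request(METADATA_URL, headers={"Metadata-Flavor": "Google"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


def parse_topology(text: str) -> dict:
    """Parse the attribute as a JSON object if possible, otherwise as `key: value` lines."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        data = None
    if isinstance(data, dict):
        return {key: str(value) for key, value in data.items()}
    result = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            result[key.strip()] = value.strip()
    return result


def shared_level(a: dict, b: dict) -> str:
    """Return the finest topology level that two VMs share, or "none"."""
    shared = "none"
    for level in LEVELS:
        if a.get(level) is None or a.get(level) != b.get(level):
            break
        shared = level
    return shared
```

You could run `fetch_topology` on each VM (it works only from inside a VM), collect the parsed results in one place, and compare pairs with `shared_level`. For example, two VMs whose `cluster` and `block` values match but whose `subblock` values differ share the `block` level.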