This document explains how to use the host maintenance features that are available in AI Hypercomputer. It explains how to monitor, plan for, and perform scheduled maintenance on your reserved blocks of capacity. To manage maintenance on your virtual machine (VM) instances, see instead Manage host events across VMs.
You can proactively manage upcoming host maintenance events on your reserved blocks of capacity, whether or not VMs are running on them. This approach helps you minimize disruptions and maintain optimal performance.
Before you begin
Select the tab for how you plan to use the samples on this page:
Console
When you use the Google Cloud console to access Google Cloud services and APIs, you don't need to set up authentication.
gcloud
In the Google Cloud console, activate Cloud Shell.
At the bottom of the Google Cloud console, a Cloud Shell session starts and displays a command-line prompt. Cloud Shell is a shell environment with the Google Cloud CLI already installed and with values already set for your current project. It can take a few seconds for the session to initialize.
REST
To use the REST API samples on this page in a local development environment, you use the credentials you provide to the gcloud CLI.
Install the Google Cloud CLI. After installation, initialize the Google Cloud CLI by running the following command:
gcloud init
If you're using an external identity provider (IdP), you must first sign in to the gcloud CLI with your federated identity.
For more information, see Authenticate for using REST in the Google Cloud authentication documentation.
Required roles
To get the permissions that you need to manage host maintenance events across reservations, ask your administrator to grant you the following IAM roles:
- Compute Admin (roles/compute.admin) on the project
- For read-only access to System Event audit logs: Logs Viewer (roles/logging.viewer) on the project
For more information about granting roles, see Manage access to projects, folders, and organizations.
These predefined roles contain the permissions required to manage host maintenance events across reservations. To see the exact permissions that are required, expand the Required permissions section:
Required permissions
The following permissions are required to manage host maintenance events across reservations:
- To start host maintenance for a reservation: compute.reservations.performMaintenance on the project
- To start host maintenance for a reservation block: compute.reservationBlocks.performMaintenance on the project
- To start host maintenance for a reservation sub-block: compute.reservationSubBlocks.performMaintenance on the project
- To view a list of reservations: compute.reservations.list on the project
- To view the details of a reservation: compute.reservations.get on the project
- To view a list of blocks within a reservation: compute.reservationBlocks.list on the project
- To view a list of sub-blocks within a reservation block: compute.reservationSubBlocks.list on the project
You might also be able to get these permissions with custom roles or other predefined roles.
Overview
To optimize the maintenance of your reserved blocks of capacity, complete the following steps:
Set up notification alerts. Create log-based alerts to get notified about scheduled, started, or completed maintenance events for a reservation, a reservation block, or a reservation sub-block. This approach helps you proactively plan your activities and avoid unexpected downtime.
Manage maintenance across blocks of capacity. View and, if needed, manually start maintenance across your reservations, reservation blocks, or reservation sub-blocks. This process helps you increase the resilience of your workloads to host errors, prevent downtime, and ensure that your applications remain available.
For more information about the frequency and maintenance behavior of your reserved machine types, see Understand host maintenance.
Set up notification alerts for reservations
You can get notified about scheduled, started, or completed maintenance events for a reservation, reservation block, or reservation sub-block by creating log-based alerting policies.
To create an alert for the maintenance events of a reservation, a reservation block, or a reservation sub-block, complete the following procedure. Repeat this procedure for each alert that you want to create.
-
In the Google Cloud console, go to the Logs Explorer page:
If you use the search bar to find this page, then select the result whose subheading is Logging.
Click the Show query toggle to the on position.
In the Query pane, build one of the following queries. These queries filter log entries to identify specific maintenance events. Repeat this procedure for each query you want to create.
Receive maintenance alerts for a reservation:
To receive alerts when maintenance is scheduled:
protoPayload.methodName="compute.reservations.upcomingGroupMaintenance" severity>=DEFAULT protoPayload.status.message =~ "scheduled"
To receive alerts when maintenance has completed:
protoPayload.methodName="compute.reservations.completedGroupMaintenance" severity>=DEFAULT protoPayload.status.message =~ "completed"
Receive maintenance alerts for a reservation block:
To receive alerts when maintenance is scheduled:
protoPayload.methodName="compute.reservations.block.upcomingGroupMaintenance" severity>=DEFAULT protoPayload.status.message =~ "scheduled"
To receive alerts when maintenance has started:
protoPayload.methodName="compute.reservations.block.startGroupMaintenance" severity>=DEFAULT protoPayload.status.message =~ "started"
To receive alerts when maintenance has completed:
protoPayload.methodName="compute.reservations.block.completedGroupMaintenance" severity>=DEFAULT protoPayload.status.message =~ "completed"
Receive maintenance alerts for a reservation sub-block of A4X VMs:
To receive alerts when maintenance is scheduled:
protoPayload.methodName="compute.reservations.subBlock.upcomingGroupMaintenance" severity>=DEFAULT protoPayload.status.message =~ "scheduled"
To receive alerts when maintenance has started:
protoPayload.methodName="compute.reservations.subBlock.startGroupMaintenance" severity>=DEFAULT protoPayload.status.message =~ "started"
To receive alerts when maintenance has completed:
protoPayload.methodName="compute.reservations.subBlock.completedGroupMaintenance" severity>=DEFAULT protoPayload.status.message =~ "completed"
To receive alerts when an A4X machine encounters an error and Compute Engine starts maintenance:
protoPayload.methodName="compute.reservations.subblock.unusedmachinerepair" severity>=DEFAULT protoPayload.status.message =~ "maintenance"
To receive alerts when maintenance for an A4X machine that encountered an error has completed:
protoPayload.methodName="compute.reservations.subblock.unusedmachinerepaircomplete" severity>=DEFAULT protoPayload.status.message =~ "repaired"
To validate the query, click Run query. If the query is valid, then the Query results pane displays log entries that match the query.
In the Query results toolbar, click the Actions list, and then select Create log alert. The Create logs-based alert policy pane appears.
In the Alert details section, do the following:
In the Alert Policy Name field, enter a name for the policy.
In the Policy severity level list, select Warning (or a higher severity).
Click Next.
In the Choose logs to include in the alert section, click Next.
In the Set notification frequency and autoclose duration section, specify the following:
In the Time between notifications list, select how often you want to be notified.
In the Incident autoclose duration list, select how long Cloud Logging waits before it stops sending notifications and automatically closes the incident.
Click Next.
In the Who should be notified? section, specify a notification channel for Logging to send notifications to.
Click Save.
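Because the scheduled, started, and completed queries in this procedure follow a regular pattern, you can generate them rather than type each one. The following Python sketch composes those query strings. The function name build_maintenance_query and the level and event labels are hypothetical names chosen for this example; the method names and message keywords are copied from the queries above. The two A4X repair queries don't follow the pattern, so the sketch leaves them out.

```python
# Method-name prefix for each resource level, copied from the queries above.
_LEVEL_PREFIX = {
    "reservation": "compute.reservations",
    "block": "compute.reservations.block",
    "sub-block": "compute.reservations.subBlock",
}

# Event label -> (method-name suffix, status-message keyword).
_EVENTS = {
    "scheduled": ("upcomingGroupMaintenance", "scheduled"),
    "started": ("startGroupMaintenance", "started"),
    "completed": ("completedGroupMaintenance", "completed"),
}


def build_maintenance_query(level: str, event: str) -> str:
    """Return a Logs Explorer query for one maintenance event (hypothetical helper)."""
    if level not in _LEVEL_PREFIX:
        raise ValueError(f"unknown level: {level!r}")
    if event not in _EVENTS:
        raise ValueError(f"unknown event: {event!r}")
    # This page lists no "started" query at the reservation level.
    if level == "reservation" and event == "started":
        raise ValueError("no 'started' query is documented for reservations")
    suffix, keyword = _EVENTS[event]
    return " ".join([
        f'protoPayload.methodName="{_LEVEL_PREFIX[level]}.{suffix}"',
        "severity>=DEFAULT",
        f'protoPayload.status.message =~ "{keyword}"',
    ])


print(build_maintenance_query("block", "started"))
```

You can paste the printed string into the Query pane and continue with the remaining steps of the procedure.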
Manage maintenance across reservations
You can view and control maintenance for your reservations, reservation blocks, and reservation sub-blocks as follows:
To check the state and scheduled time of upcoming maintenance for your reservations, reservation blocks, or reservation sub-blocks, view maintenance state.
To manually start maintenance on a reservation, reservation block, or reservation sub-block, rather than waiting for the scheduled maintenance date and time, manually start maintenance.
To manage how early you want to receive notifications when a VM's host requires emergency, unplanned maintenance after a host error or faulty host report, manage hardware emergency maintenance notifications.
View maintenance state
You can view the upcoming maintenance state for a reservation, a reservation
block, or a reservation sub-block by checking the value of the
upcomingGroupMaintenance field in their metadata. If a resource lacks the
upcomingGroupMaintenance field, then no maintenance is scheduled for that
reservation, reservation block, or reservation sub-block. For more information
about the fields in upcomingGroupMaintenance, see
Maintenance status definitions
in the Compute Engine documentation.
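The presence check described above can be sketched in Python as follows. This is a minimal illustration, assuming that you have already parsed the maintenance metadata of a resource into a dictionary; the function name upcoming_maintenance and the two sample dictionaries are hypothetical and only mirror the shape of the sample outputs on this page.

```python
from typing import Any, Mapping, Optional


def upcoming_maintenance(maintenance: Mapping[str, Any]) -> Optional[Mapping[str, Any]]:
    """Return the upcomingGroupMaintenance mapping, or None if the field is absent."""
    return maintenance.get("upcomingGroupMaintenance")


# Illustrative inputs shaped like the sample outputs on this page.
scheduled = {
    "schedulingType": "GROUPED",
    "upcomingGroupMaintenance": {"maintenanceStatus": "PENDING", "canReschedule": True},
}
idle = {"schedulingType": "GROUPED"}

print(upcoming_maintenance(scheduled) is not None)  # True: maintenance is scheduled
print(upcoming_maintenance(idle) is not None)       # False: nothing is scheduled
```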
Additionally, if maintenance is scheduled for a reservation block or sub-block,
the upcomingGroupMaintenance field contains the maintenanceReasons field.
This field describes why maintenance was scheduled for your reservation block or
sub-block, as described in the following table:
| Maintenance type | Maintenance reason | VMs state |
|---|---|---|
| Planned maintenance after faulty host report | FAILURE_GPU_CUSTOMER_REPORTED | Applies only to VMs that are running on the host that you reported as faulty. |
| Planned maintenance for regular maintenance | | Applies to running, stopped, or suspended VMs. |
| Unplanned, emergency maintenance | | Applies only to running VMs. |
To view the maintenanceReasons field in a reservation block or sub-block, or
view the maintenance state of a sub-block, you must use the
gcloud CLI or REST API. Otherwise, select one of the
following options:
Console
In the Google Cloud console, go to the Reservations page.
In the Maintenance status column, Compute Engine displays the maintenance state of your reservations. If you don't see this column in the On-demand reservations table, then click Column display options, select the Maintenance status checkbox, and then click OK.
To view the maintenance state of a reservation block, complete the following steps:
In the Name column, click the name of the reservation. A page that gives the details of the reservation appears.
In the Blocks table, in the Maintenance column, Compute Engine displays the maintenance state of the blocks within the reservation.
gcloud
To view the maintenance state of a reservation, use the gcloud compute reservations describe command with the --flatten flag set to resourceStatus.reservationMaintenance:
gcloud compute reservations describe RESERVATION_NAME \
    --flatten=resourceStatus.reservationMaintenance \
    --zone=ZONE
Replace the following:
- RESERVATION_NAME: the name of the reservation.
- ZONE: the zone where the reservation exists.
The output is similar to one of the following:
If maintenance is scheduled for your reservation, then the output is similar to the following:
---
maintenanceOngoingCount: 0
maintenancePendingCount: 6
schedulingType: GROUPED
upcomingGroupMaintenance:
  canReschedule: true
  maintenanceStatus: PENDING
  type: UNSCHEDULED
  windowEndTime: '2025-11-13T14:00:00.000-08:00'
  windowStartTime: '2025-11-13T12:00:00.000-08:00'
If the schedulingType field is set to INDEPENDENT, then the upcomingGroupMaintenance field doesn't contain the windowStartTime and windowEndTime fields. To see when maintenance is scheduled for a VM that runs on a reserved host, view the maintenance state for the VM.
If maintenance isn't scheduled for your reservation, then the output is similar to the following:
---
schedulingType: GROUPED
If maintenance is scheduled for your reservation, then, to view the maintenance state of the blocks within the reservation, use the gcloud compute reservations blocks list command:
gcloud compute reservations blocks list RESERVATION_NAME \
    --zone=ZONE
If maintenance is scheduled or ongoing for a reservation block, then the output is similar to the following:
---
...
name: example-fr-a3u-dense-1-block-0001
...
reservationBlockMaintenance:
  maintenanceOngoingCount: 0
  maintenancePendingCount: 6
  schedulingType: GROUPED
  upcomingGroupMaintenance:
    canReschedule: true
    maintenanceReasons:
    - PLANNED_UPDATE
    - PLANNED_NETWORK_UPDATE
    maintenanceStatus: PENDING
    type: UNSCHEDULED
    windowEndTime: '2025-11-13T14:00:00.000-08:00'
    windowStartTime: '2025-11-13T12:00:00.000-08:00'
...
---
...
name: example-fr-a3u-dense-1-block-0002
...
schedulingType: GROUPED
...
If maintenance is scheduled for a reservation block, then, to view the maintenance state of sub-blocks within the reservation block, use the gcloud compute reservations sub-blocks list command:
gcloud compute reservations sub-blocks list RESERVATION_NAME \
    --block-name=BLOCK_NAME \
    --zone=ZONE
Replace BLOCK_NAME with the name of a block that exists within the reservation.
If maintenance is scheduled or ongoing for a reservation sub-block, then the output is similar to the following:
...
reservationSubBlockMaintenance:
  instanceMaintenanceOngoingCount: 0
  instanceMaintenancePendingCount: 3
  maintenanceOngoingCount: 0
  maintenancePendingCount: 32
  schedulingType: GROUPED
  subblockInfraMaintenanceOngoingCount: 0
  subblockInfraMaintenancePendingCount: 0
  upcomingGroupMaintenance:
    canReschedule: true
    maintenanceReasons:
    - PLANNED_UPDATE
    - PLANNED_NETWORK_UPDATE
    maintenanceStatus: PENDING
    type: SCHEDULED
    windowEndTime: '2025-11-13T14:00:00.000-08:00'
    windowStartTime: '2025-11-13T12:00:00.000-08:00'
...
REST
To view the maintenance state of your reservations, make a GET request to one of the following methods:
- To view reservations across all zones, use the reservations.aggregatedList method.
- To view reservations in a specific zone, use the reservations.list method.
In the request URL, include the following query parameters:
- To show only the name, reserved machine type, and maintenance status of a reservation, include the fields query parameter set to items.name,items.specificReservation.instanceProperties.machineType,items.resourceStatus.reservationMaintenance.
- To list only the reservations that specify a particular machine type, include the filter query parameter set to specificReservation.instanceProperties.machineType:MACHINE_TYPE by using URL-encoded values.
For example, to view reservations across all zones, make a GET request as follows:
GET https://compute.googleapis.com/compute/v1/projects/PROJECT_ID/aggregated/reservations?fields=items.name,items.specificReservation.instanceProperties.machineType,items.resourceStatus.reservationMaintenance&filter=specificReservation.instanceProperties.machineType%3AMACHINE_TYPE
Replace the following:
- PROJECT_ID: the ID of the project where the reservations exist.
- MACHINE_TYPE: the reserved machine type that you want to filter your reservations by.
If maintenance is scheduled or ongoing for a reservation, then the output is similar to the following:
{
  "items": [
    {
      "specificReservation": {
        "instanceProperties": {
          "machineType": "MACHINE_TYPE"
        }
      },
      "name": "example-reservation",
      "resourceStatus": {
        "reservationMaintenance": {
          "maintenanceOngoingCount": 0,
          "maintenancePendingCount": 6,
          "schedulingType": "GROUPED",
          "upcomingGroupMaintenance": {
            "type": "SCHEDULED",
            "canReschedule": true,
            "windowStartTime": "2025-11-13T12:00:00.000-08:00",
            "windowEndTime": "2025-11-13T14:00:00.000-08:00",
            "maintenanceStatus": "PENDING"
          }
        }
      }
    },
    ...
  ]
}
Optionally, to further narrow down the list of reservations, set the filter query parameter to a different filter expression.
If maintenance is scheduled for your reservation, then, to view the maintenance state of the blocks within the reservation, make a GET request to the reservationBlocks.list method. In the request URL, include the fields query parameter set to items.name,items.reservationMaintenance:
GET https://compute.googleapis.com/compute/v1/projects/PROJECT_ID/zones/ZONE/reservations/RESERVATION_NAME/reservationBlocks?fields=items.name,items.reservationMaintenance
Replace RESERVATION_NAME with the name of the reservation.
If maintenance is scheduled or ongoing for a reservation block, then the output is similar to the following:
{
  "items": [
    {
      "name": "example-fr-a3u-dense-1-block-0001",
      "reservationBlockMaintenance": {
        "maintenanceOngoingCount": 0,
        "maintenancePendingCount": 6,
        "schedulingType": "GROUPED",
        "upcomingGroupMaintenance": {
          "type": "SCHEDULED",
          "canReschedule": true,
          "windowStartTime": "2025-11-13T12:00:00.000-08:00",
          "windowEndTime": "2025-11-13T14:00:00.000-08:00",
          "maintenanceStatus": "PENDING",
          "maintenanceReasons": [
            "PLANNED_UPDATE",
            "PLANNED_NETWORK_UPDATE"
          ]
        }
      }
    },
    ...
  ]
}
If the schedulingType field is set to INDEPENDENT for a block, then the upcomingGroupMaintenance field doesn't contain the windowStartTime and windowEndTime fields. To see when maintenance is scheduled for a VM that runs on a reserved block, view the maintenance state for the VM.
If maintenance is scheduled for a reservation block, then, to view the maintenance state of the sub-blocks within the reservation block, make a GET request to the reservationSubBlocks.list method:
GET https://compute.googleapis.com/compute/v1/projects/PROJECT_ID/zones/ZONE/reservations/RESERVATION_NAME/reservationBlocks/BLOCK_NAME/reservationSubBlocks
Replace BLOCK_NAME with the name of a block that exists within the reservation.
If maintenance is scheduled or ongoing for a reservation sub-block, then the output is similar to the following:
{
  "items": [
    {
      "name": "example-fr-a3u-dense-1-block-0001",
      "reservationSubBlockMaintenance": {
        "instanceMaintenanceOngoingCount": 0,
        "instanceMaintenancePendingCount": 3,
        "maintenanceOngoingCount": 0,
        "maintenancePendingCount": 6,
        "schedulingType": "GROUPED",
        "subblockInfraMaintenanceOngoingCount": 0,
        "subblockInfraMaintenancePendingCount": 0,
        "upcomingGroupMaintenance": {
          "type": "SCHEDULED",
          "canReschedule": true,
          "windowStartTime": "2025-11-13T12:00:00.000-08:00",
          "windowEndTime": "2025-11-13T14:00:00.000-08:00",
          "maintenanceStatus": "PENDING",
          "maintenanceReasons": [
            "PLANNED_UPDATE",
            "PLANNED_NETWORK_UPDATE"
          ]
        }
      }
    },
    ...
  ]
}
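If you want to process a list response in a script instead of reading it by eye, the following Python sketch shows one possible approach. It assumes that the response has the shape of the sample outputs above. The function name pending_windows and the trimmed SAMPLE payload are hypothetical and exist only for this illustration. Pass the maintenance key that matches the response that you're parsing, such as reservationBlockMaintenance or reservationSubBlockMaintenance.

```python
import json
from datetime import datetime
from typing import Any, Dict, List


def pending_windows(response_text: str, maintenance_key: str) -> List[Dict[str, Any]]:
    """Summarize each listed item that has an upcomingGroupMaintenance field."""
    rows = []
    for item in json.loads(response_text).get("items", []):
        upcoming = item.get(maintenance_key, {}).get("upcomingGroupMaintenance")
        if upcoming is None:
            continue  # No maintenance is scheduled for this item.
        start = upcoming.get("windowStartTime")
        end = upcoming.get("windowEndTime")
        rows.append({
            "name": item.get("name"),
            "status": upcoming.get("maintenanceStatus"),
            "reasons": upcoming.get("maintenanceReasons", []),
            # The window fields are absent when schedulingType is INDEPENDENT.
            "start": datetime.fromisoformat(start) if start else None,
            "end": datetime.fromisoformat(end) if end else None,
        })
    return rows


# Trimmed, illustrative payload modeled on the block list output above.
SAMPLE = """
{"items": [
  {"name": "example-fr-a3u-dense-1-block-0001",
   "reservationBlockMaintenance": {
     "schedulingType": "GROUPED",
     "upcomingGroupMaintenance": {
       "windowStartTime": "2025-11-13T12:00:00.000-08:00",
       "windowEndTime": "2025-11-13T14:00:00.000-08:00",
       "maintenanceStatus": "PENDING",
       "maintenanceReasons": ["PLANNED_UPDATE", "PLANNED_NETWORK_UPDATE"]}}},
  {"name": "example-fr-a3u-dense-1-block-0002",
   "reservationBlockMaintenance": {"schedulingType": "GROUPED"}}
]}
"""

for row in pending_windows(SAMPLE, "reservationBlockMaintenance"):
    # Prints: example-fr-a3u-dense-1-block-0001 2:00:00
    print(row["name"], row["end"] - row["start"])
```

The sketch skips items that lack the upcomingGroupMaintenance field, which matches how this page describes resources with no scheduled maintenance.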
Manually start maintenance
You can manually start maintenance for your reservations, reservation blocks, or reservation sub-blocks instead of waiting for the scheduled time. This action helps you more proactively control disruptions to your workloads.
Depending on the maintenance state of a reservation, reservation block, or reservation sub-block, the following occurs:
| Maintenance state | Description | What you see |
|---|---|---|
| Scheduled | Compute Engine has scheduled maintenance for the reservation. You can manually start maintenance before the scheduled time. | |
| In progress | Maintenance is underway. You can't reschedule it. | |
| Complete | Maintenance is finished. Compute Engine has removed all maintenance notifications from the VM. | |
To manually start maintenance on specific hosts within a reservation block, or manually start maintenance on a reservation sub-block, use the gcloud CLI or REST API. Otherwise, select one of the following options:
Console
In the Google Cloud console, go to the Reservations page.
In the Name column, click the name of a reservation. A page that gives the details of the reservation appears.
Click Run maintenance, and then select one of the following options:
To start maintenance on all blocks, select All capacity.
To start maintenance only on blocks with running VMs, select In-use capacity.
To start maintenance only on unused blocks and blocks with stopped or suspended VMs, select Unused capacity.
To confirm, click Ok.
gcloud
To start maintenance on a reservation, use the gcloud compute reservations perform-maintenance command:
gcloud compute reservations perform-maintenance RESERVATION_NAME \
    --scope=RESERVATION_MAINTENANCE_SCOPE \
    --zone=ZONE
To start maintenance on a reservation block, use the gcloud compute reservations blocks perform-maintenance command:
gcloud compute reservations blocks perform-maintenance RESERVATION_NAME \
    --block-name=BLOCK_NAME \
    --scope=BLOCK_MAINTENANCE_SCOPE \
    --zone=ZONE
To start maintenance on a reservation sub-block, use the gcloud compute reservations sub-blocks perform-maintenance command:
gcloud compute reservations sub-blocks perform-maintenance RESERVATION_NAME \
    --block-name=BLOCK_NAME \
    --subblock-name=SUB_BLOCK_NAME \
    --zone=ZONE
Replace the following:
- RESERVATION_NAME: the name of the reservation.
- RESERVATION_MAINTENANCE_SCOPE: the maintenance scope for the reservation. Specify one of the following values:
  - To start maintenance on all blocks: all
  - To start maintenance only on blocks with running VMs: running
  - To start maintenance only on unused blocks and blocks with stopped or suspended VMs: unused
- BLOCK_NAME: the name of a block that exists within the reservation.
- SUB_BLOCK_NAME: the name of a sub-block that exists within the reservation block.
- BLOCK_MAINTENANCE_SCOPE: the maintenance scope for the reservation block. Specify one of the following values:
  - To start maintenance on all hosts: all
  - To start maintenance only on hosts with running VMs: running
  - To start maintenance only on unused hosts and hosts with stopped or suspended VMs: unused
- ZONE: the zone where the reservation exists.
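If you start maintenance from automation, you might assemble these commands programmatically. The following Python sketch builds the argument list for each of the three commands shown above and rejects scope values that this page doesn't list. The function name perform_maintenance_argv is hypothetical. The sketch only builds the list and doesn't run gcloud.

```python
from typing import List, Optional

# Scope values for the --scope flag, as listed above.
_SCOPES = ("all", "running", "unused")


def perform_maintenance_argv(
    reservation: str,
    zone: str,
    scope: Optional[str] = None,
    block: Optional[str] = None,
    sub_block: Optional[str] = None,
) -> List[str]:
    """Build the gcloud argument list for one of the three commands above."""
    if sub_block is not None:
        if block is None:
            raise ValueError("a sub-block requires the name of its parent block")
        if scope is not None:
            # The sub-block command on this page is shown without --scope.
            raise ValueError("the sub-block command shown above takes no --scope")
        return [
            "gcloud", "compute", "reservations", "sub-blocks",
            "perform-maintenance", reservation,
            f"--block-name={block}",
            f"--subblock-name={sub_block}",
            f"--zone={zone}",
        ]
    if scope not in _SCOPES:
        raise ValueError(f"scope must be one of {_SCOPES}, got {scope!r}")
    argv = ["gcloud", "compute", "reservations"]
    if block is not None:
        argv += ["blocks", "perform-maintenance", reservation, f"--block-name={block}"]
    else:
        argv += ["perform-maintenance", reservation]
    return argv + [f"--scope={scope}", f"--zone={zone}"]


print(" ".join(perform_maintenance_argv("RESERVATION_NAME", "ZONE", scope="unused")))
```

To run the command, you could pass the returned list to subprocess.run with check=True, which raises an error if gcloud exits with a nonzero status.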
REST
To start maintenance on a reservation, make a POST request to the reservations.performMaintenance method:
POST https://compute.googleapis.com/compute/v1/projects/PROJECT_ID/zones/ZONE/reservations/RESERVATION_NAME/performMaintenance
{
  "maintenanceScope": "RESERVATION_MAINTENANCE_SCOPE"
}
To start maintenance on a reservation block, make a POST request to the reservationBlocks.performMaintenance method:
POST https://compute.googleapis.com/compute/v1/projects/PROJECT_ID/zones/ZONE/reservations/RESERVATION_NAME/reservationBlocks/BLOCK_NAME/performMaintenance
{
  "maintenanceScope": "BLOCK_MAINTENANCE_SCOPE"
}
To start maintenance on a reservation sub-block, make a POST request to the reservationSubBlocks.performMaintenance method:
POST https://compute.googleapis.com/compute/v1/projects/PROJECT_ID/zones/ZONE/reservations/RESERVATION_NAME/reservationBlocks/BLOCK_NAME/reservationSubBlocks/SUB_BLOCK_NAME/performMaintenance
{
  "maintenanceScope": "BLOCK_MAINTENANCE_SCOPE"
}
Replace the following:
- PROJECT_ID: the ID of the project where Compute Engine automatically created the reservation.
- ZONE: the zone where the reservation exists.
- RESERVATION_NAME: the name of the reservation.
- RESERVATION_MAINTENANCE_SCOPE: the maintenance scope for the reservation. Specify one of the following values:
  - To start maintenance on all blocks: ALL
  - To start maintenance only on blocks with running VMs: RUNNING
  - To start maintenance only on unused blocks and blocks with stopped or suspended VMs: UNUSED
- BLOCK_NAME: the name of a block that exists within the reservation.
- SUB_BLOCK_NAME: the name of a sub-block that exists within the reservation block.
- BLOCK_MAINTENANCE_SCOPE: the maintenance scope for the reservation block. Specify one of the following values:
  - To start maintenance on all hosts: ALL
  - To start maintenance only on hosts with running VMs: RUNNING
  - To start maintenance only on unused hosts and hosts with stopped or suspended VMs: UNUSED
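The three POST requests above differ only in how deep the resource path goes. The following Python sketch builds the URL and JSON body for each case. The function name perform_maintenance_request is hypothetical, and the sketch deliberately stops short of sending the request, because authentication depends on your environment.

```python
import json
from typing import Optional, Tuple

BASE_URL = "https://compute.googleapis.com/compute/v1"

# maintenanceScope values, as listed above.
_SCOPES = ("ALL", "RUNNING", "UNUSED")


def perform_maintenance_request(
    project: str,
    zone: str,
    reservation: str,
    scope: str,
    block: Optional[str] = None,
    sub_block: Optional[str] = None,
) -> Tuple[str, str]:
    """Return (url, json_body) for a performMaintenance POST request."""
    if scope not in _SCOPES:
        raise ValueError(f"scope must be one of {_SCOPES}, got {scope!r}")
    if sub_block is not None and block is None:
        raise ValueError("a sub-block requires the name of its parent block")
    url = f"{BASE_URL}/projects/{project}/zones/{zone}/reservations/{reservation}"
    if block is not None:
        url += f"/reservationBlocks/{block}"
    if sub_block is not None:
        url += f"/reservationSubBlocks/{sub_block}"
    url += "/performMaintenance"
    return url, json.dumps({"maintenanceScope": scope})


url, body = perform_maintenance_request(
    "PROJECT_ID", "ZONE", "RESERVATION_NAME", "RUNNING", block="BLOCK_NAME"
)
print("POST", url)
print(body)
```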
Manage hardware emergency maintenance notifications
After a VM encounters a host error, or you report its host as faulty, the VM's host requires emergency, unplanned maintenance. By default, Compute Engine provides a few hours of advance notice when it schedules this type of maintenance. For reserved hosts, you can enable emergency maintenance notifications to increase this notice period to at least seven days. This approach helps you more proactively control disruptions to your workloads.
To enable or disable hardware emergency maintenance notifications for a reservation, select one of the following options:
gcloud
To enable hardware emergency maintenance notifications for a reservation, use the gcloud compute reservations update command with the --enable-emergent-maintenance flag:
gcloud compute reservations update RESERVATION_NAME \
    --enable-emergent-maintenance \
    --zone=ZONE
To disable hardware emergency maintenance notifications for a reservation, use the gcloud compute reservations update command with the --no-enable-emergent-maintenance flag:
gcloud compute reservations update RESERVATION_NAME \
    --no-enable-emergent-maintenance \
    --zone=ZONE
Replace the following:
- RESERVATION_NAME: the name of the reservation.
- ZONE: the zone where the reservation exists.
REST
To enable or disable hardware emergency maintenance notifications for a
reservation, make a PATCH request to the
reservations.update method.
In the request URL, include the paths query parameter set to
enableEmergentMaintenance.
PATCH https://compute.googleapis.com/compute/v1/projects/PROJECT_ID/zones/ZONE/reservations/RESERVATION_NAME?paths=enableEmergentMaintenance
{
"name": "RESERVATION_NAME",
"enableEmergentMaintenance": EMERGENCY_MAINTENANCE_NOTIFICATIONS
}
Replace the following:
- PROJECT_ID: the ID of the project where the reservation exists.
- ZONE: the zone where the reservation exists.
- RESERVATION_NAME: the name of the reservation.
- EMERGENCY_MAINTENANCE_NOTIFICATIONS: specify one of the following values:
  - To enable notifications: true
  - To disable notifications: false
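As a rough illustration of the PATCH request above, the following Python sketch builds the request URL, including the paths query parameter, and the JSON body. The function name emergent_maintenance_patch is hypothetical. The sketch also shows that a Python boolean serializes to the lowercase true or false that the request body expects.

```python
import json
from typing import Tuple
from urllib.parse import urlencode

BASE_URL = "https://compute.googleapis.com/compute/v1"


def emergent_maintenance_patch(
    project: str, zone: str, reservation: str, enabled: bool
) -> Tuple[str, str]:
    """Return (url, json_body) for the PATCH request described above."""
    query = urlencode({"paths": "enableEmergentMaintenance"})
    url = (
        f"{BASE_URL}/projects/{project}/zones/{zone}"
        f"/reservations/{reservation}?{query}"
    )
    body = json.dumps({"name": reservation, "enableEmergentMaintenance": enabled})
    return url, body


url, body = emergent_maintenance_patch("PROJECT_ID", "ZONE", "RESERVATION_NAME", True)
print("PATCH", url)
print(body)  # {"name": "RESERVATION_NAME", "enableEmergentMaintenance": true}
```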