This document explains how to modify a cluster in Cluster Director.
After you create a cluster in Cluster Director, you can modify one or more cluster properties. Modifying a cluster can help you save costs or optimize performance when the needs of your workload change. For example, you can add or delete storage resources, update prolog or epilog scripts, or replace the compute resources that a cluster partition uses.
Limitations
When you modify a cluster, the following limitations apply:
Don't modify or delete Compute Engine resources that contain the label hypercomputeClusterResource set to true. Cluster Director provisions and manages these resources. If you modify them, then you might encounter errors. To check the labels for a Compute Engine resource, see View labels.
You can't edit or delete the storage resource that a cluster uses for the /home directory.
Modifiable cluster properties
You can only modify the following properties in a cluster. If you need to change any other property, then you must first create a replacement cluster with updated properties, and then delete the original cluster if you no longer need it.
Description: you can update the cluster's description.
Labels: you can add, modify, or delete the labels associated with the cluster.
Compute resources: you can add new compute resource configurations or delete unused ones. You can't modify the properties of existing compute resource configurations.
Storage resources: you can add or delete Filestore instances, Google Cloud Managed Lustre instances, or Cloud Storage buckets. You can't use different types of storage resources in a cluster partition.
Orchestrator: you can modify the Slurm orchestrator's configuration, including login nodes, OS images, prolog or epilog scripts, nodesets, and partitions.
Based on the properties that you want to modify, modifying your cluster might restart, recreate, or delete nodes. The following sections describe the impact of each property change so that you can plan updates and minimize disruptions.
Disruptive cluster property updates
When you modify a cluster, modifying one or more of the following properties requires stopping and restarting, recreating, or deleting nodes:
Updates that delete all affected compute nodes: when you update one or more of the following properties, Cluster Director deletes the affected compute nodes:
Delete compute resources (computeResources): deleting a compute resource configuration deletes all compute nodes that use that configuration, unless you specify a replacement configuration for the nodes. The replacement configuration can't specify a different consumption option or zone. If you don't specify a replacement configuration, then you must delete any nodesets that use the compute resource configuration that you want to delete. Otherwise, modifying the cluster fails.
Decrease static or dynamic node count (staticNodeCount or maxDynamicNodeCount): deleting compute nodes from a nodeset stops and deletes those nodes. Slurm abruptly stops any running jobs on the deleted nodes.
Updates to storage resources that can disrupt jobs: when you update one or more of the following properties, Cluster Director might disrupt the running jobs in your cluster:
Delete or modify storage resources (storageResources): deleting or modifying Filestore instances, Managed Lustre instances, or Cloud Storage buckets disrupts any running jobs that are reading files from the affected resource. We recommend that you modify or delete these storage resources only after all jobs on the affected compute nodes have finished running.
Delete storage configurations (storageConfigs): deleting storage resource configurations is likely to disrupt any running jobs that are reading files from any resources that use those configurations. All nodesets in your cluster must use the same storage configuration. Otherwise, modifying your cluster fails.
Updates that restart, recreate, or delete all login nodes: when you update one or more of the properties within the login nodes configuration (orchestrator.slurm.loginNodes), Cluster Director recreates, deletes, or stops and restarts the login nodes. Recreating or restarting login nodes doesn't affect the queued or running jobs.
Modify boot disks (bootDisk): modifying any boot disk properties (type, sizeGb, image) recreates all login nodes.
Decrease the number of login nodes (count): decreasing the number of login nodes deletes the affected nodes.
Enable or disable OS login operations (enableOsLogin): allowing or disallowing OS login restarts all login nodes.
Enable or disable external IP addresses (enablePublicIps): allowing or disallowing resources from being publicly accessible restarts all login nodes.
Modify machine type (machineType): changing the machine type restarts or recreates all login nodes based on the chosen machine type and its availability.
Modify startup script (startupScript): modifying the startup script for the login nodes stops and restarts all login nodes.
Modify startup script timeout (startupScriptTimeout): modifying the timeout for the startup script doesn't restart or recreate login nodes. The new timeout is enforced the next time a login node restarts.
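Two of the rules above cause the whole update to fail: deleting a compute resource configuration that a nodeset still references, and nodesets that use different storage configurations. The following Python sketch shows one way to check a cluster configuration for both conditions before you submit it. It assumes that the configuration is loaded as a dictionary shaped like the REST request body on this page, and it interprets "the same storage configuration" as an identical set of id and localMount pairs. The function names and the sample values are hypothetical, and the service might apply stricter rules.

```python
from typing import Any


def _nodesets(cluster: dict[str, Any]) -> list[dict[str, Any]]:
    """Return the nodesets under orchestrator.slurm.nodeSets, or an empty list."""
    return cluster.get("orchestrator", {}).get("slurm", {}).get("nodeSets", [])


def nodesets_using_compute(cluster: dict[str, Any], compute_id: str) -> list[str]:
    """Return the IDs of nodesets whose computeId references compute_id."""
    return [ns["id"] for ns in _nodesets(cluster) if ns.get("computeId") == compute_id]


def storage_configs_match(cluster: dict[str, Any]) -> bool:
    """Return True when every nodeset lists the same storage configurations.

    Assumption: two nodesets match when their (id, localMount) pairs are equal,
    regardless of order.
    """
    seen = set()
    for ns in _nodesets(cluster):
        pairs = frozenset(
            (cfg.get("id"), cfg.get("localMount")) for cfg in ns.get("storageConfigs", [])
        )
        seen.add(pairs)
    return len(seen) <= 1


# Hypothetical configuration with two nodesets that share one storage configuration.
example = {
    "orchestrator": {
        "slurm": {
            "nodeSets": [
                {"id": "ns-a", "computeId": "gpu-config",
                 "storageConfigs": [{"id": "fs", "localMount": "/home"}]},
                {"id": "ns-b", "computeId": "cpu-config",
                 "storageConfigs": [{"id": "fs", "localMount": "/home"}]},
            ]
        }
    }
}

print(nodesets_using_compute(example, "gpu-config"))
print(storage_configs_match(example))
```

If the first call returns any nodeset IDs, then either delete those nodesets in the same update or specify a replacement compute resource configuration.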
Non-disruptive cluster property updates
When you modify a cluster, modifying one or more of the following properties doesn't restart or recreate existing nodes:
Modify description (description): modifying the description for your cluster doesn't restart or recreate nodes.
Modify labels (labels): adding, modifying, or deleting labels that are associated with your cluster doesn't restart or recreate nodes.
Add storage resources (storageResources): adding Filestore instances, Managed Lustre instances, or Cloud Storage buckets doesn't restart or recreate nodes.
Slurm login nodes (orchestrator.slurm.loginNodes):
Modify labels (labels): modifying labels for login nodes doesn't disrupt your workloads.
Add or delete storage configurations (storageConfigs): adding or deleting storage resources that are mounted on login nodes doesn't disrupt your workloads.
Modify service account (serviceAccount): modifying the service account doesn't disrupt your workloads.
Slurm nodesets (orchestrator.slurm.nodeSets):
Increase static or dynamic node count (staticNodeCount or maxDynamicNodeCount): increasing the number of static or dynamic nodes in a nodeset doesn't restart or recreate existing nodes.
Modify labels (labels): modifying labels for a nodeset doesn't restart or recreate nodes.
Modify startup script (computeInstance.startupScript) or boot disks (computeInstance.bootDisk): modifying the startup script or boot disk properties for a nodeset doesn't disrupt running jobs. Cluster Director applies the changes to new compute nodes, and to existing compute nodes after they finish running all jobs and their state changes to idle.
Add storage configurations (storageConfigs): adding storage resources in a nodeset doesn't restart or recreate existing compute nodes. All nodesets in a Slurm cluster must use the same storage configuration.
Create or delete Slurm partitions (orchestrator.slurm.partitions): creating or deleting Slurm partitions doesn't restart, recreate, or delete nodes. However, if you delete a partition that has running jobs, Slurm stops those jobs.
Modify Slurm default partition (orchestrator.slurm.defaultPartition): modifying the default Slurm partition doesn't restart or recreate nodes. If you have scheduled jobs, verify that modifying the default partition isn't disruptive.
Modify Slurm prolog script (orchestrator.slurm.prologBashScripts): the updated prolog script runs only for new and queued jobs; jobs that are already running are unaffected.
Modify Slurm epilog script (orchestrator.slurm.epilogBashScripts): modifying the epilog script updates the script for all jobs. The updated script runs when any active job on a node finishes, even if the job started before you update the script.
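To preview how a planned change to the login nodes configuration affects the nodes, you can compare the current and the new orchestrator.slurm.loginNodes values against the impacts listed in the preceding sections. The following Python sketch is only an illustration: the lookup table restates the impacts from this page, the function name and sample values are hypothetical, and any case that this page doesn't describe (for example, increasing the login node count) is reported as "not covered" rather than guessed.

```python
from typing import Any

# Impact of changing each login node field, as described on this page.
LOGIN_NODE_IMPACT = {
    "bootDisk": "recreate",
    "enableOsLogin": "restart",
    "enablePublicIps": "restart",
    "machineType": "restart or recreate",
    "startupScript": "restart",
    "startupScriptTimeout": "none",
    "labels": "none",
    "storageConfigs": "none",
    "serviceAccount": "none",
}


def login_node_impacts(old: dict[str, Any], new: dict[str, Any]) -> dict[str, str]:
    """Map each changed login node field to its documented impact."""
    impacts: dict[str, str] = {}
    for field in sorted(set(old) | set(new)):
        if old.get(field) == new.get(field):
            continue  # Unchanged fields have no impact.
        if field == "count":
            old_count, new_count = old.get("count"), new.get("count")
            decreased = (
                old_count is not None
                and new_count is not None
                and int(new_count) < int(old_count)
            )
            # Only a decrease is documented: it deletes the affected nodes.
            impacts[field] = "delete" if decreased else "not covered"
        else:
            impacts[field] = LOGIN_NODE_IMPACT.get(field, "not covered")
    return impacts


# Hypothetical current and planned login node settings.
current = {"count": 2, "startupScript": "echo v1", "labels": {"team": "a"}}
planned = {"count": 1, "startupScript": "echo v2", "labels": {"team": "b"}}
print(login_node_impacts(current, planned))
```

A result that contains anything other than "none" suggests scheduling the update during planned maintenance.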
Before you begin
Before you modify a cluster, consider the following:
If you want to delete a storage resource, then back up any data that you want to retain. Deleting a storage resource is irreversible. To review your options for backing up data, see Data protection options.
Select the tab for how you plan to use the samples on this page:
Console
When you use the Google Cloud console to access Google Cloud services and APIs, you don't need to set up authentication.
gcloud
In the Google Cloud console, activate Cloud Shell.
At the bottom of the Google Cloud console, a Cloud Shell session starts and displays a command-line prompt. Cloud Shell is a shell environment with the Google Cloud CLI already installed and with values already set for your current project. It can take a few seconds for the session to initialize.
REST
To use the REST API samples on this page in a local development environment, you use the credentials you provide to the gcloud CLI.
Install the Google Cloud CLI.
If you're using an external identity provider (IdP), you must first sign in to the gcloud CLI with your federated identity.
For more information, see Authenticate for using REST in the Google Cloud authentication documentation.
Required roles
To get the permission that you need to modify a cluster, ask your administrator to grant you the Cluster Director Editor (roles/hypercomputecluster.editor) IAM role on your project. For more information about granting roles, see Manage access to projects, folders, and organizations.
This predefined role contains the hypercomputecluster.clusters.update permission, which is required to modify a cluster.
You might also be able to get this permission with custom roles or other predefined roles.
Modify a cluster
Before you modify a cluster, understand how changes can impact your running workloads:
Disruptive updates: modifying some properties stops or recreates nodes. If you haven't already, review which updates can disrupt your workload. For a list of these properties, see Disruptive cluster property updates.
Planned maintenance: to minimize disruptions, we recommend that you update cluster properties that restart or recreate nodes during planned maintenance. Unless your nodes use N2 machine types, host maintenance events stop and restart nodes. To start a host maintenance event before its planned time, see Manually start planned host maintenance events.
You can't edit the description or labels in your cluster by using the Google Cloud console; you must use the gcloud CLI or the REST API. To modify other cluster properties, select one of the following options:
Console
In the Google Cloud console, go to the Cluster Director page.
In the navigation menu, click Clusters. The Clusters page appears.
In the Clusters table, in the Name column, click the name of the cluster that you want to modify. The page that gives the details of your cluster appears.
Click Edit. A page that lets you edit your cluster appears.
In the Compute section, you can modify the compute resource configurations that your cluster uses. To do so, do one or more of the following:
To add or edit a compute resource configuration, do one of the following:
To add a resource configuration, click Add resource configuration.
To edit a resource configuration, click Edit resource configuration.
In the Add resource configuration pane that appears, you can do one or more of the following:
To specify a machine type, click the General purpose or GPUs tab, and then follow the prompts to specify the machine type.
In the Number of instances field, enter the number of compute instances for the resource configuration.
To delete a resource configuration, click Delete resource configuration.
Click Continue.
To add or delete storage resources in your cluster, in the Storage section, do one or more of the following:
To add a storage resource, click Add storage configuration, and then follow the prompts to specify the configuration for the resource.
To delete a storage resource other than the one used for the /home directory, locate the storage resource, and then click Delete storage plan.
Click Continue.
To modify the login nodes in your cluster, expand the Login node section, and then do one or more of the following:
In the Node count field, enter the number of compute instances for the node.
In the Source image list, select the OS image.
In the Startup script field, enter or modify the startup script for your cluster.
In the Startup script timeout field, enter or modify the timeout for the startup script. The value must be at least 1 (one second).
In the Boot disk type list, select the type of boot disk that you want your login node to use.
In the Boot disk size field, enter the number of GB for the disk. You must enter a value of 50 or higher. However, to help ensure that your nodes boot up quickly, we recommend that you specify a value of at least 100.
To modify partitions and nodesets, expand the Partitions section, and then add, delete, or edit the partitions and nodesets as needed.
To add or edit prolog or epilog scripts, expand Advanced orchestration settings, and then add or edit your scripts.
Click Save. A page that gives the details of your cluster appears. Based on the properties that you modify, modifying the cluster can take some time to complete.
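The login node steps above state two numeric limits: a startup script timeout of at least 1 second, and a boot disk size of at least 50 GB, with 100 GB or more recommended. If you prepare these values outside the console, a small pre-check such as the following Python sketch can catch out-of-range values early. The function name and message wording are hypothetical; only the thresholds come from this page.

```python
def check_login_node_settings(boot_disk_size_gb: int, startup_timeout_s: int) -> list[str]:
    """Return error and warning messages for the given login node values."""
    messages: list[str] = []
    if startup_timeout_s < 1:
        messages.append("error: startup script timeout must be at least 1 second")
    if boot_disk_size_gb < 50:
        messages.append("error: boot disk size must be 50 GB or higher")
    elif boot_disk_size_gb < 100:
        # Allowed, but below the size recommended for quick boot times.
        messages.append("warning: a boot disk of at least 100 GB is recommended")
    return messages


for message in check_login_node_settings(boot_disk_size_gb=80, startup_timeout_s=300):
    print(message)
```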
gcloud
To modify a cluster, use the gcloud alpha cluster-director clusters update command.
Based on how you want to specify the cluster configuration, use one of the following methods. To modify storage resources, you must modify the cluster by specifying a configuration file.
Specify a configuration file: to modify a cluster by specifying the cluster configuration in a JSON file, use the --config flag. To run the command, select one of the following options:
Bash
gcloud alpha cluster-director clusters update CLUSTER_NAME \
--location=REGION \
--config=CONFIGURATION_FILE
Powershell
gcloud alpha cluster-director clusters update CLUSTER_NAME `
--location=REGION `
--config=CONFIGURATION_FILE
cmd.exe
gcloud alpha cluster-director clusters update CLUSTER_NAME ^
--location=REGION ^
--config=CONFIGURATION_FILE
Replace the following:
CLUSTER_NAME: the name of the cluster that you want to modify.
REGION: the region where your cluster exists.
CONFIGURATION_FILE: the path to the JSON file that contains the configuration details for the cluster. To review the configuration details that you can specify, review the request body for modifying a cluster by using REST.
Specify each cluster property: to modify a cluster by specifying each configuration property directly, include one or more of the following flags based on the cluster properties that you want to modify:
To modify the cluster description: --description
To add or delete labels in the cluster, use one of the following flags:
To add labels: --add-labels
To delete labels: --remove-labels
To modify the Slurm nodesets, use one of the following flags:
To add a nodeset: --add-slurm-node-sets
To update a nodeset: --update-slurm-node-sets
To delete a nodeset: --remove-slurm-node-sets
To modify the Slurm partitions, use one of the following flags:
To add a partition: --add-slurm-partitions
To update a partition: --update-slurm-partitions
To delete a partition: --remove-slurm-partitions
To modify the default partition: --slurm-default-partition
To add or delete on-demand compute instances, use one of the following flags:
To add on-demand compute instances: --add-on-demand-instances
To delete on-demand compute instances: --remove-on-demand-instances
To add or delete reserved compute instances, use one of the following flags:
To add reserved compute instances: --add-reserved-instances
To delete reserved compute instances: --remove-reserved-instances
To add or delete Spot VMs, use one of the following flags:
To add Spot VMs: --add-spot-instances
To delete Spot VMs: --remove-spot-instances
To add or delete Flex-start VMs, use one of the following flags:
To add Flex-start VMs: --add-dws-flex-instances
To delete Flex-start VMs: --remove-dws-flex-instances
For example, assume that you want to modify a cluster by updating the cluster description and by adding a new label, a new compute resource configuration, a new nodeset, and a new partition. To make these changes in your cluster, select one of the following options:
Bash
gcloud alpha cluster-director clusters update CLUSTER_NAME \
--location=REGION \
--description="DESCRIPTION" \
--add-labels=CLUSTER_LABEL_KEY=CLUSTER_LABEL_VALUE \
--add-on-demand-instances=id=NEW_COMPUTE_RESOURCE_NAME,zone=NEW_ZONE,machineType=NEW_MACHINE_TYPE \
--add-slurm-node-sets=id=NODESET_NAME,computeId=COMPUTE_RESOURCE_NAME,staticNodeCount=STATIC_NUMBER_VMS,maxDynamicNodeCount=MAX_DYNAMIC_NUMBER_VMS,startupScript=STARTUP_SCRIPT_NODESET,labels="NODESET_LABEL" \
--add-slurm-partitions=id=PARTITION_NAME,nodesetIds=[NODESET_NAME]
Powershell
gcloud alpha cluster-director clusters update CLUSTER_NAME `
--location=REGION `
--description="DESCRIPTION" `
--add-labels=CLUSTER_LABEL_KEY=CLUSTER_LABEL_VALUE `
--add-on-demand-instances=id=NEW_COMPUTE_RESOURCE_NAME,zone=NEW_ZONE,machineType=NEW_MACHINE_TYPE `
--add-slurm-node-sets=id=NODESET_NAME,computeId=COMPUTE_RESOURCE_NAME,staticNodeCount=STATIC_NUMBER_VMS,maxDynamicNodeCount=MAX_DYNAMIC_NUMBER_VMS,startupScript=STARTUP_SCRIPT_NODESET,labels="NODESET_LABEL" `
--add-slurm-partitions=id=PARTITION_NAME,nodesetIds=[NODESET_NAME]
cmd.exe
gcloud alpha cluster-director clusters update CLUSTER_NAME ^
--location=REGION ^
--description="DESCRIPTION" ^
--add-labels=CLUSTER_LABEL_KEY=CLUSTER_LABEL_VALUE ^
--add-on-demand-instances=id=NEW_COMPUTE_RESOURCE_NAME,zone=NEW_ZONE,machineType=NEW_MACHINE_TYPE ^
--add-slurm-node-sets=id=NODESET_NAME,computeId=COMPUTE_RESOURCE_NAME,staticNodeCount=STATIC_NUMBER_VMS,maxDynamicNodeCount=MAX_DYNAMIC_NUMBER_VMS,startupScript=STARTUP_SCRIPT_NODESET,labels="NODESET_LABEL" ^
--add-slurm-partitions=id=PARTITION_NAME,nodesetIds=[NODESET_NAME]
Replace the following:
CLUSTER_NAME: the name of the cluster that you want to modify.
REGION: the region where your cluster exists.
DESCRIPTION: a new description for your cluster.
CLUSTER_LABEL_KEY: the key for the new label that you want to apply to your cluster. For more information about applying labels to Compute Engine resources, see Organize resources using labels.
CLUSTER_LABEL_VALUE: the value for the new label.
NEW_COMPUTE_RESOURCE_NAME: the name of the new compute resource configuration.
NEW_ZONE: the new zone for the new compute resource configuration.
NEW_MACHINE_TYPE: a supported machine type based on your specified consumption option.
NODESET_NAME: the name of the new nodeset to add to your cluster.
COMPUTE_RESOURCE_NAME: the name of the compute resource configuration for the new nodeset to use.
STATIC_NUMBER_VMS: the minimum number of compute instances that your nodeset must use.
MAX_DYNAMIC_NUMBER_VMS: the maximum number of compute instances that Cluster Director can add to the nodeset.
STARTUP_SCRIPT_NODESET: the startup script for the nodeset.
NODESET_LABEL: a label to apply to the nodeset.
PARTITION_NAME: the name of the new partition.
The output is similar to the following:
Update request issued for: [cluster000]
Waiting for operation [projects/example-project/locations/us-central1/operations/operation-1759940551889-640a8176bd2a2-0e460b9d-4281a5ca] to complete...working...
Based on the properties that you modify in your cluster, modifying the cluster can take some time to complete. When it does complete, the output is similar to the following:
Updated cluster [cluster000].
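If you run updates from a script, you can assemble the command arguments programmatically instead of editing a shell line by hand. The following Python sketch builds the argument list for the gcloud alpha cluster-director clusters update command by using only the --location, --config, --description, and --add-labels flags shown on this page. The function name and sample values are hypothetical. The sketch makes two assumptions that this page doesn't confirm: that the configuration file method and the per-property flags aren't combined in one command, and that multiple labels can be passed as comma-separated KEY=VALUE pairs.

```python
import shlex
from typing import Optional


def build_update_command(
    cluster_name: str,
    region: str,
    description: Optional[str] = None,
    add_labels: Optional[dict[str, str]] = None,
    config_file: Optional[str] = None,
) -> list[str]:
    """Return the argument list for a cluster update command."""
    cmd = [
        "gcloud", "alpha", "cluster-director", "clusters", "update",
        cluster_name, f"--location={region}",
    ]
    if config_file is not None:
        # Assumption: a configuration file replaces the per-property flags.
        cmd.append(f"--config={config_file}")
        return cmd
    if description is not None:
        cmd.append(f"--description={description}")
    if add_labels:
        # Assumption: several labels are joined as KEY=VALUE,KEY=VALUE.
        pairs = ",".join(f"{key}={value}" for key, value in add_labels.items())
        cmd.append(f"--add-labels={pairs}")
    return cmd


command = build_update_command(
    "cluster000", "us-central1",
    description="Nightly training cluster",
    add_labels={"team": "ml"},
)
# shlex.join quotes arguments that contain spaces, so the line is safe to paste in Bash.
print(shlex.join(command))
```

Passing the list to subprocess.run(command, check=True) runs the command without involving a shell, which avoids the quoting differences between Bash, PowerShell, and cmd.exe.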
REST
To modify a cluster, make a PATCH request to the clusters.patch method.
Your request must include the following HTTP method and request URL:
PATCH https://hypercomputecluster.googleapis.com/v1/projects/PROJECT_ID/locations/REGION/clusters/CLUSTER_NAME?updateMask=FIELDS_TO_UPDATE
Based on the cluster properties that you want to modify, include one or more of the following fields in the request body:
description: a description for your cluster.
labels: key-value pairs of labels to help you organize and filter your clusters, as well as their associated resources.
computeResources: the compute resources for your cluster, including the machine types and provisioning models to use for the compute instances in the cluster.
storageResources: the storage resources for your cluster; namely, the Filestore instances, Managed Lustre instances, or Cloud Storage buckets.
orchestrator: the settings for the Slurm workload scheduler for your cluster, as well as the configurations for the cluster nodesets and partitions.
For example, assume that you want to modify a cluster by modifying the cluster description, adding a new label, modifying the login nodes, and adding a new nodeset, partition, and compute resource configuration. To make these changes in your cluster, include the following in a JSON file named request-body.json:
{
  "description": "DESCRIPTION",
  "labels": {
    "CLUSTER_LABEL_KEY": "CLUSTER_LABEL_VALUE"
  },
  "orchestrator": {
    "slurm": {
      "loginNodes": {
        "count": "LOGIN_NODE_VMS",
        "startupScript": "STARTUP_SCRIPT",
        "startupScriptTimeout": "STARTUP_SCRIPT_TIMEOUT"
      },
      "nodeSets": [
        {
          "id": "NODESET_NAME",
          "computeId": "COMPUTE_RESOURCE_NAME",
          "storageConfigs": [
            {
              "id": "STORAGE_NAME",
              "localMount": "STORAGE_PATH"
            }
          ],
          "staticNodeCount": "STATIC_NUMBER_VMS",
          "maxDynamicNodeCount": "MAX_DYNAMIC_NUMBER_VMS",
          "computeInstance": {
            "startupScript": "STARTUP_SCRIPT_NODESET",
            "startupScriptTimeout": "STARTUP_SCRIPT_TIMEOUT",
            "labels": {
              "nodesetLabel": "NODESET_LABEL"
            },
            "bootDisk": {
              "type": "projects/PROJECT_ID/zones/DISK_ZONE/diskTypes/DISK_TYPE",
              "sizeGb": "DISK_SIZE"
            }
          }
        }
      ],
      "partitions": [
        {
          "id": "PARTITION_NAME",
          "nodeSetIds": [
            "NODESET_NAME"
          ]
        }
      ]
    }
  },
  "computeResources": {
    "NEW_COMPUTE_RESOURCE_NAME": {
      "config": {
        "newOnDemandInstances": {
          "machineType": "NEW_MACHINE_TYPE",
          "zone": "NEW_ZONE"
        }
      }
    }
  }
}
Replace the following:
PROJECT_ID: the ID of your project.
REGION: the region where your cluster exists.
CLUSTER_NAME: the name of the cluster that you want to modify.
FIELDS_TO_UPDATE: a comma-separated list of fields that you want to update. Based on how you want to update a field, do one of the following:
To update the field's value, specify the field in the updateMask query parameter and in the request body.
To delete a field from the cluster configuration details, specify the field in the updateMask query parameter but omit it from the request body.
For this example, the updateMask query parameter is as follows:
description,labels,orchestrator.slurm.loginNodes.count,orchestrator.slurm.loginNodes.startupScript,orchestrator.slurm.nodeSets,orchestrator.slurm.partitions,computeResources
DESCRIPTION: a new description for your cluster.
CLUSTER_LABEL_KEY: the key for the new label that you want to apply to your cluster. For more information about applying labels to Compute Engine resources, see Organize resources using labels.
CLUSTER_LABEL_VALUE: the value for the new label.
LOGIN_NODE_VMS: the new number of compute instances for the login nodeset.
STARTUP_SCRIPT: the startup script for the login nodeset. For more information about using startup scripts in compute instances, see About startup scripts.
STARTUP_SCRIPT_TIMEOUT: the maximum time that the startup script can run, for example 300s. The default value is 300s (300 seconds, or five minutes). The minimum allowed value is 1s (one second).
NODESET_NAME: the name of the new nodeset to add to your cluster.
COMPUTE_RESOURCE_NAME: the name of the compute resource configuration for the new nodeset to use.
STORAGE_NAME: the name of the storage resource to mount.
STORAGE_PATH: the path on the compute instance where the storage resource is mounted. For example, to mount the storage resource in your home directory, enter /home.
STATIC_NUMBER_VMS: the minimum number of compute instances that your nodeset must use.
MAX_DYNAMIC_NUMBER_VMS: the maximum number of compute instances that Cluster Director can add to the nodeset.
STARTUP_SCRIPT_NODESET: the startup script for the nodeset.
NODESET_LABEL: a label to apply to the nodeset.
DISK_ZONE: the zone where you want to create the boot disks for the nodesets.
DISK_TYPE: the type of boot disk for the nodeset. Based on the machine type in the node, specify one of the following values:
For A4X instances: hyperdisk-balanced
For A4 instances: hyperdisk-balanced
For A3 Ultra instances: hyperdisk-balanced
For A3 Mega instances: pd-balanced, pd-ssd, hyperdisk-balanced, hyperdisk-ml, hyperdisk-extreme, or hyperdisk-throughput
For N2 instances: pd-standard, pd-balanced, pd-ssd, pd-extreme, hyperdisk-extreme, or hyperdisk-throughput
For an overview of the different types of boot disks that you can use, see Choose a disk type in the Compute Engine documentation.
DISK_SIZE: the size of the boot disk in GB. The value must be 50 or higher. However, to help ensure that your nodes boot up quickly, we recommend that you specify a value of at least 100.
PARTITION_NAME: the name of the new partition.
NEW_COMPUTE_RESOURCE_NAME: the name of the new compute resource configuration.
NEW_MACHINE_TYPE: a supported machine type based on your specified consumption option.
NEW_ZONE: the new zone for the new compute resource configuration.
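Because a field that appears in the updateMask query parameter but not in the request body is deleted, it can help to build the request URL and check the mask against the body before you send the request. The following Python sketch shows one possible approach. The function names and sample values are hypothetical, and the check only walks dotted field paths through a dictionary shaped like request-body.json; it doesn't reproduce how the service itself interprets the mask.

```python
from typing import Any
from urllib.parse import quote

BASE_URL = "https://hypercomputecluster.googleapis.com/v1"


def build_patch_url(project_id: str, region: str, cluster_name: str, fields: list[str]) -> str:
    """Return the PATCH request URL with a comma-separated updateMask."""
    mask = quote(",".join(fields), safe=",")
    return (
        f"{BASE_URL}/projects/{quote(project_id, safe='')}"
        f"/locations/{quote(region, safe='')}"
        f"/clusters/{quote(cluster_name, safe='')}?updateMask={mask}"
    )


def fields_missing_from_body(fields: list[str], body: dict[str, Any]) -> list[str]:
    """Return mask fields that have no value in the body.

    Per this page, such fields are deleted from the cluster configuration.
    """
    missing = []
    for field in fields:
        node: Any = body
        for part in field.split("."):
            if isinstance(node, dict) and part in node:
                node = node[part]
            else:
                missing.append(field)
                break
    return missing


# Hypothetical mask and body: "labels" is in the mask but not in the body.
mask_fields = ["description", "labels", "orchestrator.slurm.loginNodes.count"]
request_body = {
    "description": "Nightly training cluster",
    "orchestrator": {"slurm": {"loginNodes": {"count": "2"}}},
}

print(build_patch_url("example-project", "us-central1", "cluster000", mask_fields))
print(fields_missing_from_body(mask_fields, request_body))
```

Reviewing the second result before sending the request helps you confirm that every field listed there is one that you intend to delete.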
To send your request, select one of the following options:
curl (Bash)
curl -X PATCH \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request-body.json \
"https://hypercomputecluster.googleapis.com/v1/projects/PROJECT_ID/locations/REGION/clusters/CLUSTER_NAME?updateMask=FIELDS_TO_UPDATE"
Powershell
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }
Invoke-WebRequest `
-Method PATCH `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request-body.json `
-Uri "https://hypercomputecluster.googleapis.com/v1/projects/PROJECT_ID/locations/REGION/clusters/CLUSTER_NAME?updateMask=FIELDS_TO_UPDATE" | Select-Object -Expand Content
curl (cmd.exe)
curl -X PATCH ^
-H "Authorization: Bearer $(gcloud auth print-access-token)" ^
-H "Content-Type: application/json; charset=utf-8" ^
-d @request-body.json ^
"https://hypercomputecluster.googleapis.com/v1/projects/PROJECT_ID/locations/REGION/clusters/CLUSTER_NAME?updateMask=FIELDS_TO_UPDATE"
The response is similar to the following:
{
  "name": "projects/example-project/locations/us-central1/operations/operation-1758842430697-63fa86a4c3030-028b6436-2fbda8e1",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.hypercomputecluster.v1.OperationMetadata",
    "createTime": "2025-09-25T23:20:30.707315354Z",
    "target": "projects/example-project/locations/us-central1/clusters/clusterp6a",
    "verb": "update",
    "requestedCancellation": false,
    "apiVersion": "v1"
  },
  "done": false
}
Based on the properties that you modify in your cluster, modifying the cluster can take some time to complete.
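The response describes a long-running operation, and a done value of false means that the update is still in progress. The following Python sketch parses a response like the preceding one and extracts the fields that are most useful for tracking the update. The function name is hypothetical. The error key isn't shown in the sample response; it's the standard field that long-running operations use to report a failure, and the sketch assumes that this API follows that convention.

```python
import json
from typing import Any


def summarize_operation(response_text: str) -> dict[str, Any]:
    """Extract the operation name, target, verb, and status from a response."""
    operation = json.loads(response_text)
    metadata = operation.get("metadata", {})
    return {
        "operation": operation.get("name"),
        "target": metadata.get("target"),
        "verb": metadata.get("verb"),
        "done": bool(operation.get("done", False)),
        # Assumption: a failed operation carries a standard "error" object.
        "error": operation.get("error"),
    }


sample_response = """
{
  "name": "projects/example-project/locations/us-central1/operations/operation-1758842430697-63fa86a4c3030-028b6436-2fbda8e1",
  "metadata": {
    "target": "projects/example-project/locations/us-central1/clusters/clusterp6a",
    "verb": "update"
  },
  "done": false
}
"""

summary = summarize_operation(sample_response)
print(summary["target"], "done:", summary["done"])
```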