Create a custom cluster

This document explains how to create a cluster in Cluster Director where you fully customize the compute, networking, and storage resources for your specific artificial intelligence (AI), machine learning (ML), or high performance computing (HPC) workloads.

This process lets you design a fault-tolerant and highly scalable Slurm environment with your own custom specifications, helping your cluster meet the needs of your workloads. To create a cluster based on a template that is optimized for running AI and ML workloads, see instead Create an AI-optimized cluster based on a template.

Limitations

When you create a cluster in Cluster Director, the following limitations apply:

Regional scope: clusters are regional resources. You can only create or use compute resources, storage resources, and subnetworks that exist within the same region as your cluster.
Compute resource configuration per nodeset: you can only assign one compute resource configuration for each nodeset that you want to create in your cluster.
Storage classes for new Cloud Storage buckets: if you plan to create one or more buckets when creating a cluster, then you can only specify the Standard storage class or Autoclass. If you want to use other classes, then you must update the bucket after you create the cluster.

Before you begin

Before you create a cluster in Cluster Director, do the following:

Choose consumption options. If you haven't already, then you must choose the consumption options for the virtual machine (VM) instances that you want to use in each partition for your cluster. Each consumption option determines the availability, obtainability, and pricing for your VMs.

To learn more, see Choose a consumption option.
Obtain capacity and quota. Based on your chosen consumption option, review the quota requirements for the VMs that you want to create in the cluster. If you lack sufficient quota, then creating your cluster fails.

To learn more, see Capacity and quota overview.
Verify usable reservations. If you want to create your cluster by using one or more reservations, then verify that the reservations have enough available resources to create your chosen number of VMs in the cluster. Otherwise, skip this step.

To learn more, see Consumable VMs in a reservation.
Verify trusted image policy. If the organization in which your project exists has a trusted image policy (constraints/compute.trustedImageProjects), then verify that the clusterdirector-public-images project is included in the list of allowed projects.

To learn more, see Setting up trusted image policies.
Verify existing resource requirements. If you plan to use existing storage or networking resources in your cluster instead of creating new ones, then you must verify that those resources are correctly configured. Otherwise, skip this step.

To learn more, see Cluster creation process overview.
Authenticate. To use the samples on this page, you might need to authenticate to Google.

Select the tab for how you plan to use the samples on this page:
Console

When you use the Google Cloud console to access Google Cloud services and APIs, you don't need to set up authentication.

gcloud

In the Google Cloud console, activate Cloud Shell.

Activate Cloud Shell

At the bottom of the Google Cloud console, a Cloud Shell session starts and displays a command-line prompt. Cloud Shell is a shell environment with the Google Cloud CLI already installed and with values already set for your current project. It can take a few seconds for the session to initialize.
REST

To use the REST API samples on this page in a local development environment, you use the credentials you provide to the gcloud CLI.
For more information, see Authenticate for using REST in the Google Cloud authentication documentation.

Required roles

To get the permissions that you need to create a custom cluster from scratch, ask your administrator to grant you the following IAM roles:

To create and manage the cluster: Cluster Director Editor (roles/hypercomputecluster.editor) on your project
To manage resources in the cluster:
- Compute Instance Admin (v1) (roles/compute.instanceAdmin.v1) on the Compute Engine default service account
- Logs Writer (roles/logging.logWriter) on the Compute Engine default service account
- Monitoring Metric Writer (roles/monitoring.metricWriter) on the Compute Engine default service account
- Storage Object Viewer (roles/storage.objectViewer) on the Compute Engine default service account

For more information about granting roles, see Manage access to projects, folders, and organizations.

These predefined roles contain the permissions required to create a custom cluster from scratch. To see the exact permissions that are required, expand the Required permissions section:

Required permissions

The following permissions are required to create a custom cluster from scratch:

To create a cluster: hypercomputecluster.clusters.create

You might also be able to get these permissions with custom roles or other predefined roles.

Create a custom cluster from scratch

To create a custom cluster from scratch, select one of the following options:

Console

In the Google Cloud console, go to the Cluster Director page.

Go to Cluster Director
Click Create cluster.
In dialog that appears, click Step-by-step configuration. The Create cluster page appears.
In the Cluster name field, enter a name for your cluster. The name can contain up to 10 characters and must only use lowercase letters.
In the Compute section, click Configure resources. The Add resource configuration pane appears.
To configure a compute resource configuration, complete the following steps:
1. In the Machine configuration section, select the machine series and type that you want to use.
2. In the Number of instances field, enter the number of VMs to use for the configuration.
3. In the Consumption options section, specify the consumption option that you want to use to obtain resources:
  - To create GPU VMs by using a reservation, do the following:
    1. Click the Use reservation tab.
    2. Click Select reservation. The Choose a reservation pane appears.
    3. Select the reservation that you want to use. Then, click Choose. This action automatically sets the Region and Zone fields to the region and zone of your reservation.
  - To create GPU Flex-start VMs, do the following:
    1. Click the Flex start tab.
    2. In the Time limit for the VM section, specify the run duration for the VMs. The value must be between 10 minutes and 7 days.
    3. In the Location section, select the region where you want to create Flex-start VMs. The Google Cloud console automatically filters the available regions to only show only those regions that support Flex-start VMs for your selected machine type.
  - To create GPU or N2 Spot VMs, do the following:
    1. Click the Use spot tab.
    2. In the On VM termination list, select one of the following options:
      - To delete Spot VMs on preemption, select Delete.
      - To stop Spot VMs on preemption, select Stop.
    3. In the Location section, select the Region and Zone where you want to create Spot VMs. The Google Cloud console automatically filters the available regions to only show only those regions that support Spot VMs for your selected machine type.
  - To create N2 VMs, do the following:
    1. Click the Use on-demand tab.
    2. In the Location section, select the region where you want to create VMs.
4. Click Done.
5. Optional: To create additional compute resource configurations, click the Add resource configuration, and then follow the prompts to specify the compute resources.
Click Continue.
In the Choose new or existing network section, do one of the following:
- Recommended: To let Cluster Director automatically create a pre-configured network for your cluster, do the following:
  1. Select Create network.
  2. In the Cluster name field, enter the name of the network.
- To use an existing network, do the following:
  1. Select Select existing network.
  2. In the Select VPC network list, select an existing network.
  3. In the Select subnetwork list, select an existing subnetwork.
Click Continue.
Optional: To edit a storage resource, in the Storage section, click the Edit storage plan, and then do one of the following:
- To specify a Filestore instance, do the following:
  1. Click the Filestore tab.
  2. In the Instance provisioning section, select one of the following options:
    - To use an existing Filestore instance that uses the same network as your cluster, select Select existing Filestore instance, and then select the instance.
    - To create a new Filestore instance, select Create new instance. Then, follow the prompts to create your instance. For more information about the configurations that you can specify in the instance, see Create an instance.
- To specify a Google Cloud Managed Lustre instance, do the following:
  1. Click the Managed Lustre tab.
  2. In the Instance provisioning section, select one of the following options:
    - To use an existing Managed Lustre instance, select Select an existing Managed Lustre instance, and then select the instance.
    - To create a new Managed Lustre instance, select Create new instance. Then, follow the prompts to create your instance. For more information about the configurations that you can specify in the instance, see Create a Managed Lustre instance.
- To specify a Cloud Storage bucket, do the following:
  1. Click the Cloud Storage tab.
  2. In the Bucket provisioning section, select one of the following options:
    - To use an existing Cloud Storage bucket, select Select an existing bucket, and then select the bucket.
    - To create a new Cloud Storage bucket, select Create a new bucket, and then follow the prompts to create your bucket. For more information about the configurations that you can specify in the bucket, see Create a bucket.
Optional: To add storage resources to your cluster, click Add storage configuration, and then follow the prompts to specify the configuration for the storage resource.
Click Continue.
Optional: To edit the number and type of VMs that the login node uses, expand the Login node section, and then complete the following steps:
1. In the Machine type field, select an N2 standard machine type of 32 vCPUs or fewer.
2. Optional: To specify a custom OS image instead of the one that Cluster Director automatically configures for the login node, in the Source image field, select one of the supported OS images for Cluster Director.
In the Node count field, enter the number of VMs to use in the login node.
Optional: To specify a startup script for the VMs, in the Startup script field, enter your script. For more information about this type of scripts, see About startup scripts.
In the Boot disk type and Boot disk size fields, select a boot disk type and size for the VMs in the login node. For more information about the boot disks that your VMs can use, see Choose a disk type.
Optional: To specify more advanced configurations for the login node, expand Advanced login node settings. Then, follow the prompts to manage OS login, manage public IPs, or add or remove labels to the VMs in the login node.
Optional: To edit the partitions of your cluster to organize your compute resources, expand the Partitions section, and then do one of the following:
- To add a partition, click Add partition, and then do the following:
  1. In the Partition name field, enter a name for the partition.
  2. To edit a nodeset, click Toggle nodeset. To add a nodeset, click Add nodeset.
  3. In the Nodeset name field, enter a name for your nodeset.
  4. In the Resource configuration field, select a compute resource configuration that you created in the previous steps.
  5. In the Source image field, select an OS image to use for the compute nodes in the nodeset.
  6. In the Static node count field, enter the minimum number of VMs that must always be running in the cluster.
  7. In the Dynamic node count field, enter a limit of VMs that Cluster Director can increase the cluster to during increases in traffic.
    
    Important: If you create VMs in the nodeset by using a reservation, especially a shared reservation, then verify that the reservation has enough resources available to create your specified maximum number of VMs. Other workloads that use the same reservation might fully use it and, thus, Cluster Director might be unable to create more VMs in your nodeset.
  8. In the Boot disk type list and Boot disk size field, enter the type and size of the boot disk for the VMs to use.
  9. In the Boot disk type and Boot disk size fields, select a boot disk type and size for the VMs in the compute node.
  10. Optional: To specify more advanced configurations for the compute node, expand Advanced nodeset settings. Then, follow the prompts to add or remove startup scripts, or add or remove labels.
  11. Click Done.
- To remove a partition, click Delete partition.
Optional: To add a partition to your cluster, click Add partition, and then follow the prompts to specify the compute resources for the partition.
Optional: To add prolog or epilog scripts to your Slurm cluster, do the following:
1. Expand the Advanced orchestration settings section.
2. In the Scripts section, follow the prompts to add prolog or epilog scripts.
Click Create.

The Clusters page appears. Creating the cluster can take some time to complete. The completion time depends on the number of VMs that you request and resource availability in the VMs' zone. If your requested resources are unavailable, then Cluster Director maintains the creation request until resources become available. To view the status of the cluster create operation, view your cluster's details.

When Cluster Director creates your login node, the cluster state changes to Ready. You can then connect to your cluster; however, you can run workloads only after Cluster Director creates the compute nodes in the cluster.

gcloud

To create a cluster from scratch, use the gcloud alpha cluster-director clusters create command.

Based on how you want to specify the cluster configuration, use one of the following methods:

Specify a configuration file: to create a cluster by specifying the cluster configuration in a JSON file, use the --config flag. To run the command, select one of the following options:
Bash
```
gcloud alpha cluster-director clusters create CLUSTER_NAME \
    --location=REGION \
    --config=CONFIGURATION_FILE
```
Powershell
```
gcloud alpha cluster-director clusters create CLUSTER_NAME `
    --location=REGION `
    --config=CONFIGURATION_FILE
```
cmd.exe
```
gcloud alpha cluster-director clusters create CLUSTER_NAME ^
    --location=REGION ^
    --config=CONFIGURATION_FILE
```
Replace the following:
- CLUSTER_NAME: the name of the cluster. The name can contain up to 10 characters, and it can only use numbers or lowercase letters.
- REGION: the region where to create your cluster.
- CONFIGURATION_FILE: the path to the JSON file that contains the configuration details for the cluster. To review the configuration details that you can specify, review the request body for creating a cluster by using REST.
Specify cluster properties directly: to create a cluster by specifying each configuration property directly, use the following flags:
- To specify a network, use one of the following flags:
  - To create a new network: --create-network
  - To use an existing network and subnetwork: --network and --subnet
- To specify a Filestore instance, use one of the following flags:
  - To create a new instance: --create-filestores
  - To use an existing instance: --filestores
- Optionally, to specify a Cloud Storage bucket, use one of the following flags:
  - To create a new bucket: --create-buckets
  - To use an existing bucket: --buckets
- Optionally, to specify a Google Cloud Managed Lustre instance, use one of the following flags:
  - To create a new instance: --create-lustres
  - To use an existing instance: --lustres
- To specify a compute resource configuration, use one of the following flags for each resource configuration that you want to create in the cluster.
  - To create VMs by using a reservation: --reserved-instances
  - To create Flex-start VMs: --dws-flex-instances
  - To create Spot VMs: --spot-instances
  - To create N2 on-demand VMs: --on-demand-instances
- To specify the configuration for the login node, use the --slurm-login-node flag.
- To specify the configuration for a compute nodeset, use the --slurm-node-sets flag. You can repeat this flag for each nodeset in the cluster.
- To specify the cluster partitions, use the --slurm-partitions flag. You can repeat this flag for each partition in the cluster.
- To specify the default partition for the cluster, use the --slurm-default-partition flag.
For example, assume that you want to create a cluster with one partition that uses reserved VMs, one partition that uses Spot VMs, a new Filestore instance, and a new network. To create the example cluster, select one of the following options:
Bash
```
gcloud alpha cluster-director clusters create CLUSTER_NAME \
    --location=REGION \
    --create-network=name=NETWORK_NAME \
    --create-filestores=name="locations/FILESTORE_INSTANCE_ZONE/instances/FILESTORE_INSTANCE_NAME",tier=TIER,capacityGb=CAPACITY,fileshare=SHARE_NAME,protocol=PROTOCOL \
    --reserved-instances=id=COMPUTE_RESOURCE_NAME_1,reservation="projects/RESERVATION_PROJECT_ID/zones/RESERVATION_ZONE/reservations/RESERVATION_NAME",machineType=RESERVATION_MACHINE_TYPE \
    --spot-instances=id=COMPUTE_RESOURCE_NAME_2,zone=SPOT_VMS_ZONE,machineType=SPOT_MACHINE_TYPE \
    --slurm-login-node=machineType=LOGIN_NODE_MACHINE_TYPE,zone=LOGIN_NODE_ZONE,count=LOGIN_NODES_COUNT \
    --slurm-node-sets=id=NODESET_NAME_1,computeId=COMPUTE_RESOURCE_NAME_1,staticNodeCount=NODESET_1_STATIC_COUNT,maxDynamicNodeCount=NODESET_1_MAX_DYNAMIC_COUNT \
    --slurm-node-sets=id=NODESET_NAME_2,computeId=COMPUTE_RESOURCE_NAME_2,staticNodeCount=NODESET_2_STATIC_COUNT,maxDynamicNodeCount=NODESET_2_MAX_DYNAMIC_COUNT \
    --slurm-partitions=id=PARTITION_NAME_1,nodesetIds=[NODESET_NAME_1] \
    --slurm-partitions=id=PARTITION_NAME_2,nodesetIds=[NODESET_NAME_2] \
    --slurm-default-partition=PARTITION_NAME_1
```
Powershell
```
gcloud alpha cluster-director clusters create CLUSTER_NAME `
    --location=REGION `
    --create-network=name=NETWORK_NAME `
    --create-filestores=name="locations/FILESTORE_INSTANCE_ZONE/instances/FILESTORE_INSTANCE_NAME",tier=TIER,capacityGb=CAPACITY,fileshare=SHARE_NAME,protocol=PROTOCOL `
    --reserved-instances=id=COMPUTE_RESOURCE_NAME_1,reservation="projects/RESERVATION_PROJECT_ID/zones/RESERVATION_ZONE/reservations/RESERVATION_NAME" `
    --spot-instances=id=COMPUTE_RESOURCE_NAME_2,zone=SPOT_VMS_ZONE,machineType=SPOT_MACHINE_TYPE `
    --slurm-login-node=machineType=LOGIN_NODE_MACHINE_TYPE,zone=LOGIN_NODE_ZONE,count=LOGIN_NODES_COUNT `
    --slurm-node-sets=id=NODESET_NAME_1,computeId=COMPUTE_RESOURCE_NAME_1,staticNodeCount=NODESET_1_STATIC_COUNT,maxDynamicNodeCount=NODESET_1_MAX_DYNAMIC_COUNT `
    --slurm-node-sets=id=NODESET_NAME_2,computeId=COMPUTE_RESOURCE_NAME_2,staticNodeCount=NODESET_2_STATIC_COUNT,maxDynamicNodeCount=NODESET_2_MAX_DYNAMIC_COUNT `
    --slurm-partitions=id=PARTITION_NAME_1,nodesetIds=[NODESET_NAME_1] `
    --slurm-partitions=id=PARTITION_NAME_2,nodesetIds=[NODESET_NAME_2] `
    --slurm-default-partition=PARTITION_NAME_1
```
cmd.exe
```
gcloud alpha cluster-director clusters create CLUSTER_NAME ^
    --location=REGION ^
    --create-network=name=NETWORK_NAME ^
    --create-filestores=name="locations/FILESTORE_INSTANCE_ZONE/instances/FILESTORE_INSTANCE_NAME",tier=TIER,capacityGb=CAPACITY,fileshare=SHARE_NAME,protocol=PROTOCOL ^
    --reserved-instances=id=COMPUTE_RESOURCE_NAME_1,reservation="projects/RESERVATION_PROJECT_ID/zones/RESERVATION_ZONE/reservations/RESERVATION_NAME" ^
    --spot-instances=id=COMPUTE_RESOURCE_NAME_2,zone=SPOT_VMS_ZONE,machineType=SPOT_MACHINE_TYPE ^
    --slurm-login-node=machineType=LOGIN_NODE_MACHINE_TYPE,zone=LOGIN_NODE_ZONE,count=LOGIN_NODES_COUNT ^
    --slurm-node-sets=id=NODESET_NAME_1,computeId=COMPUTE_RESOURCE_NAME_1,staticNodeCount=NODESET_1_STATIC_COUNT,maxDynamicNodeCount=NODESET_1_MAX_DYNAMIC_COUNT ^
    --slurm-node-sets=id=NODESET_NAME_2,computeId=COMPUTE_RESOURCE_NAME_2,staticNodeCount=NODESET_2_STATIC_COUNT,maxDynamicNodeCount=NODESET_2_MAX_DYNAMIC_COUNT ^
    --slurm-partitions=id=PARTITION_NAME_1,nodesetIds=[NODESET_NAME_1] ^
    --slurm-partitions=id=PARTITION_NAME_2,nodesetIds=[NODESET_NAME_2] ^
    --slurm-default-partition=PARTITION_NAME_1
```
Replace the following:
- CLUSTER_NAME: the name of the cluster. The name can contain up to 10 characters, and it can only use numbers or lowercase letters (a-z). Spaces or special characters aren't allowed.
- REGION: the region where to create your cluster.
- NETWORK_NAME: the name of the network that you want to create.
- FILESTORE_INSTANCE_ZONE: the zone where you want to create your Filestore instance.
- FILESTORE_INSTANCE_NAME: the name for your Filestore instance.
- TIER: the type of service tier that you want to use for the instance and that Cluster Director supports. Specify one of the following values:
  - For the zonal tier: ZONAL
  - For the regional tier: REGIONAL
- CAPACITY: the size, in GiB, that you want to allocate for the instance. The value must be between 1,024 GiB (1024) and 102,400 GiB (102400), and it must be in 256 GiB (256) increments.
- SHARE_NAME: the name for the NFS file share that is served from the instance.
- PROTOCOL: the system protocol for the instance. Specify one of the following values:
  - For NFSv3: NFSV3
  - For NFSv4.1: NFSV41
- COMPUTE_RESOURCE_NAME_1 and COMPUTE_RESOURCE_NAME_2: the name of the two compute resource configurations.
- RESERVATION_PROJECT_ID: the ID of the project where the reservation exists. If you want to use a reservation from a different project, then verify that your project is allowed to consume the reservation. For more information, see Allow and restrict projects from creating and modifying shared reservations.
- RESERVATION_ZONE: the zone where the reservation exists.
- RESERVATION_NAME: the name of the reservation that you want to use to create VMs.
- RESERVATION_MACHINE_TYPE: the machine type that is specified in the reservation.
- SPOT_VMS_ZONE: the zone where you want to create your Spot VMs. To review the regions and zones where the machine type that you want to use is available, see Available regions and zones.
- SPOT_MACHINE_TYPE: the machine type to use for the Spot VMs. Specify one of the following machine types:
  - For an A4 machine type: a4-highgpu-8g
  - For an A3 Ultra machine type: a3-ultragpu-8g
  - For an A3 Mega machine type: a3-megagpu-8g
  - For an N2 machine type, see N2 machine series.
- LOGIN_NODE_MACHINE_TYPE: the machine type that you want the VMs in the login nodeset to use. Specify an N2 standard machine type with 32 or fewer vCPUs.
- LOGIN_NODE_ZONE: the zone where you want to create the VMs in the login nodeset.
- LOGIN_NODES_COUNT: the number of VMs to use for the login nodeset.
- NODESET_NAME_1 and NODESET_NAME_2: the name of the two nodesets.
- NODESET_1_STATIC_COUNT and NODESET_2_STATIC_COUNT: the minimum number of VMs that must always be running in each nodeset.
- NODESET_1_MAX_DYNAMIC_COUNT and NODESET_2_MAX_DYNAMIC_COUNT: the maximum number of VMs that Cluster Director can add to each nodeset during increases in traffic.
  
  Important: If you create VMs in the nodeset by using a reservation, especially a shared reservation, then verify that the reservation has enough resources available to create your specified maximum number of VMs. Other workloads that use the same reservation might fully use it and, thus, Cluster Director might be unable to create more VMs in your nodeset.
- PARTITION_NAME_1 and PARTITION_NAME_2: the name of the partitions for your cluster.

The output is similar to the following:

Create request issued for: [cluster000]
Waiting for operation [projects/example-project/locations/us-central1/operations/operation-1759856594716-640948b2f058e-f403bef9-1a08178a] to complete...working...

Creating the cluster can take some time to complete. The completion time depends on the number of VMs that you request and resource availability in the VMs' zone. If your requested resources are unavailable, then Cluster Director maintains the creation request until resources become available. When Cluster Director creates your login node, the output is similar to the following. You can then connect to your cluster; however, you can only run workloads when Cluster Director creates the compute nodes in your cluster.

Created cluster [cluster000].

REST

To create a cluster from scratch, make a POST request to the clusters.create method.

Your request must include the following HTTP method and request URL:

POST https://hypercomputecluster.googleapis.com/v1/projects/PROJECT_ID/locations/REGION/clusters?clusterId=CLUSTER_NAME

In the request body, include the following fields:

description: Optional. A description for your cluster.
labels: Optional. Key-value pairs of labels to help you organize and filter your clusters and its associated resources. For more information, see Organize resources using labels.
networkResources: the network configuration for your cluster. You can either create a new network or use an existing one.
storageResources: the storage resources for your cluster. You can either create Filestore instances, Managed Lustre instances, or Cloud Storage buckets, or use existing ones.
computeResources: the compute resources for your cluster, including the machine types and provisioning models to use for the VMs in the cluster.
orchestrator: the settings for the Slurm workload scheduler for your cluster, as well as the configurations for the cluster nodesets and partitions.

For example, assume that you want to create a cluster with one partition that uses reserved VMs, one partition that uses Spot VMs, a new Filestore instance, and a new network. To create the example cluster, include the following in a JSON file named request-body.json:

{
  "name": "CLUSTER_NAME",
  "networkResources": {
    "NETWORK_NAME": {
      "config": {
        "newNetwork": {
          "network": "projects/PROJECT_ID/global/networks/NETWORK_NAME"
        }
      }
    }
  },
  "storageResources": {
    "STORAGE_RESOURCE_CONFIGURATION": {
      "config": {
        "newFilestore": {
          "filestore": "projects/PROJECT_ID/locations/FILESTORE_INSTANCE_ZONE/instances/FILESTORE_INSTANCE_NAME",
          "fileShares": {
            "capacityGb": "CAPACITY",
            "fileShare": "SHARE_NAME"
          },
          "tier": "TIER",
          "protocol": "PROTOCOL"
        }
      }
    }
  },
  "computeResources": {
    "COMPUTE_RESOURCE_NAME_1": {
      "config": {
        "newReservedInstances": {
          "reservation": "projects/RESERVATION_PROJECT_ID/zones/RESERVATION_ZONE/reservations/RESERVATION_NAME"
        }
      }
    },
    "COMPUTE_RESOURCE_NAME_2": {
      "config": {
        "newSpotInstances": {
          "zone": "SPOT_VMS_ZONE",
          "machineType": "SPOT_MACHINE_TYPE"
        }
      }
    }
  },
  "orchestrator": {
    "slurm": {
      "loginNodes": {
        "count": "LOGIN_NODES_COUNT",
        "zone": "LOGIN_NODE_ZONE",
        "machineType": "LOGIN_NODE_MACHINE_TYPE"
      },
      "nodeSets": [
        {
          "id": "NODESET_NAME_1",
          "computeId": "COMPUTE_RESOURCE_NAME_1",
          "storageConfigs": [
            {
              "id": "STORAGE_RESOURCE_CONFIGURATION",
              "localMount": "/home"
            }
          ],
          "staticNodeCount": "NODESET_1_STATIC_COUNT",
          "maxDynamicNodeCount": "NODESET_1_MAX_DYNAMIC_COUNT",
          "computeInstance": {
            "bootDisk": {
              "type": "projects/PROJECT_ID/zones/DISK_ZONE_1/diskTypes/DISK_TYPE_1",
              "sizeGb": "DISK_SIZE_1"
            }
          }
        },
        {
          "id": "NODESET_NAME_2",
          "computeId": "COMPUTE_RESOURCE_NAME_2",
          "storageConfigs": [
            {
              "id": "STORAGE_RESOURCE_CONFIGURATION",
              "localMount": "/home"
            }
          ],
          "staticNodeCount": "NODESET_2_STATIC_COUNT",
          "maxDynamicNodeCount": "NODESET_2_MAX_DYNAMIC_COUNT",
          "computeInstance": {
            "bootDisk": {
              "type": "projects/PROJECT_ID/zones/DISK_ZONE_2/diskTypes/DISK_TYPE_2",
              "sizeGb": "DISK_SIZE_2"
            }
          }
        }
      ],
      "partitions": [
        {
          "id": "PARTITION_NAME_1",
          "nodeSetIds": [
            "NODESET_NAME_1"
          ]
        },
        {
          "id": "PARTITION_NAME_2",
          "nodeSetIds": [
            "NODESET_NAME_2"
          ]
        }
      ],
      "defaultPartition": "PARTITION_NAME_1"
    }
  }
}

Replace the following:

PROJECT_ID: the ID of the project where you want to create your cluster and its associated resources.
REGION: the region where to create your cluster.
CLUSTER_NAME: the name of the cluster. The name can contain up to 10 characters, and it can only use numbers or lowercase letters (a-z).
NETWORK_NAME: the name of the network that you want to create.
STORAGE_RESOURCE_CONFIGURATION: the name of the storage resource configuration.
FILESTORE_INSTANCE_ZONE: the zone where you want to create your Filestore instance.
FILESTORE_INSTANCE_NAME: the name for your Filestore instance.
CAPACITY: the size, in GiB, that you want to allocate for the instance. The value must be between 1,024 GiB (1024) and 102,400 GiB (102400), and it must be in 256 GiB (256) increments. For more information about the supported service tiers and capacity for Filestore instances, see Service tiers.
SHARE_NAME: the name for the NFS file share that is served from the instance.
TIER: the type of service tier that you want to use for the instance and that Cluster Director supports. Specify one of the following values:
- For the zonal tier: ZONAL
- For the regional tier: REGIONAL
PROTOCOL: the system protocol for the instance. Specify one of the following values:
- For NFSv3: NFSV3
- For NFSv4.1: NFSV41
COMPUTE_RESOURCE_NAME_1 and COMPUTE_RESOURCE_NAME_2: the name of the two compute resource configurations.
RESERVATION_PROJECT_ID: the ID of the project where the reservation exists. If you want to use a reservation from a different project, then verify that your project is allowed to consume the reservation. For more information, see Allow and restrict projects from creating and modifying shared reservations.
RESERVATION_NAME: the name of the reservation that you want to use to create VMs.
SPOT_VMS_ZONE: the zone where you want to create your Spot VMs. To review the regions and zones where the machine type that you want to use is available, see Available regions and zones.
SPOT_MACHINE_TYPE: the machine type to use for the Spot VMs. Specify one of the following machine types:
- For an A4 machine type: a4-highgpu-8g
- For an A3 Ultra machine type: a3-ultragpu-8g
- For an A3 Mega machine type: a3-megagpu-8g
- For an N2 machine type, see N2 machine series.
LOGIN_NODES_COUNT: the number of VMs to use for the login nodeset.
LOGIN_NODE_ZONE: the zone where you want to create the VMs in the login nodeset.
LOGIN_NODE_MACHINE_TYPE: the machine type that you want the VMs in the login nodeset to use. Specify an N2 standard machine type with 32 or fewer vCPUs.
NODESET_NAME_1 and NODESET_NAME_2: the name of the two nodesets.
NODESET_1_STATIC_COUNT and NODESET_2_STATIC_COUNT: the minimum number of VMs that must always be running in each nodeset.
NODESET_1_MAX_DYNAMIC_COUNT and NODESET_2_MAX_DYNAMIC_COUNT: the maximum number of VMs that Cluster Director can add to each nodeset during increases in traffic.

Important: If you create VMs in the nodeset by using a reservation, especially a shared reservation, then verify that the reservation has enough resources available to create your specified maximum number of VMs. Other workloads that use the same reservation might fully use it and, thus, Cluster Director might be unable to create more VMs in your nodeset.
DISK_ZONE_1 and DISK_ZONE_2: the zone where you want to create the boot disks for the nodesets.
DISK_TYPE_1 and DISK_TYPE_2: the type of boot disks for the nodesets. Based on the machine type in the node, specify one of the following values:
- For A4X VMs: hyperdisk-balanced or hyperdisk-extreme
- For A4 VMs: hyperdisk-balanced or hyperdisk-extreme
- For A3 Ultra VMs: hyperdisk-balanced, hyperdisk-extreme, or hyperdisk-ml
- For A3 Mega VMs: hyperdisk-balanced, hyperdisk-extreme, or hyperdisk-ml
- For N2 VMs: pd-standard
For an overview of the different types of boot disks that you can use, see Choose a disk type.
DISK_SIZE_1 and DISK_SIZE_2: the size of the boot disks for the two compute nodes in GB. The value must be 10 or higher.
PARTITION_NAME_1 and PARTITION_NAME_2: the name of the partitions for your cluster.

To send your request, select one of the following options:

curl (Bash)

curl -X POST \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     -H "Content-Type: application/json; charset=utf-8" \
     -d @request-body.json \
     "https://hypercomputecluster.googleapis.com/v1/projects/PROJECT_ID/locations/REGION/clusters?clusterId=CLUSTER_NAME"

Powershell

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
    -Method POST `
    -Headers $headers `
    -ContentType: "application/json; charset=utf-8" `
    -InFile request-body.json `
    -Uri "https://hypercomputecluster.googleapis.com/v1/projects/PROJECT_ID/locations/REGION/clusters?clusterId=CLUSTER_NAME" | Select-Object -Expand Content

curl (cmd.exe)

curl -X POST ^
     -H "Authorization: Bearer $(gcloud auth print-access-token)" ^
     -H "Content-Type: application/json; charset=utf-8" ^
     -d @request-body.json ^
     "https://hypercomputecluster.googleapis.com/v1/projects/PROJECT_ID/locations/REGION/clusters?clusterId=CLUSTER_NAME"

The response is similar to the following:

{
  "name": "projects/example-project/locations/us-central1/operations/operation-1758842430697-63fa86a4c3030-028b6436-2fbda8e1",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.hypercomputecluster.v1.OperationMetadata",
    "createTime": "2025-09-25T23:20:30.707315354Z",
    "target": "projects/example-project/locations/us-central1/clusters/clusterp6a",
    "verb": "update",
    "requestedCancellation": false,
    "apiVersion": "v1"
  },
  "done": false
}

Creating the cluster can take some time to complete. The completion time depends on the number of VMs that you request and resource availability in the VMs' zone. If your requested resources are unavailable, then Cluster Director maintains the creation request until resources become available. When Cluster Director creates your login node, you can connect to your cluster. However, you can run workloads only after Cluster Director creates the compute nodes in your cluster.

Create a custom cluster Stay organized with collections Save and categorize content based on your preferences.

Limitations

Before you begin

Console

gcloud

REST

Required roles

Required permissions

Create a custom cluster from scratch

Console

gcloud

Bash

Powershell

cmd.exe

Bash

Powershell

cmd.exe

REST

curl (Bash)

Powershell

curl (cmd.exe)

What's next?

Create a custom cluster