Create a dynamic nodeset for a Slurm partition

Use the schedmd-slurm-gcp-v6-nodeset-dynamic module to create an instance template and a nodeset data structure for dynamic compute nodes. This module works in conjunction with the schedmd-slurm-gcp-v6-partition module to let your Slurm cluster provision resources on demand.

For the complete list of inputs and outputs, see the schedmd-slurm-gcp-v6-nodeset-dynamic module in the Cluster Toolkit GitHub repository.

Before you begin

Verify that you meet the following requirements:

  • You have installed and configured Cluster Toolkit. For installation instructions, see Set up Cluster Toolkit.
  • You have an existing cluster blueprint. You can modify an existing blueprint or create one from scratch. For a working example of a blueprint configured for the schedmd-slurm-gcp-v6-nodeset-dynamic module, see the examples/hpc-slurm.yaml file. For more information about creating and customizing blueprints, see Cluster blueprint.
  • To view a complete list of blueprints that support the schedmd-slurm-gcp-v6-nodeset-dynamic module, go to the Cluster blueprint catalog page, click the Select scheduler menu, and then select Slurm.

Required roles

To get the permissions that you need to deploy the dynamic nodeset, ask your administrator to grant you the following IAM roles on your project:

For more information about granting roles, see Manage access to projects, folders, and organizations.

You might also be able to get the required permissions through custom roles or other predefined roles.

Create a dynamic nodeset

To create a dynamic nodeset, add the schedmd-slurm-gcp-v6-nodeset-dynamic module to your blueprint. You must specify the machine type and connect the nodeset to a partition and a controller.

The following example creates a dynamic nodeset and uses it to configure a partition. It also defines a managed instance group (MIG) that uses the instance template created by the dynamic nodeset module:

  # Dynamic nodeset: creates the instance template and the nodeset data structure.
  - id: dynamic_ns
    source: community/modules/compute/schedmd-slurm-gcp-v6-nodeset-dynamic
    use: [network, controller]
    settings:
      machine_type: n2-standard-2

  # Partition that is configured with the dynamic nodeset.
  - id: dynamic_partition
    source: community/modules/compute/schedmd-slurm-gcp-v6-partition
    use: [dynamic_ns]
    settings:
      partition_name: mp
      is_default: true

  # Slurm controller that serves the partition.
  - id: controller
    source: community/modules/scheduler/schedmd-slurm-gcp-v6-controller
    use: [network, dynamic_partition]

  # MIG that uses the instance template created by the dynamic nodeset.
  - id: mig
    source: community/modules/compute/mig
    settings:
      versions:
      - name: v1
        instance_template: $(dynamic_ns.instance_template_self_link)
      base_instance_name: $(dynamic_ns.node_name_prefix)
Configure custom images

For information about how to create valid custom images for the node group virtual machine (VM) instances or for custom instance templates, see the Slurm on Google Cloud custom images documentation on GitHub.
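
As a rough sketch of how a custom image might be attached to the dynamic nodeset, the following fragment extends the nodeset from the earlier example. The instance_image and instance_image_custom settings are assumed here to be inputs of the nodeset module, and the my-slurm-image family name is hypothetical. Verify both against the module's list of inputs before you use them:

  - id: dynamic_ns
    source: community/modules/compute/schedmd-slurm-gcp-v6-nodeset-dynamic
    use: [network, controller]
    settings:
      machine_type: n2-standard-2
      # Assumed setting: marks the image as custom rather than a published Slurm image.
      instance_image_custom: true
      # Assumed setting: selects the image by family and project.
      instance_image:
        family: my-slurm-image          # hypothetical image family
        project: $(vars.project_id)     # project that hosts the custom image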

Configure GPU support

To learn more about GPU support in the schedmd-slurm-gcp-v6-nodeset-dynamic module and other Cluster Toolkit modules, see the GPU support documentation on GitHub.
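
For illustration only, a GPU-enabled variant of the earlier nodeset might look like the following fragment. It assumes that the module accepts a guest_accelerator setting with type and count fields, as described in the GPU support documentation. The machine type and accelerator type shown are example values, not recommendations, so check them against the module's inputs and the GPU availability in your zone:

  - id: dynamic_ns
    source: community/modules/compute/schedmd-slurm-gcp-v6-nodeset-dynamic
    use: [network, controller]
    settings:
      machine_type: n1-standard-8       # example N1 machine type that can attach GPUs
      # Assumed setting: attaches one GPU of the given type to each node.
      guest_accelerator:
      - type: nvidia-tesla-t4           # example accelerator type
        count: 1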

What's next