Configure Slurm

Before you run jobs on A4X virtual machine (VM) instances or Flex-start VMs in your cluster, you must configure your jobs to run with Slurm. This setup helps you achieve the following:

  • Your jobs have the resources that they need to run.
  • You prevent unnecessary costs by scheduling jobs on all VMs in your cluster.

This document explains how to configure jobs in Slurm for A4X VMs or Flex-start VMs. For a high-level overview of how Slurm orchestrates jobs in Cluster Director, see Slurm orchestration in Cluster Director.

Configure VMs for Slurm

The following sections describe the required Slurm configuration for A4X VMs and Flex-start VMs.

Configure A4X VMs for Slurm

For cluster partitions that only use A4X VMs, Slurm uses block topology to align job scheduling with the physical hardware structure of A4X machines; specifically, NVLink domains. This configuration minimizes network latency by ensuring that tasks run on VMs that are physically close to each other. When you submit a job in Slurm, you can control how the job interacts with blocks of A4X VMs by using the following flags:

  • --segment=SEGMENT_SIZE: this flag groups nodes into segments of the specified size. This configuration lets Slurm fit your job into the available capacity by bypassing nodes that are drained or unavailable. The value must be between 1 and 18. If your job can't start because there aren't enough adjacent nodes to match your segment size, then we recommend using a value of 1.
  • --exclusive=topo: this flag reserves an entire sub-block for a job. This isolation helps ensure that no other jobs share the NVLink domain, which prevents interference.
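
For example, you might combine these flags in a job submission as follows. This is a hedged sketch: the partition name (a4x), node count, segment size, and script name (train.sh) are placeholders, not values from this document.

```shell
# Hypothetical example: request 8 A4X nodes grouped into segments
# of 4, with exclusive use of each sub-block so that no other jobs
# share the NVLink domain. Adjust the placeholder partition name,
# node count, and script for your cluster.
sbatch \
  --partition=a4x \
  --nodes=8 \
  --segment=4 \
  --exclusive=topo \
  train.sh
```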

Configure Flex-start VMs for Slurm

For cluster partitions that use Flex-start VMs, Slurm interacts with nodes as follows:

  • Automatic reprovisioning of static nodes: after the maximum run duration for your Flex-start VMs ends, Cluster Director automatically requests to create new Flex-start VMs for the static nodes in your partition. This process helps you automatically obtain resources for your jobs.
  • Capacity queuing: when you submit jobs or when Cluster Director requests resources for static nodes in a partition, Cluster Director attempts to create the requested VMs. If capacity is unavailable, then Cluster Director maintains the request for up to eight hours. After that time, if resources are still unavailable, Slurm sets the node state to DOWN. For static nodes, Cluster Director automatically creates new capacity requests every eight hours until it obtains capacity.
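
While a job waits for capacity, you can check why it is pending by using standard Slurm commands. The following is a sketch that assumes a running Slurm cluster; the output format codes (%i for job ID, %j for job name, %r for pending reason) come from the standard squeue documentation, not from Cluster Director specifically.

```shell
# List pending jobs with their job ID, name, and the reason
# that each job is pending (for example, waiting for resources).
squeue --states=PENDING --format="%i %j %r"
```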

Verify job allocations in your cluster

To verify whether your nodes have jobs scheduled on them, check the node state suffix in Slurm. To do so, complete the following steps:

  1. If you haven't already, then connect to your cluster's login node.
  2. View information about nodes and partitions in your cluster:

     sinfo

     In the output, you can view the state of each node:

    • alloc: all vCPUs on the node have been assigned to one or more jobs.
    • idle: the node has obtained capacity and is preparing to run the job.
    • idle#: Cluster Director is provisioning capacity to run your job on the node. If the node is a Flex-start VM, then this state also indicates that Cluster Director is attempting to obtain capacity.
    • idle~: the node is not running.
    • idle%: Cluster Director is deleting the node.
    • idle!: Cluster Director is stopping the node.
    • mix: Slurm has allocated jobs on only some of the vCPUs on the node.
    • mix#: Slurm has allocated jobs on only some of the vCPUs on the node; however, Cluster Director is still obtaining capacity to run the jobs.
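
As a sketch, you can filter sinfo output by these state markers with standard shell tools. The following example uses a saved sample of sinfo output so that it runs without a cluster; the partition name and node names are hypothetical.

```shell
# Hypothetical sample of `sinfo` output; the partition (a4x) and
# node names are placeholders.
sample_output='PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
a4x up infinite 2 idle~ node-[1-2]
a4x up infinite 1 alloc node-3
a4x up infinite 1 idle# node-4'

# Print the node lists whose state is idle~ (node not running).
# Field 5 is STATE and field 6 is NODELIST in this layout.
powered_down="$(echo "$sample_output" | awk '$5 == "idle~" {print $6}')"
echo "$powered_down"
```

Against a live cluster, you would pipe the real sinfo output into the same awk filter instead of using the sample string.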

Cancel jobs

To cancel the upcoming jobs that you've scheduled on a node, forcefully stop and shut down the node:

scontrol update nodename=NODE_NAMES state=power_down_force

Replace NODE_NAMES with a comma-separated list of the nodes that you want to stop and shut down, such as node-1,node-2. When the stop operation starts, Slurm sets the node states to idle%. Then, when the operation completes, the node states change to idle~.
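
If you later want to return stopped nodes to service, standard Slurm provides a resume state for this. The following is a sketch that assumes a running Slurm cluster; the node names are placeholders, and this behavior comes from standard Slurm rather than this document.

```shell
# Return previously stopped nodes to service so that Slurm can
# schedule jobs on them again. Replace the placeholder node names.
scontrol update nodename=node-1,node-2 state=resume
```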

What's next