- Your jobs have the resources that they need to run.
- You prevent unnecessary costs by scheduling jobs on all VMs in your cluster.

This document explains how to configure jobs in Slurm for A4X VMs or Flex-start VMs. For a high-level overview of how Slurm orchestrates jobs in Cluster Director, see Slurm orchestration in Cluster Director.
Configure VMs for Slurm
The following sections describe the required Slurm configuration for A4X VMs and Flex-start VMs.
Configure A4X VMs for Slurm
For cluster partitions that only use A4X VMs, Slurm uses block topology to align job scheduling with the physical hardware structure of A4X machines; specifically, NVLink domains. This configuration minimizes network latency by ensuring that tasks run on VMs that are physically close to each other. When you submit a job in Slurm, you can control how the job interacts with blocks of A4X VMs by using the following flags:
- --segment=SEGMENT_SIZE: this flag specifies to group nodes into segments of a specific size. This configuration lets Slurm fit your job into the available capacity by bypassing nodes that are drained or unavailable. The value must be between 1 and 18. If your job can't start because there aren't enough adjacent nodes to match your segment size, then we recommend using a value of 1.
- --exclusive=topo: this flag specifies to reserve an entire sub-block for a job. This isolation helps ensure that no other jobs share the NVLink domain, preventing interference.
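For example, you can set these flags in a batch script. The following sketch is hypothetical: the partition name (a4x), node count, segment size, and workload are placeholders that you would replace with values for your cluster.

```shell
# Hypothetical sbatch script for an A4X partition; the partition name,
# node count, and workload (srun hostname) are placeholders.
cat > a4x_job.sh <<'EOF'
#!/bin/bash
#SBATCH --partition=a4x
#SBATCH --nodes=4
#SBATCH --segment=2        # group nodes into segments of 2 (value must be 1-18)
#SBATCH --exclusive=topo   # reserve the entire sub-block (NVLink domain)
srun hostname
EOF

# From the cluster's login node, submit the job:
# sbatch a4x_job.sh
```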
Configure Flex-start VMs for Slurm
For cluster partitions that use Flex-start VMs, Slurm interacts with nodes as follows:
- Automatic reprovisioning of static nodes: after the maximum run duration for your Flex-start VMs ends, Cluster Director automatically requests to create new Flex-start VMs for the static nodes in your partition. This process helps you automatically obtain resources for your jobs.
- Capacity queuing: when you submit jobs or when Cluster Director requests resources for static nodes in a partition, Cluster Director attempts to create the requested VMs. If capacity is unavailable, then Cluster Director maintains the request for up to eight hours. After that time, if resources are still unavailable, Slurm sets the node state to DOWN. For static nodes, Cluster Director automatically creates new capacity requests every eight hours until it obtains capacity.
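To spot nodes that Slurm has set to DOWN after a capacity request expires, you can use standard sinfo filters. As a sketch, the commands are written to a helper script because sinfo requires a running Slurm cluster; the format strings are standard sinfo syntax.

```shell
# Sketch: commands that surface nodes Slurm marked DOWN, written to a helper
# script to run from the cluster's login node.
cat > check_down_nodes.sh <<'EOF'
#!/bin/bash
sinfo -t down -N -o "%N %t %E"   # node name, state, and reason
sinfo -R                         # reasons for down, drained, and failing nodes
EOF
chmod +x check_down_nodes.sh

# On the login node: ./check_down_nodes.sh
```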
Verify job allocations in your cluster
To verify whether your nodes have jobs scheduled on them, check the node state suffix in Slurm. To do so, complete the following steps:
- If you haven't already, then connect to your cluster's login node.
- View information about nodes and partitions in your cluster:

  ```none
  sinfo
  ```

  In the output, you can view the states for each node:

  - alloc: all vCPUs on the node have been assigned to one or more jobs.
  - idle: the node has obtained capacity and is preparing to run the job.
  - #idle: Cluster Director is provisioning capacity to run your job on the node. If the node is a Flex-start VM, then this state also indicates that Cluster Director is attempting to gain capacity.
  - idle~: the node is not running.
  - %idle: Cluster Director is deleting the node.
  - ~idle: Cluster Director is stopping the node.
  - mix: Slurm has allocated jobs only on some nodes.
  - #mix: Slurm has allocated jobs only on some nodes; however, Cluster Director is still looking for capacity to run the jobs.
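Beyond the default partition summary, sinfo can list each node individually so that the state of every node is visible. The following sketch writes the commands to a helper script because sinfo requires a running Slurm cluster; the format string is standard sinfo output syntax.

```shell
# Sketch: per-node state listing, written to a helper script to run from
# the cluster's login node.
cat > node_states.sh <<'EOF'
#!/bin/bash
sinfo -N -o "%N %t"   # one line per node: node name and short state
sinfo -N -l           # long per-node listing, including reasons
EOF
chmod +x node_states.sh

# On the login node: ./node_states.sh
```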
Cancel jobs
To cancel the upcoming jobs that you've scheduled on a node, forcefully stop and shut down the node:

```none
scontrol update nodename=NODE_NAMES state=power_down_force
```

Replace NODE_NAMES with a comma-separated list of nodes that you want to stop and shut down, such as node-1,node-2. When the stop operation starts, Slurm sets the node states to ~idle. Then, when the operation completes, the node states change to idle~.
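As a sketch, the steps above can be combined into one helper script that powers down two hypothetical nodes (node-1 and node-2 are placeholders) and then lists their states. The commands are written to a file because scontrol and sinfo require a running Slurm cluster.

```shell
# Sketch: power down two placeholder nodes and list their states, written to
# a helper script to run from the cluster's login node.
cat > power_down.sh <<'EOF'
#!/bin/bash
NODE_NAMES="node-1,node-2"   # replace with your node names
scontrol update nodename="${NODE_NAMES}" state=power_down_force
sinfo -n "${NODE_NAMES}" -N -o "%N %t"   # watch the states settle at idle~
EOF
chmod +x power_down.sh

# On the login node: ./power_down.sh
```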