Create VM instances

Use the vm-instance module to provision one or more Compute Engine VM instances as part of your Cluster Toolkit deployment. This module lets you configure settings such as the machine type, image, networking, and placement policies.

For the complete list of inputs and outputs that you can use with this module, see the vm-instance module page in the Cluster Toolkit GitHub repository.

Before you begin

Before you begin, verify that you meet the following requirements:

  • You have installed and configured Cluster Toolkit. For installation instructions, see Set up Cluster Toolkit.
  • You have an existing cluster blueprint. You can use and modify an existing blueprint or create one from scratch. For a working example of a blueprint configured for the vm-instance module, see the examples/hpc-slurm.yaml file. For more information about creating and customizing blueprints, see Cluster blueprint.
  • To view a complete list of blueprints that support the vm-instance module, go to the Cluster blueprint catalog page, click the Select machine type menu, and then select a machine family, such as N2.

Required roles

To get the permissions that you need to deploy the VM instances, ask your administrator to grant you the required IAM roles on your project.

For more information about granting roles, see Manage access to projects, folders, and organizations.

You might also be able to get the required permissions through custom roles or other predefined roles.

Create a basic VM instance group

To create a group of virtual machine (VM) instances, add the vm-instance module to the deployment_groups section of your blueprint. You must specify the number of instances and the machine type.

The following example creates a cluster of 8 compute VMs that uses the c2-standard-60 machine type and connects the VMs to a VPC network that a separate network module defines.

- id: compute
  source: modules/compute/vm-instance
  use: [network]
  settings:
    instance_count: 8
    name_prefix: compute
    machine_type: c2-standard-60
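The network module referenced in the use field is defined elsewhere in the same blueprint. As a sketch, a minimal deployment group that pairs the preceding example with a vpc module might look like the following (the group name and module IDs are illustrative):

```yaml
deployment_groups:
- group: primary
  modules:
  # Creates the VPC network that the compute module connects to.
  - id: network
    source: modules/network/vpc

  # The `use` field wires the network outputs into this module.
  - id: compute
    source: modules/compute/vm-instance
    use: [network]
    settings:
      instance_count: 8
      name_prefix: compute
      machine_type: c2-standard-60
```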

Configure network connectivity

The vm-instance module requires you to configure network connectivity for your virtual machines. You can configure network connectivity in one of the following ways:

  • Reference a network module. Connect your VMs by using a vpc module or a pre-existing-vpc module, and specify the module ID in the use field. The vm-instance module automatically configures the network interface from the outputs of the referenced network module. We recommend this method because it requires less manual configuration.

  • Define network interfaces manually. If you need to connect to multiple networks or require advanced customization, then use the network_interfaces setting. This setting lets you manually configure specific network interfaces on the VM instance.

    The format for this setting matches the network_interface block in the Terraform google_compute_instance resource. For more information, see the Terraform documentation.
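As a sketch, a manually configured interface might look like the following. The subnetwork and project values are placeholders, and the keys shown are a subset of those available; the field names follow the network_interface block of the Terraform google_compute_instance resource:

```yaml
- id: compute
  source: modules/compute/vm-instance
  settings:
    instance_count: 8
    machine_type: c2-standard-60
    network_interfaces:
      # Placeholder values; replace with your own subnetwork self-link
      # and project ID.
      - network: null
        subnetwork: SUBNETWORK_SELF_LINK
        subnetwork_project: PROJECT_ID
        nic_type: GVNIC
```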

Configure VM placement

Use the placement_policy setting to control where your VM instances are physically located relative to each other within a zone. This configuration is critical for high performance computing (HPC) workloads that require low latency (compact placement) or high availability (spread placement). For more information about placement policies, see the Placement policies overview.

Compact placement

Compact placement creates VMs close to each other to minimize network latency. By default, the policy results in the most compact set of VMs available.

- id: compute
  source: modules/compute/vm-instance
  use: [network]
  settings:
    ...
    placement_policy:
      collocation: "COLLOCATED"

To enforce strict compactness and fail the deployment if the specified level of compactness is not available, use the max_distance setting:

placement_policy:
  collocation: "COLLOCATED"
  max_distance: 1

Spread placement

Spread placement ensures that VMs are placed in different availability domains to improve fault tolerance.

- id: compute
  source: modules/compute/vm-instance
  use: [network]
  settings:
    ...
    placement_policy:
      availability_domain_count: 2

Configure simultaneous multithreading (SMT)

Simultaneous multithreading (SMT) is disabled by default (threads_per_core=1) in this module. This configuration results in only physical cores being visible on the VM. This default is often preferred for HPC workloads to improve performance.

When the threads_per_core field is set to 2, a c2-standard-60 VM exposes 60 virtual cores, which is two threads on each of its 30 physical cores. When the threads_per_core field is set to 1 (SMT turned off), the VM exposes only its 30 physical cores.

To enable SMT and expose virtual cores, set the threads_per_core field to a value of 2:

settings:
  instance_count: 8
  machine_type: c2-standard-60
  threads_per_core: 2

Configure GPU support

To learn more about GPU support in vm-instance and other Cluster Toolkit modules, see GPU support in the GitHub repository.

Replace specific VM instances

The vm-instance module automatically replaces your VM instances when you change the instance_image variable and run terraform apply or gcluster deploy on the deployment group folder. However, creating a new image in an image family does not automatically trigger a replacement.
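Because new images in a family don't trigger replacement, one way to move VMs onto a new image is to reference the image explicitly in the blueprint and redeploy. The following sketch uses an illustrative image name and a placeholder project ID:

```yaml
settings:
  instance_image:
    # Referencing a specific image name, rather than a family, makes
    # the replacement explicit whenever you update this value.
    name: my-custom-image-v2   # illustrative image name
    project: PROJECT_ID
```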

To selectively replace specific VM instances without changing the configuration, use the terraform apply -replace command:

terraform state list
# Find the resource address for the VM instance that you want to replace,
# for example: module.compute.google_compute_instance.compute_vm[2]
terraform apply -replace="RESOURCE_ADDRESS"

For more information on the syntax for this command, see the Terraform documentation.

What's next

  • To deploy this blueprint, see Deploy a cluster.
  • For a complete list of all available input fields and output values, see the vm-instance module in the GitHub repository.