This document describes how to use the vm-instance module to provision one or more Compute Engine VM instances as part of your Cluster Toolkit deployment. This module lets you configure settings such as the machine type, image, networking, and placement policies.
For the complete list of inputs and outputs that you can use with this module, see the vm-instance module page in the Cluster Toolkit GitHub repository.
Before you begin
Before you begin, verify that you meet the following requirements:
- You have installed and configured Cluster Toolkit. For installation instructions, see Set up Cluster Toolkit.
- You have an existing cluster blueprint. You can use and modify an existing blueprint or create one from scratch. For a working example of a blueprint configured for the vm-instance module, see the examples/hpc-slurm.yaml file. For more information about creating and customizing blueprints, see Cluster blueprint.
- To view a complete list of blueprints that support the vm-instance module, go to the Cluster blueprint catalog page, click the Select machine type menu, and then select a machine family, such as N2.
Required roles
To get the permissions that you need to deploy the VM instances, ask your administrator to grant you the following IAM roles on your project:
- Compute Instance Admin (v1) (roles/compute.instanceAdmin.v1)
- Service Account User (roles/iam.serviceAccountUser)
For more information about granting roles, see Manage access to projects, folders, and organizations.
You might also be able to get the required permissions through custom roles or other predefined roles.
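If you administer the project yourself, one way to grant these roles is with the gcloud CLI. The following is a minimal sketch; PROJECT_ID and USER_EMAIL are placeholders for your project ID and user account.
gcloud projects add-iam-policy-binding PROJECT_ID \
    --member="user:USER_EMAIL" \
    --role="roles/compute.instanceAdmin.v1"
gcloud projects add-iam-policy-binding PROJECT_ID \
    --member="user:USER_EMAIL" \
    --role="roles/iam.serviceAccountUser"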
Create a basic VM instance group
To create a group of virtual machine (VM) instances, add the vm-instance
module to the deployment_groups section of your blueprint. You must specify
the number of instances and the machine type.
The following example creates a cluster of 8 compute VMs that uses the
c2-standard-60 machine type and connects the VMs to a VPC
network that a separate network module defines.
  - id: compute
    source: modules/compute/vm-instance
    use: [network]
    settings:
      instance_count: 8
      name_prefix: compute
      machine_type: c2-standard-60
Configure network connectivity
The vm-instance module requires you to configure network connectivity for your
virtual machines. You can configure network connectivity in one of the following
ways:
- Reference a network module. Connect your VMs by using a vpc module or a pre-existing-vpc module. Specify the module ID in the use field. The vm-instance module automatically configures the network interface by using the outputs from the referenced network. We recommend this method because it requires less manual configuration.
- Define network interfaces manually. If you need to connect to multiple networks or require advanced customization, use the network_interfaces setting. This setting lets you manually configure specific network interfaces on the VM instance. The format for this setting matches the network_interface block in the Terraform google_compute_instance resource. For more information, see the Terraform documentation. A sketch of this approach follows this list.
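The following is a minimal sketch of the manual approach. It assumes a pre-existing subnetwork named compute-subnet in a project named my-project-id; both names are illustrative, and the module might require additional fields, so confirm the exact syntax against the module's inputs and the Terraform network_interface documentation.
  - id: compute
    source: modules/compute/vm-instance
    settings:
      instance_count: 8
      machine_type: c2-standard-60
      network_interfaces:
        - network: null
          subnetwork: compute-subnet        # assumed pre-existing subnetwork
          subnetwork_project: my-project-id # illustrative project ID
          ...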
Configure VM placement
Use the placement_policy setting to control where your VM instances are
physically located relative to each other within a zone. This configuration is
critical for high performance computing (HPC)
workloads that require low latency (compact placement) or high
availability (spread placement). For more information about placement policies,
see the Placement policies
overview.
Compact placement
Compact placement creates VMs close to each other to minimize network latency. By default, the policy results in the most compact set of VMs available.
  - id: compute
    source: modules/compute/vm-instance
    use: [network]
    settings:
      ...
      placement_policy:
        collocation: "COLLOCATED"
To enforce strict compactness and fail the deployment if the specified level of
compactness is not available, use the max_distance setting:
      placement_policy:
        collocation: "COLLOCATED"
        max_distance: 1
Spread placement
Spread placement ensures that VMs are placed in different availability domains to improve fault tolerance.
  - id: compute
    source: modules/compute/vm-instance
    use: [network]
    settings:
      ...
      placement_policy:
        availability_domain_count: 2
Configure simultaneous multithreading (SMT)
Simultaneous multithreading (SMT) is disabled by default (threads_per_core=1)
in this module. This configuration results in only physical cores being visible
on the VM. This default is often preferred for HPC workloads to improve
performance.
When you set the threads_per_core field to 2, the c2-standard-60 machine type exposes all 60 virtual cores. With the threads_per_core field set to 1 (SMT turned off), only the 30 physical cores are visible on the VM.
To enable SMT and expose virtual cores, set the threads_per_core field to a
value of 2:
    settings:
      instance_count: 8
      machine_type: c2-standard-60
      threads_per_core: 2
Configure GPU support
To learn more about GPU support in vm-instance and other Cluster Toolkit
modules, see GPU
support
in the GitHub repository.
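As an illustration only, the following sketch assumes that the module accepts a guest_accelerator setting that mirrors the Terraform guest_accelerator block (an accelerator type and count). The machine type and accelerator type shown are examples; confirm the exact field names and any additional requirements in the GPU support documentation.
  - id: gpu-compute
    source: modules/compute/vm-instance
    use: [network]
    settings:
      instance_count: 1
      machine_type: n1-standard-8
      guest_accelerator:             # assumed setting; verify in the GPU support doc
        - type: nvidia-tesla-t4
          count: 1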
Replace specific VM instances
The vm-instance module automatically replaces your VM instances when you
change the instance_image variable and run terraform apply or gcluster
deploy on the deployment group folder. However, creating a new image in an
image family does not automatically trigger a replacement.
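For reference, instance_image is typically specified as an image family and project, as in the following sketch. The family and project shown are illustrative; substitute the image you actually use.
    settings:
      instance_image:
        family: hpc-rocky-linux-8        # illustrative image family
        project: cloud-hpc-image-public  # illustrative image project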
To selectively replace specific VM instances without changing the configuration,
use the terraform apply -replace command:
terraform state list
# Search for the module ID and resource address
terraform apply -replace="address"
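For example, assuming your blueprint uses the module ID compute and terraform state list reports the VM resource as google_compute_instance.compute_vm (a hypothetical address; use the one from your own state), replacing the fourth instance might look like this:
terraform apply -replace='module.compute.google_compute_instance.compute_vm[3]'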
For more information on the syntax for this command, see the Terraform documentation.
What's next
- To deploy this blueprint, see Deploy a cluster.
- For a complete list of all available input fields and output values, see the vm-instance module in the GitHub repository.