Cluster Toolkit is an open-source tool that simplifies the deployment of high performance computing (HPC) and AI/ML workloads on Google Cloud. It uses customizable blueprints to provision infrastructure that aligns with Google Cloud best practices.
Cluster Toolkit is highly customizable and extensible to address the deployment needs of a broad range of use cases.
Features
Cluster Toolkit lets you do the following:
- Efficiently create and deploy turnkey HPC, AI, and ML clusters that follow Google Cloud best practices.
- Configure and extend an open-source solution.
- Integrate seamlessly with various partners, such as Intel DAOS, DDN EXAscaler, and Slurm.
- Monitor and gain performance visibility through integration with Cloud Monitoring.
Components
Cluster Toolkit consists of the following components:
- Cluster blueprint: A YAML file that defines your cluster's architecture by specifying which modules to use and how to configure them.
- Modules: Reusable, configurable building blocks that define specific resources like schedulers, storage, or compute nodes.
- The gcluster tool: A command-line utility that compiles your blueprint and modules into a deployment folder.
- Deployment folder: A generated directory containing the Terraform and Packer configurations needed to deploy your cluster to Google Cloud. You can deploy this folder directly or customize it further before deployment.
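To make the relationship between a blueprint and its modules more concrete, the following sketch writes a minimal blueprint to a file. The file name, blueprint name, deployment name, project ID, region, and zone are placeholder values chosen for illustration. The two module sources are assumed to match module paths in the Cluster Toolkit repository, so check the repository's example blueprints for the exact layout before relying on it.

```shell
# Write an illustrative blueprint to my-blueprint.yaml (hypothetical file name).
# All values under "vars" are placeholders; replace them with your own.
cat > my-blueprint.yaml <<'EOF'
blueprint_name: demo-cluster

vars:
  project_id: my-project-id      # placeholder project ID
  deployment_name: demo-cluster  # assumed to become the deployment folder name
  region: us-central1
  zone: us-central1-a

deployment_groups:
- group: primary
  modules:
  # Each module is a reusable building block referenced by its source path.
  - id: network
    source: modules/network/vpc
  - id: homefs
    source: modules/file-system/filestore
    use: [network]               # connect this module to the network module
    settings:
      local_mount: /home
EOF
```

In this sketch, the blueprint selects modules by source path and configures them through settings, while use wires one module's outputs into another.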
How Cluster Toolkit works

Figure 1. Cluster Toolkit architecture overview
To deploy clusters on Google Cloud by using Cluster Toolkit, do the following:
- Set up your environment by using Cloud Shell or a local Linux or macOS terminal. If you use a local terminal, then you must install a few dependencies.
- Clone the Cluster Toolkit repository. This repository contains the gcluster binary, modules, cluster blueprint examples, and other resources.
- Use an editor to create your cluster blueprint file. Example blueprints are available in the Cluster Toolkit repository.
- Run gcluster create to generate a deployment folder that contains the necessary Terraform and Packer configurations.
- Run the commands provided by the gcluster tool. After you run these commands, Terraform or Packer then deploys the cluster on Google Cloud. For more information, see Deploy a cluster.
- After your cluster is deployed, you can submit jobs to your HPC cluster. You can also use Cloud Monitoring to analyze and monitor the Google Cloud resources that are used by your cluster.
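The steps above can be sketched as a short script. This is an illustration under stated assumptions, not the official procedure: the repository URL, the make build step, the example blueprint path, the deployment folder name, and the project ID are all assumptions or placeholders. By default the script only prints each command (dry run); set DRY_RUN=0 to run the commands.

```shell
# Sketch of the deployment workflow. Prints commands unless DRY_RUN=0.
set -eu

DRY_RUN="${DRY_RUN:-1}"
BLUEPRINT="${BLUEPRINT:-examples/hpc-slurm.yaml}"  # assumed example blueprint path
DEPLOYMENT="${DEPLOYMENT:-hpc-slurm}"              # assumed deployment folder name
PROJECT_ID="${PROJECT_ID:-my-project-id}"          # placeholder project ID

# Print the command in dry-run mode; otherwise run it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Clone the repository and build the gcluster binary (build step is an assumption).
run git clone https://github.com/GoogleCloudPlatform/cluster-toolkit.git
run cd cluster-toolkit
run make

# Generate the deployment folder from the blueprint.
run ./gcluster create "$BLUEPRINT" --vars "project_id=$PROJECT_ID"

# Deploy the generated folder with Terraform or Packer.
run ./gcluster deploy "$DEPLOYMENT"
```

Keeping the dry run as the default lets you review the exact commands before anything is provisioned in your project.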
Limitations
Cluster Toolkit supports updating specific configurations of an active cluster, such as resizing a Slurm partition or updating a GKE node pool. For more information, see the following guides:
For fundamental architectural changes, such as switching to a new VPC or changing the scheduler, you must redeploy the cluster:
- Delete the cluster.
- Update the cluster blueprint.
- Create the deployment folder.
- Deploy the cluster.
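The redeployment sequence can be sketched in the same way. The blueprint path, deployment folder name, and project ID below are placeholders, and the script assumes that it runs from the directory that contains the gcluster binary. By default it only prints each command; set DRY_RUN=0 to run them.

```shell
# Sketch of a full redeploy after an architectural change. Prints commands unless DRY_RUN=0.
set -eu

DRY_RUN="${DRY_RUN:-1}"
BLUEPRINT="${BLUEPRINT:-my-blueprint.yaml}"   # hypothetical blueprint file
DEPLOYMENT="${DEPLOYMENT:-hpc-slurm}"         # assumed deployment folder name
PROJECT_ID="${PROJECT_ID:-my-project-id}"     # placeholder project ID

# Print the command in dry-run mode; otherwise run it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# 1. Delete the cluster.
run ./gcluster destroy "$DEPLOYMENT"

# 2. Update the cluster blueprint (opens your editor when DRY_RUN=0).
run "${EDITOR:-vi}" "$BLUEPRINT"

# 3. Create the deployment folder again.
#    If the old folder still exists, gcluster create may refuse to overwrite it;
#    see ./gcluster create --help for the available options.
run ./gcluster create "$BLUEPRINT" --vars "project_id=$PROJECT_ID"

# 4. Deploy the cluster.
run ./gcluster deploy "$DEPLOYMENT"
```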
What's next
- To try a quickstart tutorial, see Deploy an HPC cluster with Slurm.
- To learn how to quickly deploy clusters using blueprints, see Cluster blueprints.
- To learn how to change the functionality of blueprints by using modules, see Modules.
- To view the project on GitHub, see the Cluster Toolkit repository.