Cluster Toolkit is open-source and managed through a GitHub repository, which you can clone to your local environment. Regularly check the Cluster Toolkit release notes for new versions and updates.
You can download a newer version of the Cluster Toolkit software to get access to new features and bug fixes.
Overview
To update Cluster Toolkit, you must upgrade the gcluster command-line tool. If you also need to change software or hardware configurations, you must redeploy your cluster.
Before you update the software, consider the following important points:
- Immutable fields: Many aspects of your cluster configuration are immutable after creation. You can't change these fields without re-deploying the cluster.
- Backup: Before making any significant changes or re-deployments, ensure you have proper backups of your data and configurations.
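As one way to keep a copy of your configuration, the following minimal sketch archives a blueprint file and its deployment folder into a timestamped tarball. The function name backup_cluster_config is hypothetical, and the sketch assumes that you run it from the directory that contains both paths. It covers configuration files only (including any local Terraform state kept in the deployment folder), not data stored on the cluster's file systems.

```shell
#!/bin/sh
# Hypothetical helper: archive a blueprint file and its deployment folder.
# Covers configuration only; back up data on cluster storage separately.
backup_cluster_config() {
  blueprint_file="$1"   # the blueprint you deploy from
  deployment_dir="$2"   # the folder created by `gcluster create`
  # Timestamp the name so backups taken at different times get distinct names.
  archive_name="cluster-config-backup-$(date +%Y%m%d-%H%M%S).tar.gz"
  tar -czf "$archive_name" "$blueprint_file" "$deployment_dir" || return 1
  # Print the archive name so the caller can record or move it.
  echo "$archive_name"
}
```

For example, after sourcing the file, run backup_cluster_config my-blueprint.yaml my-deployment, where both names are placeholders for your own blueprint and deployment folder.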
Update the gcluster command-line tool
To update Cluster Toolkit, pull the latest changes from the GitHub GoogleCloudPlatform/cluster-toolkit repository and rebuild the gcluster command-line tool.
Go to the toolkit directory where you originally cloned the repository:
cd cluster-toolkit
Pull the updates from the upstream repository:
git pull
Rebuild the gcluster command-line tool by running the following command:
make
This command compiles the tool and replaces the previous executable with the updated version, which is what makes the new features and bug fixes available.
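The three update steps can be combined into a small shell function. This is a sketch, not part of Cluster Toolkit: the name update_gcluster is hypothetical, and the default clone location ./cluster-toolkit is an assumption about where you cloned the repository. The commands run in a subshell so that your current directory doesn't change, and && stops the sequence as soon as one step fails, so a failed git pull doesn't trigger a rebuild.

```shell
#!/bin/sh
# Hypothetical helper: pull the latest Cluster Toolkit changes and rebuild gcluster.
# $1: path to your clone (assumed default: ./cluster-toolkit).
update_gcluster() {
  toolkit_dir="${1:-cluster-toolkit}"
  # The subshell keeps the caller's working directory unchanged;
  # && skips the remaining steps as soon as one of them fails.
  (cd "$toolkit_dir" && git pull && make)
}
```

For example, after sourcing the file, run update_gcluster ~/cluster-toolkit, replacing the path with the location of your clone.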
Redeploy the cluster
For basic changes to a running cluster, like adding or removing a partition or resizing an existing one, you can edit the cluster blueprint and redeploy the cluster.
For instructions, see the documentation for your environment:
- For Compute Engine and Slurm: Reconfigure a running cluster
- For Google Kubernetes Engine: Reconfigure a GKE cluster
To modify the hardware infrastructure, change immutable properties of the cluster, or adopt a Cluster Toolkit release with major changes, follow these steps:
Delete the existing cluster.
Remove all compute nodes in the cluster. The process for removing compute nodes depends on your environment:
For Compute Engine and Slurm environments:
- See Manage static compute nodes. To gracefully drain and power down nodes, run the following command, replacing NODES_TO_UPDATE with the names of the nodes to power down:
scontrol update NodeName=NODES_TO_UPDATE State=POWER_DOWN_ASAP
- Alternatively, you can configure the cleanup_compute_nodes setting on the Slurm controller to automatically destroy static compute nodes when the cluster is destroyed.
For Google Kubernetes Engine environments:
- See Reconfigure a GKE cluster for instructions on managing nodes within a GKE cluster.
Warning: The destroy command is irreversible. Make sure that you can recreate the cluster before you run this command.
Use the destroy command to tear down the old deployment. The --auto-approve flag skips the interactive confirmation:
./gcluster destroy DEPLOYMENT_FOLDER_NAME --auto-approve
Update the cluster blueprint with any needed changes.
Create a new cluster deployment folder based on the updated blueprint. The -w flag overwrites the previous deployment folder:
./gcluster create BLUEPRINT_NAME -w
Deploy the new cluster:
./gcluster deploy DEPLOYMENT_FOLDER_NAME
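The delete-and-recreate procedure can be sketched as two shell functions. The function names, the GCLUSTER override variable, and the example node names are all hypothetical. power_down_nodes wraps the scontrol command and must run where Slurm client commands are available, such as a cluster login node. recreate_cluster runs the destroy, create, and deploy commands from the toolkit directory; it assumes that you already updated the blueprint and that the blueprint still generates the same deployment folder. Because destroy is irreversible, the function stops at the first command that fails instead of continuing.

```shell
#!/bin/sh
# Hypothetical helpers for the delete-and-recreate steps.

# Run on a node with Slurm client commands (for example, a login node).
# $1: Slurm node names to drain and power down, for example
#     "mycluster-nodeset-[0-3]" (hypothetical names).
power_down_nodes() {
  scontrol update NodeName="$1" State=POWER_DOWN_ASAP
}

# Run from the toolkit directory on the machine where you use gcluster.
# $1: updated blueprint; $2: deployment folder generated from that blueprint.
# GCLUSTER (hypothetical override) defaults to the ./gcluster binary.
recreate_cluster() {
  blueprint_path="$1"
  deployment_dir="$2"
  gcluster_bin="${GCLUSTER:-./gcluster}"
  # Irreversible: tears down the old deployment without prompting.
  "$gcluster_bin" destroy "$deployment_dir" --auto-approve || return 1
  # Regenerate the deployment folder, overwriting the previous one (-w).
  "$gcluster_bin" create "$blueprint_path" -w || return 1
  # Deploy the new cluster from the regenerated folder.
  "$gcluster_bin" deploy "$deployment_dir"
}
```

For example, run power_down_nodes on the cluster first, then run recreate_cluster my-blueprint.yaml my-deployment wherever you use gcluster, replacing both names with your own.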
What's next
- Learn about Cluster blueprints.
- Review best practices for running HPC workloads.
- Try a quickstart tutorial: Deploy an HPC cluster with Slurm.