Update Cluster Toolkit version

Cluster Toolkit is open-source and managed through a GitHub repository, which you can clone to your local environment. Regularly check the Cluster Toolkit release notes for new versions and updates.

You can download a newer version of the Cluster Toolkit software to get access to new features and bug fixes.
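
To see whether a newer release exists than the one you built, you can compare the tags in your local clone. This is a minimal sketch that assumes releases are tagged in the form vMAJOR.MINOR.PATCH and that git is on your PATH. The functions refresh_release_tags, latest_release_tag, and current_checkout are hypothetical helpers, not part of Cluster Toolkit. The release notes remain the place to read what actually changed.

```shell
#!/usr/bin/env bash
# Sketch: compare the checked-out version of a cluster-toolkit clone with
# the newest release tag. Assumption: releases are tagged vMAJOR.MINOR.PATCH.
set -euo pipefail

# Fetch tags from the remote the clone came from (needs network access).
refresh_release_tags() {
  git -C "$1" fetch --tags --quiet
}

# Print the newest local release tag. Version-aware sorting (v:refname)
# ranks v1.10.0 above v1.9.0; sed reads all input, so no SIGPIPE occurs.
latest_release_tag() {
  git -C "$1" tag --list 'v*' --sort=-v:refname | sed -n '1p'
}

# Describe the currently checked-out commit in terms of the nearest tag.
current_checkout() {
  git -C "$1" describe --tags --always
}
```

For example, run refresh_release_tags ~/cluster-toolkit, then compare the output of latest_release_tag ~/cluster-toolkit with current_checkout ~/cluster-toolkit.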

Overview

To update Cluster Toolkit, you update the source code and rebuild the gcluster command-line tool. If your cluster's software or hardware configuration needs to change, you also need to redeploy your cluster.

Before you update the software, consider the following important points:

  • Immutable fields: Many fields in your cluster configuration are immutable after creation. You can't change them without redeploying the cluster.
  • Backup: Before you make significant changes or redeploy a cluster, make sure that you have backups of your data and configurations.
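
One simple way to back up configurations is to archive the blueprint together with the deployment folder that gcluster created from it, which can also hold Terraform state. This is a sketch under stated assumptions: both paths are on local disk, and backup_cluster_config is a hypothetical helper name. It covers configuration only; data stored on the cluster needs its own backup method.

```shell
#!/usr/bin/env bash
# Sketch: archive a blueprint and its deployment folder before an update.
# Assumption: both are local paths. This does NOT back up cluster data.
set -euo pipefail

backup_cluster_config() {
  local blueprint="$1"              # path to the blueprint YAML file
  local deployment_dir="$2"         # folder created by gcluster create
  local backup_dir="${3:-./backups}"
  local stamp
  stamp="$(date +%Y%m%d-%H%M%S)"    # timestamp keeps older archives intact
  mkdir -p "$backup_dir"
  local archive="$backup_dir/cluster-config-$stamp.tar.gz"
  tar -czf "$archive" "$blueprint" "$deployment_dir"
  echo "$archive"                   # print the archive path for the caller
}
```

For example, backup_cluster_config my-blueprint.yaml my-deployment prints the path of a new archive under ./backups.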

Update the gcluster command-line tool

To update Cluster Toolkit, pull the latest changes from the GitHub GoogleCloudPlatform/cluster-toolkit repository and rebuild the gcluster command-line tool.

  1. Go to the toolkit directory where you originally cloned the repository:

    cd cluster-toolkit
    
  2. Pull the updates from the upstream repository:

    git pull
    
  3. Rebuild the gcluster command-line tool by running the following command:

    make
    

    This command compiles the updated source code and replaces the previous gcluster executable with the new version. Until you rebuild, the old executable keeps running, so the new features and bug fixes aren't available.
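
If you update often, you can wrap steps 1 to 3 in a small script. This sketch assumes the clone lives in a directory named cluster-toolkit (pass a different path as the first argument) and that the rebuilt binary accepts --version. The names update_gcluster and run are hypothetical helpers. Setting DRY_RUN=1 prints each command instead of running it.

```shell
#!/usr/bin/env bash
# Sketch: update a local cluster-toolkit clone and rebuild gcluster.
# Set DRY_RUN=1 to print the commands instead of running them.
set -euo pipefail

run() {
  if [[ "${DRY_RUN:-0}" == "1" ]]; then
    echo "+ $*"
  else
    "$@"
  fi
}

update_gcluster() {
  local toolkit_dir="${1:-cluster-toolkit}"  # where you cloned the repository
  run git -C "$toolkit_dir" pull             # step 2: pull upstream changes
  run make -C "$toolkit_dir"                 # step 3: rebuild gcluster
  run "$toolkit_dir/gcluster" --version      # confirm the new binary runs
}
```

For example, DRY_RUN=1 update_gcluster ~/cluster-toolkit prints the three commands, and update_gcluster ~/cluster-toolkit runs them. Using git -C and make -C instead of cd leaves your current directory unchanged.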

Redeploy the cluster

For basic changes to a running cluster, such as adding, removing, or resizing a partition, you can edit the cluster blueprint and redeploy it without deleting the cluster first.

For instructions, see the documentation for your environment:

If you need to modify the hardware infrastructure or change immutable properties of the cluster, or if Cluster Toolkit has a major software change, follow these steps:

  1. Delete the existing cluster.

    1. Remove all compute nodes in the cluster. The process for removing compute nodes depends on your environment:

      For Compute Engine and Slurm environments:

      • See Manage static compute nodes. You can use the following command to gracefully drain and power down nodes:

        scontrol update NodeName=NODES_TO_UPDATE State=POWER_DOWN_ASAP
      • Alternatively, you can configure the cleanup_compute_nodes setting on the Slurm controller to automatically destroy static compute nodes when the cluster is destroyed.

      For Google Kubernetes Engine environments:

    2. Use the destroy command to tear down the old deployment:

      Warning: The destroy command is irreversible. Make sure you can recreate the cluster before running this command.

      ./gcluster destroy DEPLOYMENT_FOLDER_NAME --auto-approve
      
  2. Update the cluster blueprint with any needed changes.

  3. Create a new deployment folder from the updated blueprint. The -w flag overwrites the existing deployment folder.

    ./gcluster create BLUEPRINT_NAME -w
    
  4. Deploy the new cluster.

    ./gcluster deploy DEPLOYMENT_FOLDER_NAME
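
For a Compute Engine and Slurm cluster, the commands in these steps can be sketched as two functions: one that drains the compute nodes and one that destroys, re-creates, and redeploys the cluster. Because gcluster destroy is irreversible, this sketch only prints the commands unless you set RUN=1. It assumes that the blueprint edits from step 2 are already saved and that you run the gcluster commands from the cluster-toolkit directory. The names run, drain_nodes, and rebuild_deployment, and the node names in the example, are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch of the full redeploy for a Compute Engine and Slurm cluster.
# Commands are printed, not executed, unless RUN=1, because
# gcluster destroy is irreversible.
set -euo pipefail

run() {
  if [[ "${RUN:-0}" == "1" ]]; then
    "$@"
  else
    echo "+ $*"
  fi
}

# Step 1.1: run where Slurm client commands are available (for example,
# the login node). The argument is a Slurm hostlist such as "demo-compute-[0-3]".
drain_nodes() {
  local nodes="$1"
  # Drain the nodes and power them down once their running jobs finish.
  run scontrol update "NodeName=$nodes" State=POWER_DOWN_ASAP
}

# Steps 1.2, 3, and 4: run from the cluster-toolkit directory after the
# drained nodes have powered down and the blueprint edits are saved.
rebuild_deployment() {
  local blueprint="$1" deployment="$2"
  run ./gcluster destroy "$deployment" --auto-approve
  run ./gcluster create "$blueprint" -w   # -w overwrites the old folder
  run ./gcluster deploy "$deployment"
}
```

Draining and destroying are kept separate on purpose: POWER_DOWN_ASAP waits for running jobs, so call rebuild_deployment only after sinfo shows that the nodes have powered down. Review the printed commands first, then rerun with RUN=1 to execute them.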
    

What's next