Create a Google Kubernetes Engine node pool

Use the gke-node-pool module to create a Google Kubernetes Engine node pool. This module lets you configure worker nodes for your cluster, including options for machine types, autoscaling, hardware accelerators (GPUs/TPUs), and storage.

This module requires an existing GKE cluster.

For the complete list of inputs and outputs, see the gke-node-pool module in the Cluster Toolkit GitHub repository.

Before you begin

Verify that you meet the following requirements:

  • You have installed and configured Cluster Toolkit. For installation instructions, see Set up Cluster Toolkit.
  • You have an existing cluster blueprint. You can use and modify an existing blueprint or create one from scratch. For a working example of a blueprint configured for the gke-node-pool module, see the community/examples/hpc-gke.yaml file. For more information about creating and customizing blueprints, see Cluster blueprint.
  • To view a complete list of blueprints that support the gke-node-pool module, go to the Cluster blueprint catalog page, click the Select scheduler menu and then select GKE.
  • In your blueprint, you must have added a GKE cluster, as shown in the sketch after this list.
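
For reference, the following sketch shows a minimal network and GKE cluster definition, adapted from the multi-networking example later on this page. The CIDR ranges are illustrative, and the sketch assumes that these settings are sufficient for your network:

  - id: network
    source: modules/network/vpc
    settings:
      subnetwork_name: gke-subnet
      secondary_ranges:
        gke-subnet:
        - range_name: pods
          ip_cidr_range: 10.4.0.0/14
        - range_name: services
          ip_cidr_range: 10.0.32.0/20

  - id: gke_cluster
    source: modules/scheduler/gke-cluster
    use: [network]
    settings:
      cluster_name: $(vars.deployment_name)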

Required roles

To get the permissions that you need to create and manage GKE node pools, ask your administrator to grant you the following IAM roles on your project:

For more information about granting roles, see Manage access to projects, folders, and organizations.

You might also be able to get the required permissions through custom roles or other predefined roles.

Create a GKE node pool

To create a node pool, add the gke-node-pool module to your blueprint and connect it to your GKE cluster.

The following example creates a node pool:

  - id: compute_pool
    source: modules/compute/gke-node-pool
    use: [gke_cluster]

Configure node taints and tolerations

By default, the gke-node-pool module applies the taint user-workload=true:NoSchedule to prevent system pods from being scheduled on your worker nodes. Your jobs targeting this node pool must include the corresponding toleration.

You can override this behavior by configuring the module's taints setting. For more information, see Configure workload separation in GKE.
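
For example, the following sketch removes the default taint so that any Pod can be scheduled on the node pool. It assumes that the module exposes a taints setting that accepts a list of taint objects and that an empty list clears the default:

  - id: compute_pool
    source: modules/compute/gke-node-pool
    use: [gke_cluster]
    settings:
      # Assumption: an empty list removes the default user-workload=true:NoSchedule taint.
      taints: []

Alternatively, keep the default taint and add a toleration for key user-workload, value "true", and effect NoSchedule to the Pod specification of your jobs.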

Configure Local SSD storage

GKE offers the following two options for managing locally attached SSDs (Local SSDs):

  • Ephemeral storage: To let GKE manage the storage, set the local_ssd_count_ephemeral_storage field to a value of 1. We recommend this method because it automatically attaches the storage to pods that request an emptyDir volume. For example:

    - id: local-ssd-pool
      source: modules/compute/gke-node-pool
      use: [gke_cluster, node_pool_service_account]
      settings:
        name: local-ssd
        machine_type: n2d-standard-2
        local_ssd_count_ephemeral_storage: 1
    
  • Raw block storage: To attach Local SSDs to nodes as raw block storage, use the local_ssd_count_nvme_block field. This option requires you to handle Redundant Array of Independent Disks (RAID) configuration, partitioning, and formatting manually, but it is useful for use cases that emptyDir volumes don't support, such as ReadOnlyMany or ReadWriteMany persistent volumes. For example:

    - id: local-ssd-block-pool
      source: modules/compute/gke-node-pool
      use: [gke_cluster, node_pool_service_account]
      settings:
        name: local-ssd-block
        machine_type: n2d-standard-2
        local_ssd_count_nvme_block: 2
    

Configure GPU support

When you attach a GPU to a node, the module automatically adds the taint nvidia.com/gpu=present:NoSchedule. To place jobs on these nodes, you must provide the equivalent toleration. If you use the gke-job-template module, the module applies this toleration automatically.
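
If you write your own job or Pod manifests instead of using the gke-job-template module, the toleration would look like the following sketch in the Pod specification; the values mirror the taint named in the previous paragraph:

    tolerations:
    - key: nvidia.com/gpu
      operator: Equal
      value: present
      effect: NoSchedule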

Install drivers

We recommend that you install NVIDIA GPU drivers by applying a DaemonSet to the cluster. For installation instructions, see Manually install NVIDIA GPU drivers.

If you need to compile a custom driver (for example, to install a specific version), you must disable the enable_secure_boot option to let unsigned kernel modules load.
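
A minimal sketch, assuming that enable_secure_boot is a boolean setting on the gke-node-pool module:

  - id: custom-driver-pool
    source: modules/compute/gke-node-pool
    use: [gke_cluster]
    settings:
      machine_type: n1-standard-16
      # Secure Boot blocks unsigned kernel modules, such as a custom-compiled GPU driver.
      enable_secure_boot: false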

GPU configuration examples

This section describes how you can add GPUs to a node pool. Use one of the following approaches, based on your requirements:

  • Pre-defined GPU machine families: For A2, A3, and G2 machine types, the GPU configuration is inferred automatically.

    - id: simple-a2-pool
      source: modules/compute/gke-node-pool
      use: [gke_cluster]
      settings:
        machine_type: a2-highgpu-1g
    
  • Partitioned GPUs (A100): To partition an A100 GPU, specify the guest_accelerator block.

    - id: multi-instance-gpu-pool
      source: modules/compute/gke-node-pool
      use: [gke_cluster]
      settings:
        machine_type: a2-highgpu-1g
        guest_accelerator:
        - gpu_partition_size: 1g.5gb
    
  • Time-sharing GPUs: To enable time-sharing, configure the gpu_sharing_config setting.

    - id: time-sharing-gpu-pool
      source: modules/compute/gke-node-pool
      use: [gke_cluster]
      settings:
        machine_type: a2-highgpu-1g
        guest_accelerator:
        - gpu_partition_size: 1g.5gb
          gpu_sharing_config:
            gpu_sharing_strategy: TIME_SHARING
            max_shared_clients_per_gpu: 3
    
  • Attached GPUs (N1 family): To attach GPUs to N1 machines, specify the type and count.

    - id: t4-pool
      source: modules/compute/gke-node-pool
      use: [gke_cluster]
      settings:
        machine_type: n1-standard-16
        guest_accelerator:
        - type: nvidia-tesla-t4
          count: 2
    
  • Attached GPUs with time-sharing (N1 family): To use a GPU with sharing configuration attached to an N1 machine, specify the gpu_sharing_config block within guest_accelerator.

    - id: n1-t4-pool
      source: modules/compute/gke-node-pool
      use: [gke_cluster]
      settings:
        name: n1-t4-pool
        machine_type: n1-standard-1
        guest_accelerator:
        - type: nvidia-tesla-t4
          count: 2
          gpu_driver_installation_config:
            gpu_driver_version: "LATEST"
          gpu_sharing_config:
            max_shared_clients_per_gpu: 2
            gpu_sharing_strategy: "TIME_SHARING"
    
  • Multi-networking: To add multi-networking support to a node pool (for example, when using A3 Mega GPUs), configure the vpc and multivpc modules, and reference them in your gke-cluster and gke-node-pool definitions.

    - id: network
      source: modules/network/vpc
      settings:
        subnetwork_name: gke-subnet
        secondary_ranges:
          gke-subnet:
          - range_name: pods
            ip_cidr_range: 10.4.0.0/14
          - range_name: services
            ip_cidr_range: 10.0.32.0/20
    
    - id: multinetwork
      source: modules/network/multivpc
      settings:
        network_name_prefix: multivpc-net
        network_count: 8
        global_ip_address_range: 172.16.0.0/12
        subnetwork_cidr_suffix: 16
    
    - id: gke-cluster
      source: modules/scheduler/gke-cluster
      use: [network, multinetwork]
      settings:
        cluster_name: $(vars.deployment_name)
    
    - id: a3-megagpu_pool
      source: modules/compute/gke-node-pool
      use: [gke-cluster, multinetwork]
      settings:
        machine_type: a3-megagpu-8g
    ...
    

Configure capacity reservations

Use Compute Engine reservations to ensure that resources are available for your workloads when needed. For more information about managing reservations, see Choose a reservation type.

After you create a reservation, you can consume the reserved Compute Engine virtual machine (VM) instances in your cluster by using the reservation_affinity field. Clusters that are deployed by using Cluster Toolkit support the following consumption modes:

  • NO_RESERVATION (default): The node pool doesn't consume any reservation and instead uses on-demand capacity.
  • ANY_RESERVATION: The node pool automatically consumes any available reservation whose VM properties (such as machine type and zone) match the node pool. If no matching reservation is available, the node pool uses on-demand capacity. For example:

    reservation_affinity:
      consume_reservation_type: ANY_RESERVATION
    
  • SPECIFIC_RESERVATION: The node pool consumes capacity only from the reservation that you specify by name. For example:

    # Target a specific reservation
    reservation_affinity:
      consume_reservation_type: SPECIFIC_RESERVATION
      specific_reservations:
      - name: specific-reservation-1
    

To use a specific reservation, ensure that your configuration meets the following requirements:

  • A reservation with the specified name exists in the project (var.project_id) and zone (var.zones) of your deployment.
  • The reservation consumption type is specific. For more information, see How reservations work.
  • The VM properties of the reservation, such as the machine type, accelerators, and Local SSD count, match the properties of the node pool.
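
Putting this together, a node pool that consumes a specific reservation might look like the following sketch in your blueprint. The module id and machine type are placeholders; the machine type must match the VM properties of the reservation:

  - id: reserved-pool
    source: modules/compute/gke-node-pool
    use: [gke_cluster]
    settings:
      machine_type: a2-highgpu-1g  # placeholder: must match the reservation's VM properties
      reservation_affinity:
        consume_reservation_type: SPECIFIC_RESERVATION
        specific_reservations:
        - name: specific-reservation-1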

A shared reservation lets a consumer project use capacity reserved by an owner project. To consume a shared reservation, you must explicitly specify the project that owns the reservation, as shown in the following example:

reservation_affinity:
  consume_reservation_type: SPECIFIC_RESERVATION
  specific_reservations:
  - name: specific-reservation-shared
    project: shared_reservation_owner_project_id

What's next