Create a Google Kubernetes Engine persistent volume

Use the gke-persistent-volume module to create Kubernetes PersistentVolume (PV) and PersistentVolumeClaim (PVC) resources.

This module connects Cluster Toolkit storage modules to the gke-job-template module.

This module supports the following storage backends:

  • Filestore
  • Cloud Storage
  • Google Cloud Managed Lustre
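
To make the module's output concrete, the following is a rough sketch of the kind of PersistentVolume and PersistentVolumeClaim pair it creates for a Filestore backend. This sketch is illustrative only: the resource names, capacity, and volumeHandle value shown here are assumptions, and the module generates the actual manifests.

  # Illustrative PV/PVC pair for a Filestore-backed volume (names and
  # values are assumptions; the module generates the real manifests).
  apiVersion: v1
  kind: PersistentVolume
  metadata:
    name: datafs-pv
  spec:
    capacity:
      storage: 1Ti
    accessModes: [ReadWriteMany]
    persistentVolumeReclaimPolicy: Retain
    csi:
      driver: filestore.csi.storage.gke.io
      volumeHandle: "modeInstance/us-central1-a/datafs/nfsshare"
  ---
  apiVersion: v1
  kind: PersistentVolumeClaim
  metadata:
    name: datafs-pvc
  spec:
    accessModes: [ReadWriteMany]
    storageClassName: ""
    volumeName: datafs-pv
    resources:
      requests:
        storage: 1Ti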

For the complete list of inputs and outputs, see the gke-persistent-volume module in the Cluster Toolkit GitHub repository.

Before you begin

Review the following requirements and module behaviors:

  • You have installed and configured Cluster Toolkit. For installation instructions, see Set up Cluster Toolkit.
  • You have an existing cluster blueprint. You can use and modify an existing blueprint or create one from scratch. For a working example of a blueprint configured for the gke-persistent-volume module, see the examples/storage-gke.yaml file. For more information about creating and customizing blueprints, see Cluster blueprint.
  • To view a complete list of blueprints that support the gke-persistent-volume module, go to the Cluster blueprint catalog page, click the Select scheduler menu and then select GKE.
  • The gke-persistent-volume module does not create a continuous long-running workload or a full cluster. It defines Kubernetes PersistentVolume (PV) and PersistentVolumeClaim (PVC) resources that connect pre-provisioned storage volumes to your GKE cluster.
  • The gke-persistent-volume module calls the Kubernetes API to create resources, so the deployment machine must be authorized to connect to the Kubernetes API. To authorize the deployment machine, add the master_authorized_networks setting to your gke-cluster module configuration and include the deployment machine's IP address in that block, as shown in the examples on this page.
  • Each gke-persistent-volume module defines a single file system. If you need multiple shared file systems, include one instance of this module per file system, as shown in the sketch after this list.
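
For example, a blueprint that exposes two shared file systems to a job pairs each storage module with its own gke-persistent-volume module. The following is a minimal sketch; the module IDs and mount paths are hypothetical:

  - id: homefs
    source: modules/file-system/filestore
    use: [network1]
    settings:
      local_mount: /home

  - id: projectsfs
    source: modules/file-system/filestore
    use: [network1]
    settings:
      local_mount: /projects

  # One gke-persistent-volume module per file system.
  - id: homefs-pv
    source: modules/file-system/gke-persistent-volume
    use: [homefs, gke_cluster]

  - id: projectsfs-pv
    source: modules/file-system/gke-persistent-volume
    use: [projectsfs, gke_cluster]

  # The job template mounts both volumes.
  - id: job-template
    source: modules/compute/gke-job-template
    use: [homefs-pv, projectsfs-pv, compute_pool, gke_cluster]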

Required roles

To get the permissions that you need to create GKE persistent volumes and storage backends, ask your administrator to grant you the required IAM roles on your project. The specific roles depend on the storage backend that you use.

For more information about granting roles, see Manage access to projects, folders, and organizations.

You might also be able to get the required permissions through custom roles or other predefined roles.

Use with Filestore

The following example creates a Filestore instance and uses the gke-persistent-volume module to expose the instance as shared storage (mounted at /data) to a GKE job.

  - id: gke_cluster
    source: modules/scheduler/gke-cluster
    use: [network1]
    settings:
      master_authorized_networks:
      - display_name: deployment-machine
        cidr_block: IP_ADDRESS/32

  - id: datafs
    source: modules/file-system/filestore
    use: [network1]
    settings:
      local_mount: /data

  - id: datafs-pv
    source: modules/file-system/gke-persistent-volume
    use: [datafs, gke_cluster]

  - id: job-template
    source: modules/compute/gke-job-template
    use: [datafs-pv, compute_pool, gke_cluster]

Replace IP_ADDRESS with the IP address of your deployment machine.
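
Pods created from the job template mount the PVC at the local_mount path of the storage module (/data in this example). The relevant part of a generated Job manifest resembles the following sketch; the resource and claim names are assumptions:

  # Sketch of a generated Job (names are assumptions; the
  # gke-job-template module produces the actual manifest).
  apiVersion: batch/v1
  kind: Job
  metadata:
    name: example-job
  spec:
    template:
      spec:
        restartPolicy: Never
        containers:
        - name: main
          image: busybox
          command: ["ls", "/data"]
          volumeMounts:
          - name: data
            mountPath: /data   # matches the storage module's local_mount
        volumes:
        - name: data
          persistentVolumeClaim:
            claimName: datafs-pvc   # assumed claim name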

Use with Cloud Storage

The following example creates a Cloud Storage bucket and uses the gke-persistent-volume module to expose the bucket as shared storage (mounted at /data) to the job template.

  - id: gke_cluster
    source: modules/scheduler/gke-cluster
    use: [network1]
    settings:
      master_authorized_networks:
      - display_name: deployment-machine
        cidr_block: IP_ADDRESS/32

  - id: data-bucket
    source: modules/file-system/cloud-storage-bucket
    settings:
      local_mount: /data

  - id: datagcs-pv
    source: modules/file-system/gke-persistent-volume
    use: [data-bucket, gke_cluster]

  - id: job-template
    source: modules/compute/gke-job-template
    use: [datagcs-pv, compute_pool, gke_cluster]
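
Replace IP_ADDRESS with the IP address of your deployment machine.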

Use with Google Cloud Managed Lustre

The following example creates a Google Cloud Managed Lustre instance and uses the gke-persistent-volume module to expose the instance as shared storage (mounted at /data) to the job template.

  - id: gke_cluster
    source: modules/scheduler/gke-cluster
    use: [network1]
    settings:
      master_authorized_networks:
      - display_name: deployment-machine
        cidr_block: IP_ADDRESS/32

  - id: data-managedlustre
    source: modules/file-system/managed-lustre
    settings:
      local_mount: /data

  - id: datalustre-pv
    source: modules/file-system/gke-persistent-volume
    use: [data-managedlustre, gke_cluster]

  - id: job-template
    source: modules/compute/gke-job-template
    use: [datalustre-pv, compute_pool, gke_cluster]
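
Replace IP_ADDRESS with the IP address of your deployment machine.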

What's next