Create a Managed Lustre instance

Use the managed-lustre module to create a Managed Lustre instance.

Google Cloud Managed Lustre is a high-performance network file system that you can mount to one or more Compute Engine instances.

You can make this storage available to other modules, such as the slurm-partition module, by using the use keyword. This keyword handles the client installation and mounting processes for you.

For the complete list of inputs and outputs, see the managed-lustre module in the Cluster Toolkit GitHub repository.

Before you begin

Before you begin, verify that you meet the following requirements:

  • You have installed and configured Cluster Toolkit. For installation instructions, see Set up Cluster Toolkit.
  • You have an existing cluster blueprint. You can use and modify an existing blueprint or create one from scratch. For a working example of a blueprint configured for the managed-lustre module, see the examples/gke-managed-lustre.yaml file. For more information about creating and customizing blueprints, see Cluster blueprint.
  • To view a complete list of blueprints that support the managed-lustre module, go to the Cluster blueprint catalog page, click the Select storage type menu, and then select Google Cloud Managed Lustre.

The managed-lustre module does not create a continuous, long-running workload or a full cluster. It provisions a Managed Lustre instance that provides high-performance network file storage for your cluster.
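For reference, installing Cluster Toolkit (the first requirement in the preceding list) generally means cloning the repository and building the gcluster binary. The following is a sketch; Set up Cluster Toolkit remains the authoritative guide, including the required versions of Go, Terraform, and Packer:

```shell
# Clone the Cluster Toolkit repository and build the gcluster binary.
git clone https://github.com/GoogleCloudPlatform/cluster-toolkit.git
cd cluster-toolkit
make

# Verify that the binary was built.
./gcluster --version
```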

Required roles

To get the permissions that you need to create and mount Google Cloud Managed Lustre instances, ask your administrator to grant you the following IAM roles on your project:

For more information about granting roles, see Manage access to projects, folders, and organizations.

You might also be able to get the required permissions through custom roles or other predefined roles.

To ensure that the service account has the necessary permissions to encrypt and decrypt data by using CMEK, ask your administrator to grant the Cloud Key Management Service CryptoKey Encrypter/Decrypter (roles/cloudkms.cryptoKeyEncrypterDecrypter) IAM role to the service account on the Cloud Key Management Service key.

For more information about granting roles, see Manage access to projects, folders, and organizations.

Your administrator might also be able to give the service account the required permissions through custom roles or other predefined roles.
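As a sketch of how an administrator might grant that role from the command line, the following command binds the CryptoKey Encrypter/Decrypter role to a service account on a specific key. All uppercase values are placeholders for your own project, key, and service account:

```shell
# Grant the Cloud KMS CryptoKey Encrypter/Decrypter role to the service
# account on the key that protects your Managed Lustre data.
gcloud kms keys add-iam-policy-binding KEY_NAME \
    --keyring=KEY_RING \
    --location=LOCATION \
    --project=PROJECT_ID \
    --member="serviceAccount:SERVICE_ACCOUNT_EMAIL" \
    --role="roles/cloudkms.cryptoKeyEncrypterDecrypter"
```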

Create a Managed Lustre instance

To create a Managed Lustre instance, add the managed-lustre module to your blueprint.

When you add this module, you can choose one of the following options:

Create a new VPC

The following example creates a new Virtual Private Cloud (VPC) and configures private service access. Cluster Toolkit passes the VPC network and the private service access configuration to the managed-lustre module to help ensure the correct build order and network configuration.

  - id: network
    source: modules/network/vpc

  - id: private_service_access
    source: modules/network/private-service-access
    use: [network]
    settings:
      prefix_length: 24

  - id: lustre
    source: modules/file-system/managed-lustre
    use: [network, private_service_access]
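After you add these modules to your blueprint, you can generate and deploy the configuration with gcluster. The following is a sketch; BLUEPRINT_FILE.yaml is a placeholder for your blueprint file, and DEPLOYMENT_NAME is the deployment name set in your blueprint's variables:

```shell
# Generate the deployment folder from the blueprint, then deploy it.
./gcluster create BLUEPRINT_FILE.yaml
./gcluster deploy DEPLOYMENT_NAME
```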

Use an existing VPC

If you are using an existing network with private service access already configured, then you must manually provide the peering name by using the private_vpc_connection_peering setting. You can find this information in the Google Cloud console on the VPC Network Peering page. For more information, see Create a peering configuration.

  - id: network
    source: modules/network/pre-existing-vpc
    settings:
      network_name: NETWORK_NAME
      subnetwork_name: SUBNETWORK_NAME

  - id: lustre
    source: modules/file-system/managed-lustre
    use: [network]
    settings:
      private_vpc_connection_peering: servicenetworking.googleapis.com

Replace the following:

  • NETWORK_NAME: the name of your existing VPC network.
  • SUBNETWORK_NAME: the name of the subnetwork within that VPC.
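If you prefer the command line to the console, you can list the peering connections on the existing network to find the peering created for private service access. NETWORK_NAME and PROJECT_ID are placeholders for your own values:

```shell
# List VPC Network Peering connections on the existing network. Look for
# the peering that was created for private service access.
gcloud compute networks peerings list \
    --network=NETWORK_NAME \
    --project=PROJECT_ID
```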

Configure compatibility

You can configure Managed Lustre to be compatible with Slurm or GKE.

Slurm integration

When you use Slurm, the configuration depends on your image type.

  • Official images: The official schedmd-slurm-public images include pre-installed Managed Lustre client modules.

    - id: managed_lustre
      source: modules/file-system/managed-lustre
      use: [network, private_service_access]
      settings:
        name: lustre-instance
        local_mount: /lustre
        remote_mount: lustrefs
        size_gib: 18000
    
    - id: slurm_controller
      source: community/modules/scheduler/schedmd-slurm-gcp-v6-controller
      use: [network, managed_lustre, slurm_login]
      settings:
        machine_type: n2-standard-4
    

    The local_mount setting specifies the mount point directory for the Managed Lustre instance. If this setting is unspecified, then this value defaults to /shared.

  • Custom images: For custom images, you must install the modules during the image build. If you are using slurm-gcp version 6.10.0 or later, then add the install_managed_lustre: true setting to the Ansible playbook variables in your image builder.

    - type: data
      destination: /var/tmp/slurm_vars.json
      content: |
        {
          "reboot": false,
          "install_cuda": false,
          "install_gcsfuse": true,
          "install_lustre": false,
          "install_managed_lustre": true,
          "install_nvidia_repo": true,
          "install_ompi": true,
          "allow_kernel_upgrades": false,
          "monitoring_agent": "cloud-ops"
        }
    

GKE compatibility

By default, the Managed Lustre instance is not compatible with GKE. To enable compatibility, set gke_support_enabled: true. This setting changes the listening port to 6988.

  - id: managed_lustre
    source: modules/file-system/managed-lustre
    use: [network, private_service_access]
    settings:
      gke_support_enabled: true
      local_mount: /lustre

The local_mount setting specifies the mount point directory for the Managed Lustre instance. If this setting is unspecified, then this value defaults to /shared.

Use Customer-Managed Encryption Keys (CMEK)

Google Cloud Managed Lustre supports encryption with Customer-Managed Encryption Keys (CMEK). To use CMEK, add the kms_key setting to the managed-lustre module.

  - id: lustre
    source: modules/file-system/managed-lustre
    use: [network, private_service_access]
    settings:
      kms_key: projects/PROJECT_ID/locations/LOCATION/keyRings/KEY_RING/cryptoKeys/KEY_NAME

Replace the following:

  • PROJECT_ID: the ID of the project where the instance is deployed.
  • LOCATION: the region or zone of the instance.
  • KEY_RING: the name of the key ring that contains the encryption key.
  • KEY_NAME: the name of the encryption key that is used to encrypt and decrypt your data.
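If you aren't sure of the full resource name to use for the kms_key setting, you can print it from the key itself. This is a sketch; the uppercase values are placeholders:

```shell
# Print the full resource name of the key, in the
# projects/.../locations/.../keyRings/.../cryptoKeys/... format that the
# kms_key setting expects.
gcloud kms keys describe KEY_NAME \
    --keyring=KEY_RING \
    --location=LOCATION \
    --project=PROJECT_ID \
    --format="value(name)"
```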

Import data from Cloud Storage

You can import data from a Cloud Storage bucket when you create a Managed Lustre instance. To do this, use the import_gcs_bucket_uri setting to specify the source bucket. The module imports the data into the directory specified by local_mount (defaults to /shared if unspecified).

Before you import data from Cloud Storage, consider the following:

  • This operation is a one-way copy operation. The Managed Lustre instance does not sync with any subsequent changes made to the Cloud Storage bucket.
  • The copy process runs in the background after Terraform creates the Managed Lustre instance. Data might not appear in the mounted directory immediately after deployment completes.
  • Verify that you have set up the correct IAM permissions for importing data from Cloud Storage to Managed Lustre. If you don't have the correct permissions, then the copy process might fail silently and leave you with an empty Managed Lustre instance.
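One way to check the last point is to inspect the bucket's IAM policy and confirm that the Managed Lustre service agent has read access. The following is a sketch: the service agent address shown is an assumption based on the common gcp-sa-* naming pattern, so verify the exact address for your project before granting anything. BUCKET_NAME and PROJECT_NUMBER are placeholders:

```shell
# Inspect the bucket's IAM policy to confirm that the Managed Lustre
# service agent can read the bucket.
gcloud storage buckets get-iam-policy gs://BUCKET_NAME

# Grant read access if it is missing. The service agent address below is
# an assumption; confirm the exact address for your project first.
gcloud storage buckets add-iam-policy-binding gs://BUCKET_NAME \
    --member="serviceAccount:service-PROJECT_NUMBER@gcp-sa-lustre.iam.gserviceaccount.com" \
    --role="roles/storage.objectViewer"
```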

Example configuration

The following example imports data from a specified bucket to the /lustre mount point.

- id: managed_lustre
  source: modules/file-system/managed-lustre
  use: [network, private_service_access]
  settings:
    name: lustre-instance
    local_mount: /lustre
    remote_mount: lustrefs
    size_gib: 18000
    import_gcs_bucket_uri: gs://BUCKET_NAME

Replace BUCKET_NAME with the name of your Cloud Storage bucket.

Monitor the import status

If you request an import, then gcluster outputs a JSON response similar to the following:

{
  "name": "projects/PROJECT_ID/locations/LOCATION/operations/OPERATION_ID",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.lustre.v1.ImportDataMetadata",
    "createTime": "START_TIME",
    "target": "projects/PROJECT_ID/locations/LOCATION/instances/INSTANCE_NAME",
    "requestedCancellation": false,
    "apiVersion": "v1"
  },
  "done": false
}

This output includes the following values:

  • PROJECT_ID: the ID of the project where the instance is deployed.
  • LOCATION: the region or zone of the instance.
  • OPERATION_ID: the unique identifier for the import operation.
  • START_TIME: the timestamp when the operation began.
  • INSTANCE_NAME: the name of your Managed Lustre instance.

To check the status of the transfer or identify any errors, use the following command:

gcloud lustre operations describe OPERATION_ID \
    --location LOCATION \
    --project PROJECT_ID

Replace the following:

  • OPERATION_ID: the operation ID from the JSON output.
  • LOCATION: the location from the JSON output.
  • PROJECT_ID: the project ID from the JSON output.

For more information, see Get operation.
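If you want to wait for the import to finish from a script, you can poll the operation until it completes. This loop is a sketch that assumes the operation's done field is reported as True on completion; OPERATION_ID, LOCATION, and PROJECT_ID are placeholders:

```shell
# Poll the import operation every 60 seconds until it reports done.
until [ "$(gcloud lustre operations describe OPERATION_ID \
    --location=LOCATION --project=PROJECT_ID \
    --format='value(done)')" = "True" ]; do
  echo "Import still running; checking again in 60 seconds..."
  sleep 60
done
echo "Import operation finished."
```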

What's next