Use the managed-lustre module to create a
Managed Lustre instance.
Google Cloud Managed Lustre is a high-performance network file system that you can mount to one or more Compute Engine instances.
You can make this storage available to other modules, such as the slurm-partition
module, by using the use keyword. This keyword handles the client installation
and mounting processes for you.
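For example, the following sketch attaches a Managed Lustre instance to a Slurm partition through the use keyword. This is a minimal, illustrative fragment, not a complete blueprint: the network and compute_nodeset modules are assumed to be defined elsewhere, and the module IDs are placeholders.

- id: lustre
  source: modules/file-system/managed-lustre
  use: [network]

# The partition mounts the instance because the lustre module ID
# appears in its use list. compute_nodeset is a hypothetical
# nodeset module defined elsewhere in the blueprint.
- id: compute_partition
  source: community/modules/compute/schedmd-slurm-gcp-v6-partition
  use: [compute_nodeset, lustre]
  settings:
    partition_name: compute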
For the complete list of inputs and outputs, see the managed-lustre
module
in the Cluster Toolkit GitHub repository.
Before you begin
Before you begin, verify that you meet the following requirements:
- You have installed and configured Cluster Toolkit. For installation instructions, see Set up Cluster Toolkit.
- You have an existing cluster blueprint. You can use and modify an existing
  blueprint or create one from scratch. For a working example of a blueprint
  configured for the managed-lustre module, see the
  examples/gke-managed-lustre.yaml file. For more information about creating
  and customizing blueprints, see Cluster blueprint.
- To view a complete list of blueprints that support the managed-lustre module,
  go to the Cluster blueprint catalog page, click the Select storage type menu,
  and then select Google Cloud Managed Lustre.
- The managed-lustre module does not create a continuous long-running workload
  or a full cluster. It provisions a Managed Lustre instance to provide
  high-performance network file storage for your cluster.
Required roles
To get the permissions that you need to create and mount Google Cloud Managed Lustre instances, ask your administrator to grant you the following IAM roles on your project:
- Managed Lustre Admin (roles/lustre.admin)
- Compute Network Admin (roles/compute.networkAdmin)
- Compute Instance Admin (v1) (roles/compute.instanceAdmin.v1)
- Service Account User (roles/iam.serviceAccountUser)
- If importing data from Cloud Storage: Storage Object Viewer (roles/storage.objectViewer)
For more information about granting roles, see Manage access to projects, folders, and organizations.
You might also be able to get the required permissions through custom roles or other predefined roles.
To ensure that the service account has the necessary
permissions to encrypt and decrypt data by using CMEK,
ask your administrator to grant the
Cloud Key Management Service CryptoKey Encrypter/Decrypter (roles/cloudkms.cryptoKeyEncrypterDecrypter) IAM role to the service account on the Cloud Key Management Service key.
For more information about granting roles, see Manage access to projects, folders, and organizations.
Your administrator might also be able to give the service account the required permissions through custom roles or other predefined roles.
Create a Managed Lustre instance
To create a Managed Lustre instance, add the managed-lustre
module to your blueprint.
When you add this module, you can choose one of the following options:
Create a new VPC
The following example creates a new Virtual Private Cloud (VPC) and configures
private service access.
Cluster Toolkit passes the VPC network and the private service access
configuration to the managed-lustre
module to help ensure
the correct build order and network configuration.
- id: network
  source: modules/network/vpc

- id: private_service_access
  source: modules/network/private-service-access
  use: [network]
  settings:
    prefix_length: 24

- id: lustre
  source: modules/file-system/managed-lustre
  use: [network, private_service_access]
Use an existing VPC
If you are using an existing network with private service access already
configured, then you must manually provide the peering name by using the
private_vpc_connection_peering setting. You can find this information in the
Google Cloud console on the VPC network peering page. For more information, see
Create a peering
configuration.
- id: network
  source: modules/network/pre-existing-vpc
  settings:
    network_name: NETWORK_NAME
    subnetwork_name: SUBNETWORK_NAME

- id: lustre
  source: modules/file-system/managed-lustre
  use: [network]
  settings:
    private_vpc_connection_peering: servicenetworking.googleapis.com
Replace the following:
- NETWORK_NAME: the name of your existing VPC network.
- SUBNETWORK_NAME: the name of the subnetwork within that VPC.
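The preceding snippets are module fragments. In a complete blueprint, they belong under the modules list of a deployment group. The following minimal skeleton shows that placement, using the new-VPC modules from the first option; blueprint_name, the vars values, and the group name are placeholders for your own deployment.

blueprint_name: lustre-deployment

vars:
  project_id: PROJECT_ID
  deployment_name: lustre-deployment
  region: us-central1
  zone: us-central1-a

deployment_groups:
- group: primary
  modules:
  - id: network
    source: modules/network/vpc

  - id: private_service_access
    source: modules/network/private-service-access
    use: [network]
    settings:
      prefix_length: 24

  - id: lustre
    source: modules/file-system/managed-lustre
    use: [network, private_service_access]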
Configure compatibility
You can configure Managed Lustre to be compatible with Slurm or GKE.
Slurm integration
When you use Slurm, the configuration depends on your image type.
- Official images: The official schedmd-slurm-public images include
  pre-installed Managed Lustre client modules.

  - id: managed_lustre
    source: modules/file-system/managed-lustre
    use: [network, private_service_access]
    settings:
      name: lustre-instance
      local_mount: /lustre
      remote_mount: lustrefs
      size_gib: 18000

  - id: slurm_controller
    source: community/modules/scheduler/schedmd-slurm-gcp-v6-controller
    use: [network, managed_lustre, slurm_login]
    settings:
      machine_type: n2-standard-4

  The local_mount setting specifies the mount point directory for the
  Managed Lustre instance. If this setting is unspecified, then this value
  defaults to /shared.

- Custom images: For custom images, you must install the modules during the
  image build. If you are using slurm-gcp version 6.10.0 or later, then add
  the install_managed_lustre: true setting to the Ansible playbook variables
  in your image builder.

  - type: data
    destination: /var/tmp/slurm_vars.json
    content: |
      {
        "reboot": false,
        "install_cuda": false,
        "install_gcsfuse": true,
        "install_lustre": false,
        "install_managed_lustre": true,
        "install_nvidia_repo": true,
        "install_ompi": true,
        "allow_kernel_upgrades": false,
        "monitoring_agent": "cloud-ops"
      }
GKE compatibility
By default, the Managed Lustre instance is not compatible with
GKE. To enable compatibility, set gke_support_enabled: true.
This setting changes the listening port to 6988.
- id: managed-lustre
  source: modules/file-system/managed-lustre
  use: [network, private_service_access]
  settings:
    gke_support_enabled: true
    local_mount: /lustre
The local_mount setting specifies the mount point directory for the
Managed Lustre instance. If this setting is unspecified, then
this value defaults to /shared.
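To consume the instance from a GKE cluster, one common pattern is to expose it to the cluster as a persistent volume. The following fragment is a hypothetical sketch, not the authoritative wiring: the gke_cluster and lustre_pv module IDs are placeholders, and you should compare the result against the examples/gke-managed-lustre.yaml blueprint referenced earlier.

- id: gke_cluster
  source: modules/scheduler/gke-cluster
  use: [network]

# Illustrative wiring that exposes the Managed Lustre instance to
# the cluster as a persistent volume; confirm the exact pattern
# against examples/gke-managed-lustre.yaml.
- id: lustre_pv
  source: modules/file-system/gke-persistent-volume
  use: [gke_cluster, managed-lustre]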
Use Customer-Managed Encryption Keys (CMEK)
Google Cloud Managed Lustre supports encryption with Customer-Managed Encryption
Keys (CMEK). To use CMEK, add the kms_key setting to the managed-lustre
module.
- id: lustre
  source: modules/file-system/managed-lustre
  use: [network, private_service_access]
  settings:
    kms_key: projects/PROJECT_ID/locations/LOCATION/keyRings/KEY_RING/cryptoKeys/KEY_NAME
Replace the following:
- PROJECT_ID: the ID of the project where the instance is deployed.
- LOCATION: the region or zone of the instance.
- KEY_RING: the name of the key ring that contains the encryption key.
- KEY_NAME: the name of the specific encryption key that is used to encrypt and decrypt your data.
Import data from Cloud Storage
You can import data from a Cloud Storage bucket when you create a
Managed Lustre instance. To do this, use the
import_gcs_bucket_uri setting to specify the source bucket. The module imports
the data into the directory specified by local_mount (defaults to /shared if
unspecified).
Before you import data from Cloud Storage, consider the following:
- This operation is a one-way copy operation. The Managed Lustre instance does not sync with any subsequent changes made to the Cloud Storage bucket.
- The copy process runs in the background after Terraform creates the Managed Lustre instance. Data might not appear in the mounted directory immediately after deployment completes.
- Verify that you have set up the correct IAM permissions for importing data from Cloud Storage to Managed Lustre. If you don't have the correct permissions, then the copy process might fail silently and leave you with an empty Managed Lustre instance.
Example configuration
The following example imports data from a specified bucket to the /lustre mount point.
- id: managed_lustre
  source: modules/file-system/managed-lustre
  use: [network, private_service_access]
  settings:
    name: lustre-instance
    local_mount: /lustre
    remote_mount: lustrefs
    size_gib: 18000
    import_gcs_bucket_uri: gs://BUCKET_NAME
Replace BUCKET_NAME with the name of your Cloud Storage bucket.
Monitor the import status
If you request an import, then gcluster outputs a JSON response similar to the following:
{
"name": "projects/PROJECT_ID/locations/LOCATION/operations/OPERATION_ID",
"metadata": {
"@type": "type.googleapis.com/google.cloud.lustre.v1.ImportDataMetadata",
"createTime": "START_TIME",
"target": "projects/PROJECT_ID/locations/LOCATION/instances/INSTANCE_NAME",
"requestedCancellation": false,
"apiVersion": "v1"
},
"done": false
}
This output includes the following values:
- PROJECT_ID: the ID of the project where the instance is deployed.
- LOCATION: the region or zone of the instance.
- OPERATION_ID: the unique identifier for the import operation.
- START_TIME: the timestamp when the operation began.
- INSTANCE_NAME: the name of your Managed Lustre instance.
To check the status of the transfer or identify any errors, use the following command:
gcloud lustre operations describe OPERATION_ID \
--location LOCATION \
--project PROJECT_ID
Replace the following:
- OPERATION_ID: the operation ID from the JSON output.
- LOCATION: the location from the JSON output.
- PROJECT_ID: the project ID from the JSON output.
For more information, see Get operation.
What's next
- For a complete list of input fields and output values, see the
  managed-lustre module on GitHub.