This document explains how to integrate Managed Lustre with GKE to create an optimized environment for demanding, data-intensive workloads like artificial intelligence (AI), machine learning (ML), and high performance computing (HPC).
In this document you provision a GKE cluster with XPK, create a Managed Lustre instance, and attach it to the cluster. To test this configuration, you run a workload on nodes that flex-start provisions.
This document is intended for Machine learning (ML) engineers and Data and AI specialists who are interested in exploring Kubernetes container orchestration capabilities backed by Managed Lustre instances. To learn more about common roles and example tasks referenced in Google Cloud content, see Common GKE user roles and tasks.
Background
This section describes the key technologies used in this document:
XPK
XPK is a tool that simplifies the provisioning and management of GKE clusters and workloads, especially for AI/ML tasks. XPK helps generate preconfigured, training-optimized infrastructure, which makes it a good option for proofs-of-concept and testing environments.
You can create a cluster that uses TPUs by using the Google Cloud CLI or the Accelerated Processing Kit (XPK).
- Use the gcloud CLI to manually create your GKE cluster instance for precise customization or expansion of existing production GKE environments.
- Use XPK to quickly create GKE clusters and run workloads for proof-of-concept and testing. For more information, see the XPK README.
This document uses XPK exclusively for provisioning and managing resources.
For more information, see the Accelerated Processing Kit (XPK) documentation.
Flex-start
Flex-start lets you optimize TPU provisioning by paying only for the resources that you need. Flex-start is recommended if your workload needs dynamically provisioned resources for up to seven days and cost-effective access to capacity.
This document uses flex-start as an example of a consumption option, but you can also use other options, such as reservations or Spot VMs. For more information, see About accelerator consumption options for AI/ML workloads in GKE.
Managed Lustre
Managed Lustre is a high-performance, parallel file system service designed for demanding workloads. The Managed Lustre CSI driver lets you integrate Managed Lustre instances with GKE, using standard Kubernetes Persistent Volume Claims (PVCs) and Persistent Volumes (PVs). This driver is particularly beneficial for AI, ML, and HPC workloads requiring persistent, scalable, and high-throughput storage.
For more information, see About the Managed Lustre CSI driver.
Before you begin
Before you start, make sure that you have performed the following tasks:
- Enable the Google Cloud Managed Lustre API and the Google Kubernetes Engine API. For an example gcloud command, see the sketch after this list.
- If you want to use the Google Cloud CLI for this task, install and then initialize the gcloud CLI. If you previously installed the gcloud CLI, get the latest version by running the gcloud components update command. Earlier gcloud CLI versions might not support running the commands in this document.
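If you prefer to enable the required APIs from the command line, the following sketch shows one way to do it with the gcloud CLI. It assumes the service names lustre.googleapis.com for the Managed Lustre API and container.googleapis.com for the Google Kubernetes Engine API; replace PROJECT_ID with your project ID.
# Enable the Managed Lustre and GKE APIs (service names assumed; verify them in the API Library).
gcloud services enable lustre.googleapis.com container.googleapis.com \
--project=PROJECT_ID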
Prepare your environment
This section shows you how to prepare your cluster environment.
In a new terminal window, create a virtual environment:
VENV_DIR=~/venvp4; python3 -m venv $VENV_DIR; source $VENV_DIR/bin/activate
Install XPK by following the steps in the XPK installation file. Use pip install instead of cloning from source.
Set the default environment variables:
gcloud config set project PROJECT_ID
gcloud config set billing/quota_project PROJECT_ID
export PROJECT_ID=$(gcloud config get project)
export LOCATION=LOCATION
export CLUSTER_NAME=CLUSTER_NAME
export GKE_VERSION=VERSION
export NETWORK_NAME=NETWORK_NAME
export IP_RANGE_NAME=IP_RANGE_NAME
export FIREWALL_RULE_NAME=FIREWALL_RULE_NAME
export ACCELERATOR_TYPE=v6e-16
export NUM_SLICES=1
Replace the following values:
- PROJECT_ID: your Google Cloud project ID.
- LOCATION: the zone of your GKE cluster. Select a zone that supports both flex-start and Managed Lustre instances, for example, us-west4-a. For supported locations, see About GPU and TPU provisioning with flex-start provisioning mode.
- CLUSTER_NAME: the name of your GKE cluster.
- VERSION: the GKE version. Ensure this is at least the minimum version that supports Managed Lustre. For example, 1.33.2-gke.1111000.
- NETWORK_NAME: the name of the network that you create.
- IP_RANGE_NAME: the name of the IP address range.
- FIREWALL_RULE_NAME: the name of the firewall rule.
The preceding commands configure a v6e-16 accelerator type. This configuration includes the following variables:
- ACCELERATOR_TYPE=v6e-16: corresponds to TPU Trillium with a 4x4 topology. This TPU version instructs GKE to provision a multi-host slice node pool. The v6e-16 accelerator type maps to the ct6e-standard-4t machine type in GKE.
- NUM_SLICES=1: the number of TPU slice node pools that XPK creates for the ACCELERATOR_TYPE that you select.
If you want to customize the ACCELERATOR_TYPE and NUM_SLICES variables, refer to the following documents to find the available combinations:
- To identify the TPU version, machine type for GKE, topology, and available zone that you want to use, see Plan TPUs in GKE. You can also check zone availability from the command line, as shown in the sketch after this list.
- To map the GKE machine type with the accelerator type in the Cloud TPU API, see the TPU Trillium (v6e) documentation.
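As a quick way to confirm zone availability, the following sketch lists the TPU accelerator types that the Cloud TPU API reports for your chosen zone. It assumes the gcloud compute tpus accelerator-types command group is available in your gcloud CLI version; output columns can vary.
# List the TPU accelerator types available in the chosen zone, for example us-west4-a.
gcloud compute tpus accelerator-types list \
--zone=${LOCATION} \
--project=${PROJECT_ID}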
Prepare a VPC network
Prepare a Virtual Private Cloud network for your Managed Lustre instance and GKE cluster.
Enable the Service Networking API:
gcloud services enable servicenetworking.googleapis.com \
--project=${PROJECT_ID}
Create a VPC network:
gcloud compute networks create ${NETWORK_NAME} \
--subnet-mode=auto --project=${PROJECT_ID} \
--mtu=8896
Create an IP address range for VPC peering:
gcloud compute addresses create ${IP_RANGE_NAME} \
--global \
--purpose=VPC_PEERING \
--prefix-length=20 \
--description="Managed Lustre VPC Peering" \
--network=${NETWORK_NAME} \
--project=${PROJECT_ID}
Get the CIDR range of the IP address range:
CIDR_RANGE=$(
gcloud compute addresses describe ${IP_RANGE_NAME} \
--global \
--format="value[separator=/](address, prefixLength)" \
--project=${PROJECT_ID}
)
Create a firewall rule to allow TCP traffic from the IP address range:
gcloud compute firewall-rules create ${FIREWALL_RULE_NAME} \
--allow=tcp:988,tcp:6988 \
--network=${NETWORK_NAME} \
--source-ranges=${CIDR_RANGE} \
--project=${PROJECT_ID}
Connect the VPC peering:
gcloud services vpc-peerings connect \
--network=${NETWORK_NAME} \
--project=${PROJECT_ID} \
--ranges=${IP_RANGE_NAME} \
--service=servicenetworking.googleapis.com
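To confirm that the peering was established, you can list the VPC peering connections for the Service Networking service on your network. This is a sketch of one way to verify; the exact output format can vary by gcloud CLI version.
# List peering connections on the network; look for servicenetworking.googleapis.com.
gcloud services vpc-peerings list \
--network=${NETWORK_NAME} \
--project=${PROJECT_ID}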
Create a Managed Lustre storage instance
Create a Managed Lustre storage instance.
Set storage instance variables:
export STORAGE_NAME=STORAGE_NAME
export STORAGE_THROUGHPUT=STORAGE_THROUGHPUT
export STORAGE_CAPACITY=STORAGE_CAPACITY_GIB
export STORAGE_FS=lfs
Replace the following values:
- STORAGE_NAME: the name of your Managed Lustre instance.
- STORAGE_THROUGHPUT: the throughput of the Managed Lustre instance, in MiB/s per TiB. For valid throughput values, see Calculate your new capacity.
- STORAGE_CAPACITY_GIB: the capacity of the Managed Lustre instance, in GiB. For valid capacity values, see Allowed capacity and throughput values.
Create the Managed Lustre instance:
gcloud lustre instances create ${STORAGE_NAME} \
--per-unit-storage-throughput=${STORAGE_THROUGHPUT} \
--capacity-gib=${STORAGE_CAPACITY} \
--filesystem=${STORAGE_FS} \
--location=${LOCATION} \
--network=projects/${PROJECT_ID}/global/networks/${NETWORK_NAME} \
--project=${PROJECT_ID} \
--async # Creates the instance asynchronously
The --async flag creates the instance asynchronously and provides an operation ID to track its status.
Check the operation's status:
gcloud lustre operations describe OPERATION_ID \
--location=${LOCATION} \
--project=${PROJECT_ID}
Replace OPERATION_ID with the ID from the output of the previous asynchronous command. If you don't have the ID, you can list all operations:
gcloud lustre operations list \
--location=${LOCATION} \
--project=${PROJECT_ID}
The instance is ready when the command output shows done: true.
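After the operation shows done: true, you can optionally describe the instance to confirm its state and record details, such as the instance's IP address, that you need when you update the manifest later in this document. The exact output field names can vary, so treat this as a sketch.
# Show the Managed Lustre instance details, including its state and network information.
gcloud lustre instances describe ${STORAGE_NAME} \
--location=${LOCATION} \
--project=${PROJECT_ID}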
Use XPK to create a GKE cluster
Use XPK to create a GKE cluster with a node pool.
Create a GKE cluster:
xpk cluster create --cluster ${CLUSTER_NAME} \
--num-slices=${NUM_SLICES} \
--tpu-type=${ACCELERATOR_TYPE} \
--zone=${LOCATION} \
--project=${PROJECT_ID} \
--gke-version=${GKE_VERSION} \
--custom-cluster-arguments="--network=${NETWORK_NAME}" \
--enable-lustre-csi-driver \
--flex
This command creates a GKE cluster by using XPK. The cluster is configured to use flex-start for node provisioning and has the Managed Lustre CSI driver enabled.
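Optionally, verify that the Managed Lustre CSI driver is registered on the new cluster. The following sketch assumes that the driver registers under the name lustre.csi.storage.gke.io.
# Fetch credentials for the new cluster, then check for the Lustre CSI driver.
gcloud container clusters get-credentials ${CLUSTER_NAME} \
--zone=${LOCATION} \
--project=${PROJECT_ID}
kubectl get csidrivers | grep -i lustre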
Attach the storage instance to the cluster
To configure the PersistentVolume (PV) and PersistentVolumeClaim (PVC), this section uses the XPK storage attach command (xpk storage attach) with a manifest file. This section uses an example manifest from the XPK source code.
Attach the Managed Lustre storage instance to your GKE cluster by completing these steps:
Download the example manifest file to your current working directory and save it as lustre-manifest-attach.yaml.
Update the manifest file with your Managed Lustre instance's information. A sketch of the key fields follows this list.
In the PersistentVolume section, replace the following values:
- STORAGE_SIZE: the size of the Managed Lustre instance, in GiB.
- PROJECT_ID/ZONE/INSTANCE_NAME: the full resource path of your Managed Lustre instance.
- IP_ADDRESS: the IP address of the Managed Lustre instance.
- FILE_SYSTEM: the file system type, which is lfs.
In the PersistentVolumeClaim section, replace the following value:
- STORAGE_SIZE: the size of the PersistentVolumeClaim, in GiB.
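For orientation, the following is a minimal sketch of how these placeholders typically map onto the PersistentVolume and PersistentVolumeClaim definitions for static provisioning with the Managed Lustre CSI driver. It assumes the driver name lustre.csi.storage.gke.io and the ip and filesystem volume attributes; the example manifest in the XPK source code might use different resource names or additional fields, so follow the structure of the downloaded file.
# Sketch only: align with the structure of the downloaded lustre-manifest-attach.yaml.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: lustre-pv
spec:
  capacity:
    storage: STORAGE_SIZE            # in Gi, for example 18000Gi
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  storageClassName: ""
  csi:
    driver: lustre.csi.storage.gke.io            # assumed driver name
    volumeHandle: PROJECT_ID/ZONE/INSTANCE_NAME  # full resource path of the instance
    volumeAttributes:
      ip: IP_ADDRESS                 # IP address of the Managed Lustre instance
      filesystem: FILE_SYSTEM        # lfs
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: lustre-pvc
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: ""
  volumeName: lustre-pv
  resources:
    requests:
      storage: STORAGE_SIZE          # must not exceed the PersistentVolume capacity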
Attach the storage instance to the cluster:
xpk storage attach ${STORAGE_NAME} \
--cluster=${CLUSTER_NAME} --project=${PROJECT_ID} --zone=${LOCATION} \
--type=lustre \
--mount-point='/lustre-data' \
--readonly=false \
--auto-mount=true \
--manifest='./lustre-manifest-attach.yaml'
Verify that the storage is attached to the cluster:
xpk storage list \
--cluster=${CLUSTER_NAME} --project=${PROJECT_ID} --zone=${LOCATION}
Run a workload
Run a workload that uses the attached Managed Lustre instance. The following example command lists the mounted file systems, writes a "hello" file to the Managed Lustre mount directory, and reads it back.
Create and run the workload:
xpk workload create --workload test-lustre \
--cluster=${CLUSTER_NAME} --project=${PROJECT_ID} --zone=${LOCATION} \
--command="df -h && echo 'hello' > /lustre-data/hello.txt && cat /lustre-data/hello.txt" \
--tpu-type=${ACCELERATOR_TYPE} \
--num-slices=1 \
--flex
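To check the workload's status after you create it, you can list the workloads on the cluster. This sketch assumes the xpk workload list command that ships with current XPK versions.
# List workloads on the cluster and check the status of test-lustre.
xpk workload list \
--cluster=${CLUSTER_NAME} --project=${PROJECT_ID} --zone=${LOCATION}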
Clean up
After you complete the steps in this document, delete the resources that you created to avoid incurring unwanted charges to your account. First, delete the cluster:
xpk cluster delete --cluster ${CLUSTER_NAME} \
--zone ${LOCATION} \
--project ${PROJECT_ID}
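Deleting the cluster doesn't delete the Managed Lustre instance, which continues to incur charges while it exists. The following sketch shows one way to remove it with the gcloud CLI, assuming the gcloud lustre instances delete command; you can also delete the instance from the Google Cloud console.
# Delete the Managed Lustre instance so that it stops incurring charges.
gcloud lustre instances delete ${STORAGE_NAME} \
--location=${LOCATION} \
--project=${PROJECT_ID}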
What's next
- Learn more about the Managed Lustre CSI driver.
- Explore the Google Cloud Managed Lustre CSI driver reference.
- Learn how to create and use a volume backed by Lustre.
- Learn how to access existing Lustre instances.