If you have problems mounting or connecting to a Managed Lustre file system from a client VM, use the following steps to diagnose the problem.
Verify that the Managed Lustre instance is reachable
First, ensure that your Managed Lustre instance is reachable from your client instance:
sudo lctl ping IP_ADDRESS@tcp
To obtain the value of IP_ADDRESS, see Get an instance.
A successful ping returns a response similar to the following:
12345-0@lo
12345-10.115.0.3@tcp
A failed ping returns the following:
failed to ping 10.115.0.3@tcp: Input/output error
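If you run this check from a script, for example across several client VMs, you can test the ping output instead of reading it by eye. The following is a minimal sketch. It assumes that a successful ping lists at least one `@tcp` NID and that a failed ping contains the string `failed to ping`, as in the examples above. The function name `lnet_ping_ok` is hypothetical.

```shell
# lnet_ping_ok: hypothetical helper (not part of Lustre or gcloud).
# Reads `lctl ping` output on stdin and succeeds (exit status 0) only if the
# output looks like the successful example: it contains no
# "failed to ping" message and at least one NID line ending in "@tcp".
lnet_ping_ok() {
  out=$(cat)
  # Any failure message means the instance was not reachable.
  if printf '%s\n' "$out" | grep -q 'failed to ping'; then
    return 1
  fi
  # A successful ping lists the remote NID, for example 12345-10.115.0.3@tcp.
  printf '%s\n' "$out" | grep -q '@tcp$'
}

# Usage sketch (replace IP_ADDRESS); 2>&1 also captures the error text:
# sudo lctl ping IP_ADDRESS@tcp 2>&1 | lnet_ping_ok && echo reachable || echo unreachable
```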
If your ping fails:
Make sure your Managed Lustre instance and your client instance are in the same VPC network. Compare the output of the following commands:
gcloud compute instances describe VM_NAME \
  --zone=VM_ZONE \
  --format='get(networkInterfaces[0].network)'
gcloud lustre instances describe INSTANCE_NAME \
  --location=ZONE \
  --format='get(network)'
The output looks like:
https://www.googleapis.com/compute/v1/projects/my-project/global/networks/my-network
projects/my-project/global/networks/my-network
The output of the gcloud compute instances describe command is prefixed with https://www.googleapis.com/compute/v1/; everything following that string must match the output of the gcloud lustre instances describe command.
Review your VPC network's firewall rules and routing configurations to ensure they allow traffic between your client instance and the Managed Lustre instance.
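The network comparison described above can be automated. The following sketch strips the `https://www.googleapis.com/compute/v1/` prefix from the Compute Engine output and compares the remainder with the Managed Lustre output. The function name `same_vpc_network` is hypothetical, and like the commands above it only considers the VM's first network interface (`networkInterfaces[0]`).

```shell
# same_vpc_network: hypothetical helper.
# $1 is the output of:
#   gcloud compute instances describe ... --format='get(networkInterfaces[0].network)'
# $2 is the output of:
#   gcloud lustre instances describe ... --format='get(network)'
# Succeeds if both refer to the same VPC network.
same_vpc_network() {
  prefix='https://www.googleapis.com/compute/v1/'
  vm_network=${1#"$prefix"}   # drop the Compute Engine URL prefix, if present
  [ "$vm_network" = "$2" ]
}

# Usage sketch (replace the uppercase placeholders):
# vm_net=$(gcloud compute instances describe VM_NAME --zone=VM_ZONE \
#   --format='get(networkInterfaces[0].network)')
# lustre_net=$(gcloud lustre instances describe INSTANCE_NAME --location=ZONE \
#   --format='get(network)')
# same_vpc_network "$vm_net" "$lustre_net" && echo "same network" || echo "different networks"
```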
Check the LNet accept port (legacy instances)
Although the --gke-support-enabled flag is deprecated and no longer required
when creating new Managed Lustre instances, you might have
existing older instances that were created with this flag.
If you are connecting to a legacy instance where GKE support
was enabled, you must configure LNet on all client Compute Engine instances
to use accept_port 6988. See
Configure LNet for gke-support-enabled instances.
To determine whether an existing instance was configured with this legacy flag, run the following command:
gcloud lustre instances describe INSTANCE_NAME \
--location=LOCATION | grep gkeSupportEnabled
If the command returns gkeSupportEnabled: true, you must configure LNet on
your client VMs.
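If you check many instances, a script can make this decision instead of a manual grep. This sketch reads the `describe` output on standard input and assumes the field appears as `gkeSupportEnabled: true`, exactly as shown above. The name `needs_accept_port_6988` is hypothetical.

```shell
# needs_accept_port_6988: hypothetical helper.
# Reads the output of `gcloud lustre instances describe` on stdin and
# succeeds only if the instance was created with gkeSupportEnabled: true,
# meaning client VMs must configure LNet with accept_port 6988.
needs_accept_port_6988() {
  grep -Eq '^[[:space:]]*gkeSupportEnabled:[[:space:]]*true[[:space:]]*$'
}

# Usage sketch:
# if gcloud lustre instances describe INSTANCE_NAME --location=LOCATION \
#     | needs_accept_port_6988; then
#   echo "Configure LNet with accept_port 6988 on this client."
# fi
```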
Ubuntu kernel version mismatch with Lustre client
For Compute Engine instances running Ubuntu, the running kernel version must match the kernel version that the Lustre client module package was built for. If your Lustre client tools are failing, check whether your Compute Engine instance has automatically upgraded to a newer kernel.
To check your kernel version:
uname -r
The response looks like:
6.8.0-1029-gcp
To check your Lustre client package version:
dpkg -l | grep -i lustre
The response looks like:
ii lustre-client-modules-6.8.0-1029-gcp 2.14.0-ddn198-1 amd64 Lustre Linux kernel module (kernel 6.8.0-1029-gcp)
ii lustre-client-utils 2.14.0-ddn198-1 amd64 Userspace utilities for the Lustre filesystem (client)
If the kernel version in the lustre-client-modules package name (for example, 6.8.0-1029-gcp) doesn't match the output of uname -r, you must reinstall the Lustre client packages for the running kernel.
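You can script this comparison by checking whether an installed `lustre-client-modules-` package matches the running kernel. This is a sketch that assumes the package naming shown in the example output above (`lustre-client-modules-KERNEL_VERSION`). The function name `lustre_modules_match_kernel` is hypothetical.

```shell
# lustre_modules_match_kernel: hypothetical helper.
# $1 is the running kernel version (the output of `uname -r`).
# stdin is the output of `dpkg -l | grep -i lustre`.
# Succeeds if an installed lustre-client-modules package was built for $1.
lustre_modules_match_kernel() {
  awk -v want="lustre-client-modules-$1" '
    {
      name = $2                # column 2 of dpkg -l output is the package name
      sub(/:.*/, "", name)     # drop an optional ":amd64"-style suffix
      if (name == want) found = 1
    }
    END { if (found) exit 0; exit 1 }
  '
}

# Usage sketch:
# dpkg -l | grep -i lustre | lustre_modules_match_kernel "$(uname -r)" \
#   || echo "Kernel/module mismatch: reinstall the Lustre client packages."
```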
Check dmesg for Lustre errors
Many Lustre warnings and errors are logged to the Linux kernel ring buffer. The
dmesg command prints the kernel ring buffer.
To search for Lustre-specific messages, use grep in conjunction with dmesg:
dmesg | grep -i lustre
Or, to look for more general errors that might be related:
dmesg | grep -i error
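To narrow the output to lines that are both Lustre-related and error-related, you can combine the two filters above. This sketch reads kernel log text on standard input, so it also works on a saved copy of the dmesg output. The name `lustre_error_lines` is hypothetical; the case-insensitive match also catches lines that start with `LustreError:`.

```shell
# lustre_error_lines: hypothetical helper.
# Reads kernel log text on stdin and prints only the lines that mention
# both "lustre" and "error", ignoring case.
lustre_error_lines() {
  grep -i 'lustre' | grep -i 'error'
}

# Usage sketch:
# dmesg | lustre_error_lines
```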
Mounting Lustre on a multi-NIC VM fails
When a VM has multiple network interface controllers (NICs) and the
Managed Lustre instance is on a VPC network connected to a secondary NIC
(for example, eth1), mounting the instance might fail. To resolve this issue,
follow the instructions to mount using a secondary NIC.
Cannot connect from the 172.17.0.0/16 subnet range
Compute Engine and GKE clients with an IP address in the 172.17.0.0/16 subnet range cannot mount Managed Lustre instances.
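To check whether a VM is affected, you can test each of its IPv4 addresses against 172.17.0.0/16. Because the range is a /16, an address is inside it exactly when its first two octets are 172 and 17. The sketch below relies on that. The name `in_172_17_range` is hypothetical, and the usage line assumes `hostname -I`, which prints the host's assigned addresses on most Linux distributions.

```shell
# in_172_17_range: hypothetical helper.
# Succeeds if the IPv4 address in $1 falls inside 172.17.0.0/16,
# that is, its first two octets are exactly 172 and 17.
in_172_17_range() {
  case $1 in
    172.17.*) return 0 ;;   # the literal "." after 17 excludes 172.170.x.x
    *) return 1 ;;
  esac
}

# Usage sketch: check every address assigned to this VM.
# for ip in $(hostname -I); do
#   in_172_17_range "$ip" && echo "$ip is in 172.17.0.0/16 and can't mount Managed Lustre"
# done
```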
Can't access Managed Lustre from a peered project
To access your Managed Lustre instance from a VM in a peered VPC network, you must use Network Connectivity Center (NCC). NCC lets you connect multiple VPC networks and on-premises networks to a central hub, providing connectivity between them.
For instructions on how to set up NCC, refer to the Network Connectivity Center documentation.
Mounting fails on Shielded VMs (Secure Boot)
Managed Lustre can't be mounted on
Shielded VMs that have Secure Boot enabled. Attempting to load the
Lustre kernel module in a Secure Boot environment fails with the error:
ERROR: could not insert 'lustre': Required key not available
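To confirm that Secure Boot is the cause, check its state. On the client, `mokutil --sb-state` (from the mokutil package) prints a line such as `SecureBoot enabled`. From outside the VM, `gcloud compute instances describe VM_NAME --zone=VM_ZONE --format='get(shieldedInstanceConfig.enableSecureBoot)'` reports the same setting. The helper below parses the `mokutil` output; its name is hypothetical.

```shell
# secure_boot_enabled: hypothetical helper.
# Reads `mokutil --sb-state` output on stdin and succeeds only if it
# reports that Secure Boot is enabled.
secure_boot_enabled() {
  grep -q '^SecureBoot enabled'
}

# Usage sketch (on the client VM):
# if mokutil --sb-state 2>/dev/null | secure_boot_enabled; then
#   echo "Secure Boot is enabled; the Lustre kernel module can't be loaded."
# fi
```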
Information to include with a support request
If you're unable to resolve the mount failure, gather diagnostic information before creating a support case.
Run sosreport: This utility collects system logs and configuration information and generates a compressed tarball:
sudo sosreport
Attach the sosreport archive and any relevant output from dmesg to your
support case.
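Alongside the sosreport archive, it can help to capture the output of the checks on this page in one place. The following is a sketch, not an official collection tool: the function names `collect_lustre_diagnostics` and `_lustre_diag_run` and the output file names are hypothetical. Commands that are missing or fail (for example, `dpkg` on non-Debian systems, or `dmesg` without root privileges) are recorded as failures instead of stopping the collection.

```shell
# _lustre_diag_run: hypothetical internal helper. Runs a command and writes
# its stdout and stderr to "$diag_dir/<name>.txt"; on failure it appends a
# note to that file instead of aborting.
_lustre_diag_run() {
  out_file="$diag_dir/$1.txt"
  shift
  "$@" > "$out_file" 2>&1 || echo "command failed: $*" >> "$out_file"
}

# collect_lustre_diagnostics: hypothetical helper that saves the output of
# the checks described on this page into directory $1 (created if needed).
collect_lustre_diagnostics() {
  diag_dir=$1
  mkdir -p "$diag_dir" || return 1

  _lustre_diag_run uname uname -r
  _lustre_diag_run dpkg-lustre sh -c 'dpkg -l | grep -i lustre'
  _lustre_diag_run dmesg-lustre sh -c 'dmesg | grep -i lustre'
  _lustre_diag_run dmesg-error sh -c 'dmesg | grep -i error'
}

# Usage sketch, from a root shell so that dmesg is readable:
# collect_lustre_diagnostics /tmp/lustre-diag
```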