Performance considerations

This page provides guidance on configuring your Google Cloud Managed Lustre environment to obtain the best performance.

Performance specifications

The following performance numbers are approximate maximum values.

IOPS

Maximum IOPS scale linearly with provisioned instance capacity.

Throughput tier      Read IOPS (per TiB)   Write IOPS (per TiB)
125 MBps per TiB     725                   700
250 MBps per TiB     1,450                 1,400
500 MBps per TiB     2,900                 2,800
1000 MBps per TiB    5,800                 5,600
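As a back-of-the-envelope sketch, the per-TiB values in the table above can be multiplied by instance capacity. The 18 TiB capacity below is an illustrative example, not a recommendation:

```shell
# Illustrative only: estimate maximum IOPS for a hypothetical 18 TiB
# instance on the 1000 MBps per TiB tier (5,800 read / 5,600 write IOPS per TiB).
CAPACITY_TIB=18
READ_IOPS_PER_TIB=5800
WRITE_IOPS_PER_TIB=5600

echo "Max read IOPS:  $(( CAPACITY_TIB * READ_IOPS_PER_TIB ))"
echo "Max write IOPS: $(( CAPACITY_TIB * WRITE_IOPS_PER_TIB ))"
```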

Metadata operations

Maximum metadata operations increase in steps per 72 GBps of provisioned throughput.

               File stats            File creates          File deletes
Per 72 GBps    410,000 per second    115,000 per second    95,000 per second
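Because metadata limits scale in 72 GBps steps rather than continuously, a sketch of the calculation for a hypothetical instance (the 144 GBps figure is illustrative) looks like this:

```shell
# Illustrative only: metadata limits for a hypothetical instance with
# 144 GBps of provisioned throughput, which amounts to two 72 GBps steps.
THROUGHPUT_GBPS=144
STEPS=$(( THROUGHPUT_GBPS / 72 ))

echo "File stats/s:   $(( STEPS * 410000 ))"
echo "File creates/s: $(( STEPS * 115000 ))"
echo "File deletes/s: $(( STEPS * 95000 ))"
```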

Performance after increasing capacity

Increasing the storage capacity of an existing instance increases its maximum throughput and IOPS, and may also increase its metadata performance.

Read throughput performance gradually improves as new data is written and redistributed across the additional storage. Write throughput performance increases immediately.

VPC network maximum transmission unit (MTU)

When creating your VPC network, set the value of mtu (maximum transmission unit, the size of the largest IP packet that can be transmitted on the network) to the maximum allowed value of 8896 bytes. Compared to the default value of 1460 bytes, this improves performance by up to 10%.

You can see the current MTU value of your network with the following command:

gcloud compute networks describe NETWORK_NAME --format="value(mtu)"

The MTU value of a network can be updated after the network has been created, but there are important considerations. See Change the MTU of a network for details.
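To avoid changing the MTU after the fact, you can set it at network creation time. A minimal sketch, where NETWORK_NAME is a placeholder for your own network name:

```shell
# Create a custom-mode VPC network with the maximum supported MTU.
# NETWORK_NAME is a placeholder; replace it with your network name.
gcloud compute networks create NETWORK_NAME \
    --subnet-mode=custom \
    --mtu=8896
```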

Compute Engine machine types

Network throughput can be affected by your choice of machine type. In general, to obtain the best throughput:

  • Increase the number of vCPUs. Per-instance maximum egress bandwidth is generally 2 Gbps per vCPU, up to the machine type maximum.
  • Select a machine series that supports higher ingress and egress limits. For example, C2 instances with Tier_1 networking support up to 100 Gbps egress bandwidth. C3 instances with Tier_1 networking support up to 200 Gbps.
  • Enable per VM Tier_1 networking performance with larger machine types.
  • Use Google Virtual NIC (gVNIC). gVNIC is the only option for Generation 3 and newer machine types. gVNIC is required when using Tier_1 networking.

For detailed information, refer to Network bandwidth.
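Putting the recommendations above together, a sketch of creating a high-bandwidth client VM might look like the following. VM_NAME and ZONE are placeholders, and c3-standard-176 is one example of a larger machine type that supports Tier_1 networking:

```shell
# Sketch: create a C3 client VM with gVNIC and Tier_1 networking enabled.
# VM_NAME and ZONE are placeholders; adjust the machine type to your needs.
gcloud compute instances create VM_NAME \
    --zone=ZONE \
    --machine-type=c3-standard-176 \
    --network-interface=nic-type=GVNIC \
    --network-performance-configs=total-egress-bandwidth-tier=TIER_1
```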

Multi-NIC configuration

By using Lustre's built-in multi-rail capability, clients can stripe network traffic across multiple network interface cards (multi-NIC). This aggregates bandwidth to saturate high-capacity Managed Lustre instances.

To configure multi-NIC, you must:

  • Select a machine type with multiple physical NICs.
  • Create a subnet for each NIC and assign each NIC to its subnet.
  • Follow the multi-NIC steps when connecting from Compute Engine or GKE.
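The second step can be sketched at VM creation time by passing one --network-interface flag per NIC. All names below are placeholders, and the machine type must support multiple physical NICs:

```shell
# Sketch: create a VM with two gVNIC interfaces, one per network and subnet.
# All uppercase names are placeholders for your own values.
gcloud compute instances create VM_NAME \
    --zone=ZONE \
    --machine-type=MACHINE_TYPE \
    --network-interface=network=NETWORK_0,subnet=SUBNET_0,nic-type=GVNIC \
    --network-interface=network=NETWORK_1,subnet=SUBNET_1,nic-type=GVNIC
```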

Verify traffic balancing

Once you've configured multi-NIC, verify that traffic is balanced correctly across the interfaces.

Compute Engine

Verify data balancing directly on the VM by monitoring the configured network interfaces (for example, eth0 and eth1) using nload while generating traffic to the Managed Lustre backend:

nload -m eth0 eth1

In a successful multi-NIC configuration, the outgoing bitrates should be roughly equivalent across all configured interfaces.
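If nload isn't installed, a rough per-interface transmit check can be sketched from /proc/net/dev. The snippet below runs against a sample snapshot for illustration; on a live VM you would read the real /proc/net/dev twice a few seconds apart and compare the deltas:

```shell
# Rough sketch: print cumulative TX bytes per interface from a
# /proc/net/dev-style snapshot. On a real VM, sample /proc/net/dev twice
# and compare the deltas rather than the absolute counters.
print_tx_bytes() {
  # Field 2 is the interface name; field 11 is the TX bytes counter.
  awk -F'[: ]+' 'NR > 2 { print $2, $11 }' "$1"
}

# Sample snapshot for illustration only.
cat > /tmp/netdev.sample <<'EOF'
Inter-|   Receive                                                |  Transmit
 face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
  eth0: 634523452 9999 0 0 0 0 0 0 405190000000 8888 0 0 0 0 0 0
  eth1: 631234567 9998 0 0 0 0 0 0 406360000000 8887 0 0 0 0 0 0
EOF

print_tx_bytes /tmp/netdev.sample
```

In a balanced configuration, the TX deltas grow at roughly the same rate on both interfaces.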

GKE

Confirm that network traffic from your workload is balanced across multiple NICs by deploying a temporary network-debugger Pod to the node where your workload is scheduled:

  1. Identify the node where your workload is scheduled:

    kubectl get pod POD_NAME -o wide
    

    Replace POD_NAME with the name of your Pod. In the command output, note the name in the NODE column.

  2. Launch the network debugger on that node:

    kubectl run multi-nic-debug --rm -i --tty --image=nicolaka/netshoot \
      --overrides='{"spec": {"hostNetwork": true, "nodeSelector": {"kubernetes.io/hostname": "NODE_NAME"}, "tolerations": [{"key": "nvidia.com/gpu", "operator": "Exists", "effect": "NoSchedule"}]}}' \
      -- /bin/bash -c "apk update && apk add nload && nload -m eth0 eth1"
    

    Replace NODE_NAME with the node name from the previous step.

  3. In the output, analyze the Outgoing column bitrates for eth0 and eth1. If the configuration is successful, the bitrates are roughly equivalent. The output is similar to the following:

    Device eth0 [10.1.0.50] (1/2):
    ==========================================================================
    Incoming:                               Outgoing:
    Curr: 1.63 MBit/s                       Curr: 1.46 GBit/s
    Avg: 1.60 MBit/s                        Avg: 1.44 GBit/s
    Min: 1.40 MBit/s                        Min: 1.25 GBit/s
    Max: 1.64 MBit/s                        Max: 1.47 GBit/s
    Ttl: 590.94 GByte                       Ttl: 405.19 GByte
    
    Device eth1 [172.16.15.5] (2/2):
    ==========================================================================
    Incoming:                               Outgoing:
    Curr: 1.64 MBit/s                       Curr: 1.47 GBit/s
    Avg: 1.62 MBit/s                        Avg: 1.44 GBit/s
    Min: 1.42 MBit/s                        Min: 1.26 GBit/s
    Max: 1.66 MBit/s                        Max: 1.47 GBit/s
    Ttl: 587.68 GByte                       Ttl: 406.36 GByte
    
  4. Exit the debugger by pressing Ctrl+C.

Measuring single-client performance

To test read and write performance from a single Compute Engine client, use the fio (Flexible I/O tester) command line tool.

  1. Install fio:

    Rocky 8

    sudo dnf install fio -y
    

    Ubuntu 20.04 and 22.04

    sudo apt update
    sudo apt install fio -y
    
  2. Run the following command:

    fio --ioengine=libaio --filesize=32G --ramp_time=2s \
    --runtime=5m --numjobs=16 --direct=1 --verify=0 --randrepeat=0 \
    --group_reporting --directory=/lustre --buffer_compress_percentage=50 \
    --name=read --blocksize=1m --iodepth=64 --readwrite=read
    

The test takes approximately 5 minutes to complete. When it finishes, the results are displayed. Depending on your configuration, you can expect throughput up to your VM's maximum network bandwidth, and IOPS consistent with the per-TiB values listed in Performance specifications.
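To measure write performance, a variant of the same command can be used. This is a sketch that mirrors the read test above and assumes the Managed Lustre instance is mounted at /lustre, as in the read example:

```shell
# Sketch: sequential write test mirroring the read test above.
# Assumes the Managed Lustre instance is mounted at /lustre.
fio --ioengine=libaio --filesize=32G --ramp_time=2s \
--runtime=5m --numjobs=16 --direct=1 --verify=0 --randrepeat=0 \
--group_reporting --directory=/lustre --buffer_compress_percentage=50 \
--name=write --blocksize=1m --iodepth=64 --readwrite=write
```

Note that --verify=0 skips data verification, so this measures raw write throughput only.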