Enable high-performance block storage

Google Distributed Cloud (GDC) air-gapped offers a high-performance storage tier designed for demanding workloads. This tier delivers performance scaling up to 30 IOPS per GB, a tenfold increase over the standard block storage tier, which offers up to 3 IOPS per GB. This document describes how to enable high-performance block storage and how to monitor its associated metrics, alerts, and billing details. It is intended for platform administrators (such as IT admins) and application operators (such as application developers).

The performance-rwo high-performance storage class is available for the Kubernetes cluster.

With the addition of the high-performance SKU, the volume snapshot and restore process for volumes that use performance-* storage classes is the same as for standard-* storage classes. You can take a volume snapshot and restore a similar PVC without changing the storage class or the underlying Quality of Service (QoS) values. The performance volume snapshot class captures the volume type, and you can use the resulting volume snapshot to restore a volume with the same storage class.

Before you begin

Before starting, ensure you have the following prerequisites:

  • GDC environment and version

    • A running GDC instance that's been upgraded to version 1.15.1 or later.
  • Project

    • A GDC project within an organization where you intend to provision high-performance volumes.
  • Access and permissions

    • Sufficient Kubernetes permissions to create, manage, and use PersistentVolumeClaim (PVC) resources for container workloads, or VirtualMachineDisk resources for virtual machines within the target project namespace. Common roles needed include:
      • project-vm-admin to manage VMs and VM disks.
      • Roles allowing PVC management, which are often included in edit or custom roles.
      • No special organization-level roles are typically required for an end-user to consume the storage if the high-performance storage classes are already available in their project's cluster. The setup and exposure of these classes are an Infrastructure Operator (IO) or PA responsibility.
  • Understanding of storage classes

    • Familiarity with the concept of Kubernetes StorageClass objects. High-performance tiers are exposed through specific storage classes.
    • You must specify a high-performance storage class when creating a PVC or Virtual Machine Disk.
  • Capacity and quotas

    • Ensure the organization and project have sufficient storage quota allocated for the high-performance tier.
    • Be aware of any capacity limitations or performance guardrails on the specific GDC environment and hardware.
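Before provisioning, you can confirm that the high-performance storage class is already exposed in the target cluster. A minimal check, assuming you have a kubeconfig for the cluster (the kubeconfig path here is a placeholder):

```shell
# Confirm the performance-rwo storage class exists in the cluster.
# CLUSTER_KUBECONFIG is a placeholder for your cluster's kubeconfig path.
kubectl --kubeconfig CLUSTER_KUBECONFIG get storageclass performance-rwo
```

If the command returns a NotFound error, the high-performance storage classes have not been enabled on this cluster yet.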

Apply SubcomponentOverride to the required clusters

By default, the feature gate for the high-performance SKU is set to State: TEST. To enable performance-* storage classes, the Platform Administrator (PA) or Application Operator (AO) must apply a SubcomponentOverride to the required clusters with a FileHighPerformanceStorageClassStage value higher than the default. The following example uses the higher value of production.

apiVersion: lcm.private.gdc.goog/v1
kind: SubcomponentOverride
metadata:
  name: file-common
  namespace: NAMESPACE
spec:
  features:
    operableParameters:
      FileHighPerformanceStorageClassStage: production
  subComponentRef: file-common

The namespace in the SubcomponentOverride specifies the cluster namespace (for example, cluster-1) where the flag is set. Replace NAMESPACE with the corresponding namespace of each cluster and create a SubcomponentOverride file for each. Apply the change using the following command:

kubectl --kubeconfig MANAGEMENT_API_SERVER_KUBECONFIG_PATH apply -f SUBOVERRIDE_USER_FILE

Replace the following:

  • MANAGEMENT_API_SERVER_KUBECONFIG_PATH: the path to the management API server cluster kubeconfig file.
  • SUBOVERRIDE_USER_FILE: the path to the SubcomponentOverride YAML file for the Kubernetes cluster.

To enable the high-performance SKU for Kubernetes clusters, apply the SubcomponentOverride to the management API server cluster.
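To confirm the override took effect, you can read it back from the management API server. This is a sketch; the lowercase resource name `subcomponentoverride` is an assumption based on the kind shown above:

```shell
# Read back the override and check the operable parameter value.
# The kubeconfig path and NAMESPACE are the same placeholders as above.
kubectl --kubeconfig MANAGEMENT_API_SERVER_KUBECONFIG_PATH \
  get subcomponentoverride file-common -n NAMESPACE -o yaml
```

The output should show FileHighPerformanceStorageClassStage: production under spec.features.operableParameters.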

Monitor high-performance metrics and alerts

High-performance storage also includes additional metrics, such as QoS policy details. This feature allows metrics to be filtered and aggregated based on the QoS policy in ONTAP, making it possible to distinguish high-performance volumes from standard ones.

Observe metrics

The following high-performance metrics are available for the file observability stack:

  • metering_storage_allocated_capacity_bytes
  • metering_storage_used_capacity_bytes

Both of these metrics have been enriched with qos_policy information from ONTAP.
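For example, you can break allocated capacity down by QoS policy with a query like the following; the exact qos_policy label values depend on your ONTAP configuration:

```
sum by (qos_policy) (metering_storage_allocated_capacity_bytes)
```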

Observe dashboards

Building on the preceding metric enhancements, alerts are available for both standard and high-performance storage, enabling targeted monitoring and faster issue detection based on performance class. To visualize this data, high-performance dashboards are available, which use the enriched metrics to provide insights into the performance and usage of high-performance volumes.

To see these dashboards, access Grafana and navigate to Dashboards > FILE and Dashboards > FILE/ONTAP.

The following high-performance dashboards are available:

  • Organization File Block Storage Performance Balance Dashboard.
  • The FILE/ONTAP dashboards, which scrape Harvest data. This lets you monitor ONTAP performance directly from Grafana.

Observe alerts

Alongside the metrics and dashboards, alerts related to high performance are also available. These alerts help detect storage issues and point users to runbooks with instructions for resolving them. When an alert fires, it provides a runbook code that you can look up in the service manual. PAs should follow these runbooks to resolve the alert. You can view these alerts in the Alert Rules section of Grafana by navigating to infra-obs > file-block-storage-perf-monitoring-rules.

The following high-performance alerts are available:

  • fb_high_org_volume_latency and fb_high_org_avg_volume_latency to track volume latency in an organization, individually and on average, respectively.
  • fb_storage_node_too_busy to track the CPU usage of a node and fire if the usage is too high.
  • fb_current_ops_higher_than_expected to fire if the number of operations on a node is higher than expected.

Query high-performance billing

The SKU for high-performance block storage is C008-4FF2-45E7. Billing for this SKU is done with the following Prometheus query:


sum_over_time(metering_storage_allocated_capacity_bytes{sku_id='c008-4ff2-45e7'}[{{.RangeHalfOpen}}:1m]) / 2^30 / 60 / {{.Hours}}

Using metering_storage_allocated_capacity_bytes means that billing is based on allocated bytes rather than bytes actually used.

The storage class and QoS policy for volumes are set when they are first provisioned. Because the QoS policy can be changed directly in ONTAP, the sku_id in metrics is derived from the qos_policy, and billing is then done by sku_id.
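As a sketch of the arithmetic in the query: with one sample per minute, sum_over_time adds 60 samples per hour, so dividing the sum by 60 and by the number of hours yields the time-averaged allocated capacity, and dividing by 2^30 converts bytes to GiB. Working directly in GiB for simplicity:

```shell
# Billing arithmetic sketch: a constant 100 GiB allocation sampled once
# per minute over a 2-hour window (120 samples) averages to 100 GiB.
SAMPLES=120        # one sample per minute for 2 hours
GIB_PER_SAMPLE=100 # constant allocated capacity, already in GiB
HOURS=2
echo $((SAMPLES * GIB_PER_SAMPLE / 60 / HOURS))  # 100
```

A volume that is allocated for only part of the window contributes proportionally fewer samples, so the bill scales with how long the capacity was allocated.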

Example workflows

You can use the high-performance SKU in your workflows through different types of volume declarations. The underlying mechanism stays the same: a PVC backed by a performance-* storage class. The following examples show the different ways you can create volumes in the new tier.

Apply the following YAML configurations to the appropriate cluster, referencing the correct performance-* storage class name.

High-performance PVCs and pod mounts:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: PVC_NAME
spec:
  storageClassName: performance-rwo
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi

---

apiVersion: v1
kind: Pod
metadata:
  name: POD_NAME
spec:
  volumes:
    - name: VOLUME_NAME
      persistentVolumeClaim:
        claimName: PVC_NAME
  containers:
    - name: CONTAINER_NAME
      image: gcr.io/GKE_RELEASE/asm/proxyv2:1.23.6-asm.14
      ports:
        - containerPort: 80
      volumeMounts:
        - name: VOLUME_NAME
          mountPath: "/data"
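After saving the preceding manifests to a file (the file name here is a placeholder), you can apply them and confirm the claim binds:

```shell
# Apply the PVC and pod manifests, then check that the claim binds
# to a volume backed by the performance-rwo storage class.
kubectl apply -f high-performance-pvc-pod.yaml
kubectl get pvc PVC_NAME   # STATUS should reach Bound
```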

High-performance VM disks and VMs:

apiVersion: virtualmachine.gdc.goog/v1
kind: VirtualMachineDisk
metadata:
  name: BOOT_DISK_NAME
  namespace: NAMESPACE
spec:
  source:
    image:
      name: ubuntu-24.04-v20250809-gdch
      namespace: vm-system
  size: 25Gi
  type: Performance

---
apiVersion: virtualmachine.gdc.goog/v1
kind: VirtualMachineDisk
metadata:
  name: DATA_DISK_NAME
  namespace: NAMESPACE
spec:
  size: 2Gi
  type: Performance
---

apiVersion: virtualmachine.gdc.goog/v1
kind: VirtualMachine
metadata:
  name: VM_NAME
  namespace: NAMESPACE
spec:
  compute:
    virtualMachineType: n3-standard-2-gdc
  disks:
  - virtualMachineDiskRef:
      name: BOOT_DISK_NAME
    boot: true
    autoDelete: true
  - virtualMachineDiskRef:
      name: DATA_DISK_NAME
    autoDelete: true

High-performance volume snapshots and restores:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: SOURCE_PVC_NAME
spec:
  storageClassName: performance-rwo
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
---

apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: VOLUME_SNAPSHOT_NAME
  namespace: NAMESPACE
spec:
  volumeSnapshotClassName: performance
  source:
    persistentVolumeClaimName: SOURCE_PVC_NAME

---

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: RESTORED_PVC_NAME
  namespace: NAMESPACE
spec:
  dataSource:
    name: VOLUME_SNAPSHOT_NAME
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
  storageClassName: performance-rwo
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi

Replace the following:

  • BOOT_DISK_NAME: the name of the virtual machine boot disk.
  • CONTAINER_NAME: the name of the container.
  • DATA_DISK_NAME: the name of the virtual machine data disk.
  • GKE_RELEASE: the GKE release version.
  • NAMESPACE: the Kubernetes namespace for the resources.
  • POD_NAME: the name of the pod.
  • PVC_NAME: the name of the PersistentVolumeClaim resource.
  • RESTORED_PVC_NAME: the name of the restored PersistentVolumeClaim resource.
  • SOURCE_PVC_NAME: the name of the source PersistentVolumeClaim resource for the snapshot.
  • VM_NAME: the name of the virtual machine.
  • VOLUME_NAME: the name of the volume.
  • VOLUME_SNAPSHOT_NAME: the name of the VolumeSnapshot resource.
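For the snapshot workflow, it can help to confirm the snapshot is ready before creating the restore PVC. A minimal check:

```shell
# readyToUse is set by the snapshot controller once the snapshot
# can be used as a restore source; expect "true" before restoring.
kubectl get volumesnapshot VOLUME_SNAPSHOT_NAME -n NAMESPACE \
  -o jsonpath='{.status.readyToUse}'
```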