Google Distributed Cloud (GDC) air-gapped offers a high-performance storage tier designed for demanding workloads. This tier delivers performance scaling up to 30 IOPS per GB, a tenfold increase over the standard block storage tier, which offers up to 3 IOPS per GB. This document describes how to enable high-performance block storage and how to monitor its associated metrics, alerts, and billing details. It is intended for platform administrators (such as IT admins) and application operators (such as application developers).
The performance-rwo high-performance storage class is available for Kubernetes clusters.
With the addition of the high-performance SKU, volume snapshots and restores for volumes that use performance-* storage classes work the same way as for standard-* storage classes. You can take a volume snapshot and restore it to a similar PVC without any change to the storage class or the underlying Quality of Service (QoS) values. The performance volume snapshot class captures the volume type, and you can use the resulting volume snapshot to restore the volume with the same storage class, as shown in the Example workflows section.
Before you begin
Before starting, ensure you have the following prerequisites:
GDC environment and version
- A running GDC instance that's been upgraded to version 1.15.1 or later.
Project
- A GDC project within an organization where you intend to provision high-performance volumes.
Access and permissions
- Sufficient Kubernetes permissions to create, manage, and use PersistentVolumeClaim (PVC) resources for container workloads, or VirtualMachineDisk resources for virtual machines within the target project namespace. Common roles needed include:
  - project-vm-admin to manage VMs and VM disks.
  - Roles allowing PVC management, which are often included in edit or custom roles.
- No special organization-level roles are typically required for an end user to consume the storage if the high-performance storage classes are already available in their project's cluster. The setup and exposure of these classes are an Infrastructure Operator (IO) or Platform Administrator (PA) responsibility.
Understanding of storage classes
- Familiarity with the concept of Kubernetes StorageClass objects. High-performance tiers are exposed through specific storage classes.
- You must specify a high-performance storage class when creating a PVC or Virtual Machine Disk.
Capacity and quotas
- Ensure the organization and project have sufficient storage quota allocated for the high-performance tier.
- Be aware of any capacity limitations or performance guardrails on the specific GDC environment and hardware.
Apply subcomponentOverride to the required clusters
By default, the FeatureGate for the high-performance SKU is set to State: TEST. To enable performance-* storage classes, the Platform Administrator (PA) or Application Operator (AO) must apply a SubcomponentOverride to the required clusters with a FileHighPerformanceStorageClassStage value higher than the default. The following example uses the higher value of production.
apiVersion: lcm.private.gdc.goog/v1
kind: SubcomponentOverride
metadata:
  name: file-common
  namespace: NAMESPACE
spec:
  features:
    operableParameters:
      FileHighPerformanceStorageClassStage: production
  subComponentRef: file-common
The namespace in the SubcomponentOverride specifies the cluster namespace (for example, cluster-1) where the flag is set. Replace NAMESPACE with the corresponding namespace of each cluster and create the SubcomponentOverride files. Apply this change using the following command:
kubectl --kubeconfig MANAGEMENT_API_SERVER_KUBECONFIG_PATH apply -f SUBOVERRIDE_USER_FILE
Replace the following:
- MANAGEMENT_API_SERVER_KUBECONFIG_PATH: the path to the management API server cluster kubeconfig file.
- SUBOVERRIDE_USER_FILE: the path to the SubcomponentOverride YAML file for the Kubernetes cluster.
To enable the high-performance SKU for Kubernetes clusters, apply the subcomponentOverride to the management API server cluster.
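After the override propagates, you can confirm that the new storage class is exposed in the target cluster. The following check is a sketch: CLUSTER_KUBECONFIG_PATH is a hypothetical placeholder for the target cluster's kubeconfig file, and the performance-rwo class name is taken from earlier in this page.
# List the storage class and verify that performance-rwo appears.
kubectl --kubeconfig CLUSTER_KUBECONFIG_PATH get storageclass performance-rwo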
Monitor high-performance metrics and alerts
High-performance storage also includes additional metrics enriched with QoS policy details. This feature allows metrics to be filtered and aggregated based on the QoS policy in ONTAP, making it possible to distinguish high-performance volumes from standard ones.
Observe metrics
The following high-performance metrics are available for the file observability stack:
- metering_storage_allocated_capacity_bytes
- metering_storage_used_capacity_bytes
Both of these metrics have been enriched with qos_policy information from ONTAP.
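For example, the following PromQL sketch aggregates allocated high-performance capacity. The exact qos_policy label values and the namespace grouping label are assumptions; adjust them to match your ONTAP configuration:
# Sum allocated capacity across volumes whose QoS policy marks them as high performance.
sum by (namespace) (metering_storage_allocated_capacity_bytes{qos_policy=~"performance.*"})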
Observe dashboards
Building on the preceding metric enhancements, alerts are available for both standard and high-performance storage, enabling targeted monitoring and faster issue detection based on performance class. To visualize this data, high-performance dashboards are available, which use the enriched metrics to provide insights into the performance and usage of high-performance volumes.
To see these dashboards, access Grafana and navigate to Dashboards > FILE and Dashboards > FILE/ONTAP.
The following high-performance dashboards are available:
- Organization File Block Storage Performance Balance Dashboard.
- The FILE/ONTAP dashboards, which scrape Harvest data and let you monitor ONTAP performance directly from Grafana.
Observe alerts
Alongside the metrics and dashboards, alerts related to high performance are also available. These alerts help detect storage issues and point users to runbooks with instructions for resolving them. When an alert fires, it provides a runbook code that you can look up in the service manual. PAs should follow these runbooks to resolve the alert. You can view these alerts in the Alert Rules section in Grafana by navigating to infra-obs > file-block-storage-perf-monitoring-rules.
The following high-performance alerts are available:
- fb_high_org_volume_latency and fb_high_org_avg_volume_latency: track volume latency in an organization, individually and on average, respectively.
- fb_storage_node_too_busy: tracks the CPU usage of a node and alerts if the CPU usage is too high.
- fb_current_ops_higher_than_expected: alerts if the number of operations on a node is higher than expected.
Query high-performance billing
The SKU for high-performance block storage is C008-4FF2-45E7. Billing for this SKU is done with the following Prometheus query:
sum_over_time(metering_storage_allocated_capacity_bytes{sku_id='c008-4ff2-45e7'}[{{.RangeHalfOpen}}:1m]) / 2^30 / 60 / {{.Hours}}
Using metering_storage_allocated_capacity_bytes means that billing is based on allocated bytes rather than actual used bytes.
The storage class and QoS policy for a volume are set when it is first provisioned. Because the QoS policy can be changed directly in ONTAP, the sku_id in metrics is set based on the qos_policy, and billing is then done by sku_id.
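As an illustration of how the query template evaluates, assume a 24-hour billing window in which {{.RangeHalfOpen}} renders to 24h and {{.Hours}} to 24 (an assumption about the template variables, not a documented rendering):
# One sample per minute for 24 hours, summed, converted from bytes to GiB,
# then divided by 60 samples per hour and 24 hours: the average allocated GiB.
sum_over_time(metering_storage_allocated_capacity_bytes{sku_id='c008-4ff2-45e7'}[24h:1m]) / 2^30 / 60 / 24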
Example workflows
You can use the high-performance SKU in your workflows through different types of volume declarations. The underlying mechanism stays the same in each case: for example, a PVC backed by a performance-* storage class. The following examples show the different ways you can create the new tier of volumes.
Apply the following YAML configurations to the appropriate cluster, referencing the correct performance-* storage class name.
High-performance PVCs and pod mounts:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: PVC_NAME
spec:
  storageClassName: performance-rwo
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: POD_NAME
spec:
  volumes:
  - name: VOLUME_NAME
    persistentVolumeClaim:
      claimName: PVC_NAME
  containers:
  - name: CONTAINER_NAME
    image: gcr.io/GKE_RELEASE/asm/proxyv2:1.23.6-asm.14
    ports:
    - containerPort: 80
    volumeMounts:
    - name: VOLUME_NAME
      mountPath: "/data"
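To confirm that the pod mounted the high-performance volume, you can inspect the mount point. This is a sketch that reuses the hypothetical CLUSTER_KUBECONFIG_PATH placeholder and assumes the resources live in NAMESPACE:
# The filesystem mounted at /data should reflect the 5Gi PVC.
kubectl --kubeconfig CLUSTER_KUBECONFIG_PATH exec POD_NAME -n NAMESPACE -- df -h /data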
High-performance VM disks and VMs:
apiVersion: virtualmachine.gdc.goog/v1
kind: VirtualMachineDisk
metadata:
  name: BOOT_DISK_NAME
  namespace: NAMESPACE
spec:
  source:
    image:
      name: ubuntu-24.04-v20250809-gdch
      namespace: vm-system
  size: 25Gi
  type: Performance
---
apiVersion: virtualmachine.gdc.goog/v1
kind: VirtualMachineDisk
metadata:
  name: DATA_DISK_NAME
  namespace: NAMESPACE
spec:
  size: 2Gi
  type: Performance
---
apiVersion: virtualmachine.gdc.goog/v1
kind: VirtualMachine
metadata:
  name: VM_NAME
  namespace: NAMESPACE
spec:
  compute:
    virtualMachineType: n3-standard-2-gdc
  disks:
  - virtualMachineDiskRef:
      name: BOOT_DISK_NAME
    boot: true
    autoDelete: true
  - virtualMachineDiskRef:
      name: DATA_DISK_NAME
    autoDelete: true
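To verify that both disks were provisioned with the Performance type and attached to the VM, you can list the disk resources. The plural resource name used here is an assumption derived from the VirtualMachineDisk kind:
# Check the provisioning status of the boot and data disks.
kubectl --kubeconfig CLUSTER_KUBECONFIG_PATH get virtualmachinedisks.virtualmachine.gdc.goog -n NAMESPACE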
High-performance volume snapshots and restores:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: SOURCE_PVC_NAME
spec:
  storageClassName: performance-rwo
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
---
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: VOLUME_SNAPSHOT_NAME
  namespace: NAMESPACE
spec:
  volumeSnapshotClassName: performance
  source:
    persistentVolumeClaimName: SOURCE_PVC_NAME
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: RESTORED_PVC_NAME
  namespace: NAMESPACE
spec:
  dataSource:
    name: VOLUME_SNAPSHOT_NAME
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
  storageClassName: performance-rwo
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
Replace the following:
- BOOT_DISK_NAME: the name of the virtual machine boot disk.
- CONTAINER_NAME: the name of the container.
- DATA_DISK_NAME: the name of the virtual machine data disk.
- GKE_RELEASE: the GKE release version.
- NAMESPACE: the Kubernetes namespace for the resources.
- POD_NAME: the name of the pod.
- PVC_NAME: the name of the PersistentVolumeClaim resource.
- RESTORED_PVC_NAME: the name of the restored PersistentVolumeClaim resource.
- SOURCE_PVC_NAME: the name of the source PersistentVolumeClaim resource for the snapshot.
- VM_NAME: the name of the virtual machine.
- VOLUME_NAME: the name of the volume.
- VOLUME_SNAPSHOT_NAME: the name of the VolumeSnapshot resource.
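To confirm that the snapshot-and-restore flow succeeded, you can check that the snapshot is ready to use and that the restored claim is bound, again using the hypothetical CLUSTER_KUBECONFIG_PATH placeholder:
# READYTOUSE should report true once the snapshot completes.
kubectl --kubeconfig CLUSTER_KUBECONFIG_PATH get volumesnapshot VOLUME_SNAPSHOT_NAME -n NAMESPACE
# STATUS should report Bound once the restored volume is provisioned.
kubectl --kubeconfig CLUSTER_KUBECONFIG_PATH get pvc RESTORED_PVC_NAME -n NAMESPACE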