Dataproc boot disks

You can select a standard, SSD, or balanced Persistent Disk, or a Google Cloud Hyperdisk Balanced volume, as the boot disk for Dataproc cluster nodes.

Boot disk type options:

  • You can select a standard, SSD, or balanced Persistent Disk as the boot disk for manager (master), primary worker, and secondary worker cluster nodes.

  • You can select hyperdisk balanced as the boot disk for manager (master) and primary worker nodes. Note that Dataproc automatically sets the secondary worker boot disk type to hyperdisk balanced when the primary worker boot disk type is set to hyperdisk balanced.

The default persistent boot disk type for Dataproc cluster manager (master) and primary worker nodes is standard (pd-standard). If the VM machine type supports only hyperdisk balanced as the boot disk, the default boot disk is hyperdisk balanced (hyperdisk-balanced). The default persistent boot disk type for cluster secondary worker nodes is the primary worker node persistent boot disk type.
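
The defaulting rules above can be sketched as a small function. This is only an illustration of the rules as stated on this page, not Dataproc's implementation. The function name, the role keys, and the single machine_requires_hyperdisk flag (applied to both master and primary worker nodes for simplicity) are hypothetical.

```python
from typing import Dict, Optional

PD_STANDARD = "pd-standard"
HYPERDISK_BALANCED = "hyperdisk-balanced"


def resolve_boot_disk_types(
    master: Optional[str] = None,
    primary_worker: Optional[str] = None,
    secondary_worker: Optional[str] = None,
    machine_requires_hyperdisk: bool = False,  # hypothetical simplification
) -> Dict[str, str]:
    """Return the effective boot disk type per node role (illustrative only)."""
    # Master and primary workers default to pd-standard, unless the machine
    # type supports only hyperdisk balanced boot disks.
    default = HYPERDISK_BALANCED if machine_requires_hyperdisk else PD_STANDARD
    master = master or default
    primary_worker = primary_worker or default

    # Secondary workers default to the primary worker type, and are set to
    # hyperdisk-balanced whenever the primary workers use hyperdisk-balanced.
    if secondary_worker is None or primary_worker == HYPERDISK_BALANCED:
        secondary_worker = primary_worker

    return {
        "master": master,
        "primary_worker": primary_worker,
        "secondary_worker": secondary_worker,
    }


print(resolve_boot_disk_types(primary_worker="pd-ssd"))
```

In this sketch, an explicit secondary worker type is kept only when the primary workers do not use hyperdisk-balanced, which mirrors the automatic override described in the list above.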

Select persistent boot disk types for cluster nodes

You can select the persistent boot disk type when you create a cluster using the Google Cloud console, Google Cloud CLI, or Dataproc API.

Console

You can create a cluster and select cluster node boot disk types from the Configure nodes panel on the Dataproc Create a cluster page of the Google Cloud console.

gcloud CLI

You can create a cluster and select cluster node boot disk types using the gcloud dataproc clusters create command with the --master-boot-disk-type, --worker-boot-disk-type, and --secondary-worker-boot-disk-type flags.

Example:
gcloud dataproc clusters create CLUSTER_NAME \
    --region=REGION \
    --master-boot-disk-type=pd-ssd \
    --worker-boot-disk-type=hyperdisk-balanced \
    other args ...

REST API

You can set pd-standard, pd-ssd, pd-balanced, or hyperdisk-balanced as the value of the InstanceGroupConfig.DiskConfig.bootDiskType field in the masterConfig, workerConfig, and secondaryWorkerConfig as part of a clusters.create API request.
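
As a rough illustration, the following Python snippet assembles a JSON request body that sets bootDiskType for each instance group. The cluster name and the chosen disk types are placeholder values, and the body is only a minimal sketch: a real create request typically carries more configuration than is shown here.

```python
import json

# Minimal cluster resource body for a create request.
# "example-cluster" and the disk types below are placeholders for illustration.
cluster_body = {
    "clusterName": "example-cluster",
    "config": {
        "masterConfig": {"diskConfig": {"bootDiskType": "pd-ssd"}},
        "workerConfig": {"diskConfig": {"bootDiskType": "pd-balanced"}},
        "secondaryWorkerConfig": {"diskConfig": {"bootDiskType": "pd-balanced"}},
    },
}

# Print the JSON that would be sent as the request body.
print(json.dumps(cluster_body, indent=2))
```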

Hyperdisk settings

When you create a cluster with a hyperdisk balanced volume as the boot disk for a Dataproc cluster node, you can set the provisioned IOPS and throughput.

Console

You can set IOPS and throughput, or accept the default values, from the Configure nodes panel on the Dataproc Create a cluster page.

gcloud CLI

You can set provisioned IOPS and provisioned throughput for cluster nodes with hyperdisk-balanced boot disks using the gcloud dataproc clusters create command with the --master-boot-disk-provisioned-iops, --worker-boot-disk-provisioned-iops, --master-boot-disk-provisioned-throughput, and --worker-boot-disk-provisioned-throughput flags.

Example:
  gcloud dataproc clusters create CLUSTER_NAME \
      --region=REGION \
      --master-boot-disk-type=hyperdisk-balanced \
      --master-boot-disk-provisioned-iops=MASTER_BOOT_DISK_IOPS \
      --master-boot-disk-provisioned-throughput=MASTER_BOOT_DISK_THROUGHPUT \
      --worker-boot-disk-type=hyperdisk-balanced \
      --worker-boot-disk-provisioned-iops=WORKER_BOOT_DISK_IOPS \
      --worker-boot-disk-provisioned-throughput=WORKER_BOOT_DISK_THROUGHPUT \
      other args ...
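
If you generate this command from a script, a helper along the following lines can assemble the flags. This is a hypothetical sketch: boot_disk_flags is not part of any Google tool, the cluster name, region, IOPS, and throughput values are placeholders, and the check that rejects provisioned values for other disk types is an assumption of this sketch rather than documented gcloud behavior.

```python
import shlex
from typing import List, Optional


def boot_disk_flags(
    role: str,
    disk_type: str,
    iops: Optional[int] = None,
    throughput: Optional[int] = None,
) -> List[str]:
    """Build boot disk flags for one role ("master" or "worker")."""
    if role not in ("master", "worker"):
        raise ValueError("role must be 'master' or 'worker'")
    flags = [f"--{role}-boot-disk-type={disk_type}"]
    for name, value in (("iops", iops), ("throughput", throughput)):
        if value is None:
            continue  # Omit the flag so the default value is used.
        # Assumption: provisioned values only make sense for hyperdisk-balanced.
        if disk_type != "hyperdisk-balanced":
            raise ValueError(f"{name} requires a hyperdisk-balanced boot disk")
        flags.append(f"--{role}-boot-disk-provisioned-{name}={value}")
    return flags


# Placeholder cluster name, region, and performance values.
command = ["gcloud", "dataproc", "clusters", "create", "example-cluster",
           "--region=us-central1"]
command += boot_disk_flags("master", "hyperdisk-balanced", iops=3000, throughput=140)
command += boot_disk_flags("worker", "hyperdisk-balanced")
print(shlex.join(command))
```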
  

REST API

You can set provisioned IOPS and provisioned throughput for cluster nodes with hyperdisk boot disks using the InstanceGroupConfig.DiskConfig.bootDiskProvisionedIops and InstanceGroupConfig.DiskConfig.bootDiskProvisionedThroughput fields for the manager (master) and worker configs.
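
For illustration, the following Python snippet assembles a request body that sets these fields together with bootDiskType. The helper name, the cluster name, and the IOPS and throughput numbers are placeholders, not recommended values. The numbers are shown here as integers, which is an assumption about the accepted JSON encoding.

```python
import json


def hyperdisk_group(iops: int, throughput: int) -> dict:
    """Instance group config with a hyperdisk-balanced boot disk (illustrative)."""
    return {
        "diskConfig": {
            "bootDiskType": "hyperdisk-balanced",
            "bootDiskProvisionedIops": iops,
            "bootDiskProvisionedThroughput": throughput,
        }
    }


# Placeholder cluster name and performance values.
cluster_body = {
    "clusterName": "example-cluster",
    "config": {
        "masterConfig": hyperdisk_group(iops=3000, throughput=140),
        "workerConfig": hyperdisk_group(iops=3000, throughput=140),
    },
}

print(json.dumps(cluster_body, indent=2))
```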