You can select a standard, SSD, or balanced Persistent Disk, or a Google Cloud Hyperdisk balanced volume, as the boot disk for Dataproc cluster nodes.
Boot disk type options:
You can select a standard, SSD, or balanced Persistent Disk as the boot disk for manager (master), primary worker, and secondary worker cluster nodes.
You can select hyperdisk balanced as the boot disk for manager (master) and primary worker nodes. Note that Dataproc automatically sets the secondary worker boot disk type to hyperdisk balanced when the primary worker boot disk type is set to hyperdisk balanced.
The default persistent boot disk type for Dataproc cluster manager (master) and primary worker nodes is standard (pd-standard). If the VM machine type supports only hyperdisk balanced as the boot disk, the default boot disk is hyperdisk balanced (hyperdisk-balanced).
The default persistent boot disk type for cluster secondary worker nodes is the
primary worker node persistent boot disk type.
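The default and override rules above can be sketched as a small resolver. This is an illustrative model only, not Dataproc's implementation: the function name, its parameters, and the two `*_requires_hyperdisk` flags (standing in for "the VM machine type supports only hyperdisk balanced") are hypothetical.

```python
from typing import Dict, Optional

PD_STANDARD = "pd-standard"
HYPERDISK_BALANCED = "hyperdisk-balanced"


def resolve_boot_disk_types(
    master: Optional[str] = None,
    worker: Optional[str] = None,
    secondary_worker: Optional[str] = None,
    master_requires_hyperdisk: bool = False,  # hypothetical flag
    worker_requires_hyperdisk: bool = False,  # hypothetical flag
) -> Dict[str, str]:
    """Return the effective boot disk type for each node role (sketch)."""
    # Manager and primary workers default to pd-standard, unless the machine
    # type supports only hyperdisk balanced boot disks.
    if master is None:
        master = HYPERDISK_BALANCED if master_requires_hyperdisk else PD_STANDARD
    if worker is None:
        worker = HYPERDISK_BALANCED if worker_requires_hyperdisk else PD_STANDARD
    # Secondary workers default to the primary worker type, and are set to
    # hyperdisk balanced whenever the primary workers use hyperdisk balanced.
    if secondary_worker is None or worker == HYPERDISK_BALANCED:
        secondary_worker = worker
    return {"master": master, "worker": worker, "secondary_worker": secondary_worker}


print(resolve_boot_disk_types(master="pd-ssd", worker="hyperdisk-balanced"))
```

In this sketch, a secondary worker type passed alongside a hyperdisk balanced primary worker type is overridden, which mirrors the automatic behavior described in the options list.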
Select persistent boot disk types for cluster nodes
You can select the persistent boot disk type when you create a cluster using the Google Cloud console, Google Cloud CLI, or Dataproc API.
Console
You can create a cluster and select cluster node boot disk types from the Configure nodes panel on the Dataproc Create a cluster page of the Google Cloud console.
gcloud CLI
You can create a cluster and select cluster node boot disk types using the gcloud dataproc clusters create command with the --master-boot-disk-type, --worker-boot-disk-type, and --secondary-worker-boot-disk-type flags.
gcloud dataproc clusters create CLUSTER_NAME \
    --region=REGION \
    --master-boot-disk-type=pd-ssd \
    --worker-boot-disk-type=hyperdisk-balanced \
    other args ...
REST API
You can set a value of pd-standard, pd-ssd, pd-balanced, or hyperdisk-balanced in the InstanceGroupConfig.DiskConfig.bootDiskType field in the masterConfig, workerConfig, and secondaryWorkerConfig as part of a cluster.create API request.
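As a rough illustration of where the bootDiskType field sits in a cluster.create request body, the following Python sketch builds a minimal JSON fragment. The helper name and the example cluster name are hypothetical, and the body omits every other cluster setting a real request would carry.

```python
import json
from typing import Optional

# Boot disk type values listed in this section.
ALLOWED_BOOT_DISK_TYPES = {"pd-standard", "pd-ssd", "pd-balanced", "hyperdisk-balanced"}


def build_cluster_body(
    cluster_name: str,
    master_type: str,
    worker_type: str,
    secondary_type: Optional[str] = None,
) -> dict:
    """Build a minimal cluster.create body that sets only boot disk types."""
    for value in (master_type, worker_type, secondary_type):
        if value is not None and value not in ALLOWED_BOOT_DISK_TYPES:
            raise ValueError(f"unsupported boot disk type: {value}")
    config = {
        "masterConfig": {"diskConfig": {"bootDiskType": master_type}},
        "workerConfig": {"diskConfig": {"bootDiskType": worker_type}},
    }
    # Leave secondaryWorkerConfig out to accept the default described above.
    if secondary_type is not None:
        config["secondaryWorkerConfig"] = {"diskConfig": {"bootDiskType": secondary_type}}
    return {"clusterName": cluster_name, "config": config}


# "example-cluster" is a placeholder name.
print(json.dumps(build_cluster_body("example-cluster", "pd-ssd", "pd-balanced"), indent=2))
```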
Hyperdisk settings
When you create a cluster with a hyperdisk balanced volume as the boot disk for a Dataproc cluster node, you can set the provisioned IOPS and throughput.
Console
You can set IOPS and throughput, or accept the default values, from the Configure nodes panel on the Dataproc Create a cluster page.
gcloud CLI
You can set provisioned IOPS and provisioned throughput for cluster nodes with hyperdisk-balanced boot disks using the gcloud dataproc clusters create command with the --master-boot-disk-provisioned-iops, --worker-boot-disk-provisioned-iops, --master-boot-disk-provisioned-throughput, and --worker-boot-disk-provisioned-throughput flags.
gcloud dataproc clusters create CLUSTER_NAME \
    --region=REGION \
    --master-boot-disk-type=hyperdisk-balanced \
    --master-boot-disk-provisioned-iops=MASTER_BOOT_DISK_IOPS \
    --master-boot-disk-provisioned-throughput=MASTER_BOOT_DISK_THROUGHPUT \
    --worker-boot-disk-type=hyperdisk-balanced \
    --worker-boot-disk-provisioned-iops=WORKER_BOOT_DISK_IOPS \
    --worker-boot-disk-provisioned-throughput=WORKER_BOOT_DISK_THROUGHPUT \
    other args ...
REST API
You can set provisioned IOPS and provisioned throughput for cluster nodes with hyperdisk balanced boot disks using the InstanceGroupConfig.DiskConfig.bootDiskProvisionedIops and InstanceGroupConfig.DiskConfig.bootDiskProvisionedThroughput fields in the manager (master) and worker configs.
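A minimal Python sketch of a disk config fragment that uses these two fields might look like the following. The helper name is hypothetical, and the numeric values are placeholders chosen for illustration, not recommendations; check the API reference for the units and allowed ranges before using real values.

```python
import json
from typing import Optional


def hyperdisk_disk_config(
    provisioned_iops: Optional[int] = None,
    provisioned_throughput: Optional[int] = None,
) -> dict:
    """Build a DiskConfig fragment for a hyperdisk balanced boot disk (sketch)."""
    disk = {"bootDiskType": "hyperdisk-balanced"}
    # Omit a field to accept the service default for that setting.
    if provisioned_iops is not None:
        disk["bootDiskProvisionedIops"] = provisioned_iops
    if provisioned_throughput is not None:
        disk["bootDiskProvisionedThroughput"] = provisioned_throughput
    return disk


# Placeholder values for illustration only.
config = {
    "masterConfig": {"diskConfig": hyperdisk_disk_config(3000, 140)},
    "workerConfig": {"diskConfig": hyperdisk_disk_config(3000, 140)},
}
print(json.dumps({"config": config}, indent=2))
```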