You can select a standard, SSD, or balanced Persistent Disk, or a Google Cloud Hyperdisk balanced volume, as the boot disk for Managed Service for Apache Spark cluster nodes.
Boot disk type options:
- You can select a standard, SSD, or balanced Persistent Disk as the boot disk for master, primary worker, and secondary worker cluster nodes.
- You can select hyperdisk balanced as the boot disk for master and primary worker nodes. When the primary worker boot disk type is set to hyperdisk balanced, Managed Service for Apache Spark automatically sets the secondary worker boot disk type to hyperdisk balanced.
The default persistent boot disk type for Managed Service for Apache Spark cluster master and primary worker nodes is standard (pd-standard). If the VM machine type supports only hyperdisk balanced as the boot disk, the default boot disk type is hyperdisk balanced (hyperdisk-balanced).
The default persistent boot disk type for cluster secondary worker nodes is the primary worker node persistent boot disk type.
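The default rules above can be summarized as a small shell sketch. This is only an illustration of the documented behavior, not code that the service runs; the function name `resolve_boot_disk_types` and its three arguments are hypothetical and are not gcloud flags.

```shell
# Resolve the effective boot disk types from the documented defaults.
# Arguments (hypothetical helper inputs, not gcloud flags):
#   $1  requested primary worker boot disk type, or "" if not set
#   $2  requested secondary worker boot disk type, or "" if not set
#   $3  "true" if the machine type supports only hyperdisk balanced boot disks
# The function body runs in a subshell so its variables stay local.
resolve_boot_disk_types() (
  primary="$1"
  secondary="$2"
  hyperdisk_only="${3:-false}"

  # Primary default: pd-standard, or hyperdisk-balanced when the machine
  # type supports nothing else.
  if [ -z "$primary" ]; then
    if [ "$hyperdisk_only" = "true" ]; then
      primary="hyperdisk-balanced"
    else
      primary="pd-standard"
    fi
  fi

  # Secondary workers default to the primary worker type, and are always
  # hyperdisk-balanced when primary workers use hyperdisk-balanced.
  if [ -z "$secondary" ] || [ "$primary" = "hyperdisk-balanced" ]; then
    secondary="$primary"
  fi

  echo "primary=$primary secondary=$secondary"
)

resolve_boot_disk_types "" "" false                    # primary=pd-standard secondary=pd-standard
resolve_boot_disk_types "hyperdisk-balanced" "pd-ssd"  # primary=hyperdisk-balanced secondary=hyperdisk-balanced
resolve_boot_disk_types "pd-ssd" "" false              # primary=pd-ssd secondary=pd-ssd
```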
Select persistent boot disk types for cluster nodes
You can select the persistent boot disk type when you create a cluster using the Google Cloud console, Google Cloud CLI, or Dataproc API.
Google Cloud console
To select cluster node boot disk types:
- Open the Managed Service for Apache Spark Create a cluster page.
- Click Additional configuration to expand that section.
- Click Primary workers and Secondary workers to confirm or change the default settings.
- By default, driver (master) node settings are the same as primary worker settings. Under Additional configuration, you can click Driver node to clear the Default driver node to the same as primary worker checkbox, then specify driver node settings.
gcloud CLI
You can create a cluster and select cluster node boot disk types using the gcloud dataproc clusters create command with the --master-boot-disk-type, --worker-boot-disk-type, and --secondary-worker-boot-disk-type flags.
gcloud dataproc clusters create CLUSTER_NAME \
    --region=REGION \
    --master-boot-disk-type=pd-ssd \
    --worker-boot-disk-type=hyperdisk-balanced \
    other args ...
REST API
You can set a value of pd-standard, pd-ssd, pd-balanced, or hyperdisk-balanced in the InstanceGroupConfig.DiskConfig.bootDiskType field of the masterConfig, workerConfig, and secondaryWorkerConfig objects as part of a cluster.create API request.
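As a rough sketch of such a request, the following script builds a cluster.create request body that sets bootDiskType for each instance group and prints it for review. The project, region, and cluster names are hypothetical placeholders, and the commented-out curl call assumes you authenticate with an access token from gcloud; adapt both to your environment.

```shell
# Hypothetical placeholder values; replace with your own.
PROJECT_ID="my-project"
REGION="us-central1"
CLUSTER_NAME="example-cluster"

# Request body: one diskConfig.bootDiskType per instance group.
REQUEST_BODY=$(cat <<EOF
{
  "projectId": "${PROJECT_ID}",
  "clusterName": "${CLUSTER_NAME}",
  "config": {
    "masterConfig": {"diskConfig": {"bootDiskType": "pd-ssd"}},
    "workerConfig": {"diskConfig": {"bootDiskType": "pd-balanced"}},
    "secondaryWorkerConfig": {"diskConfig": {"bootDiskType": "pd-balanced"}}
  }
}
EOF
)

URL="https://dataproc.googleapis.com/v1/projects/${PROJECT_ID}/regions/${REGION}/clusters"

# Print the request so it can be reviewed before sending.
printf 'POST %s\n' "${URL}"
printf '%s\n' "${REQUEST_BODY}"

# To send the request, uncomment the following lines (assumes gcloud is
# installed and authenticated):
# curl -X POST \
#   -H "Authorization: Bearer $(gcloud auth print-access-token)" \
#   -H "Content-Type: application/json" \
#   -d "${REQUEST_BODY}" \
#   "${URL}"
```

Printing the body first keeps the sketch safe to run as-is; nothing is created until the curl call is enabled.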
Hyperdisk settings
When you create a cluster with a hyperdisk balanced volume as the boot disk for a Managed Service for Apache Spark cluster node, you can set the provisioned IOPS and throughput.
Google Cloud console
To configure provisioned IOPS and throughput:
- Open the Managed Service for Apache Spark Create a cluster page.
- Click Additional configuration to expand that section.
- Click Driver node or Primary workers.
- In the panel that opens, configure the provisioned IOPS and throughput settings, then click Save.
gcloud CLI
You can set provisioned IOPS and provisioned throughput for cluster nodes with hyperdisk-balanced boot disks using the gcloud dataproc clusters create command with the --master-boot-disk-provisioned-iops, --worker-boot-disk-provisioned-iops, --master-boot-disk-provisioned-throughput, and --worker-boot-disk-provisioned-throughput flags.
gcloud dataproc clusters create CLUSTER_NAME \
    --region=REGION \
    --master-boot-disk-type=hyperdisk-balanced \
    --master-boot-disk-provisioned-iops=MASTER_BOOT_DISK_IOPS \
    --master-boot-disk-provisioned-throughput=MASTER_BOOT_DISK_THROUGHPUT \
    --worker-boot-disk-type=hyperdisk-balanced \
    --worker-boot-disk-provisioned-iops=WORKER_BOOT_DISK_IOPS \
    --worker-boot-disk-provisioned-throughput=WORKER_BOOT_DISK_THROUGHPUT \
    other args ...
REST API
You can set provisioned IOPS and provisioned throughput for cluster nodes with hyperdisk-balanced boot disks using the InstanceGroupConfig.DiskConfig.bootDiskProvisionedIops and InstanceGroupConfig.DiskConfig.bootDiskProvisionedThroughput fields in the masterConfig and workerConfig.
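A request body using these fields might look like the following sketch. The project, region, and cluster names are hypothetical, and the IOPS and throughput numbers are illustrative values only; check the Hyperdisk limits for your machine type before choosing real ones. The values are written as JSON strings here, which is one accepted encoding for 64-bit integer fields.

```shell
# Hypothetical placeholder values; replace with your own.
PROJECT_ID="my-project"
REGION="us-central1"
CLUSTER_NAME="example-hyperdisk-cluster"

# Illustrative performance settings, not recommendations.
MASTER_BOOT_DISK_IOPS="3000"
MASTER_BOOT_DISK_THROUGHPUT="140"
WORKER_BOOT_DISK_IOPS="3000"
WORKER_BOOT_DISK_THROUGHPUT="140"

# Request body: hyperdisk-balanced boot disks with provisioned performance.
REQUEST_BODY=$(cat <<EOF
{
  "projectId": "${PROJECT_ID}",
  "clusterName": "${CLUSTER_NAME}",
  "config": {
    "masterConfig": {
      "diskConfig": {
        "bootDiskType": "hyperdisk-balanced",
        "bootDiskProvisionedIops": "${MASTER_BOOT_DISK_IOPS}",
        "bootDiskProvisionedThroughput": "${MASTER_BOOT_DISK_THROUGHPUT}"
      }
    },
    "workerConfig": {
      "diskConfig": {
        "bootDiskType": "hyperdisk-balanced",
        "bootDiskProvisionedIops": "${WORKER_BOOT_DISK_IOPS}",
        "bootDiskProvisionedThroughput": "${WORKER_BOOT_DISK_THROUGHPUT}"
      }
    }
  }
}
EOF
)

URL="https://dataproc.googleapis.com/v1/projects/${PROJECT_ID}/regions/${REGION}/clusters"

# Print the request so it can be reviewed before sending.
printf 'POST %s\n' "${URL}"
printf '%s\n' "${REQUEST_BODY}"

# To send the request, uncomment the following lines (assumes gcloud is
# installed and authenticated):
# curl -X POST \
#   -H "Authorization: Bearer $(gcloud auth print-access-token)" \
#   -H "Content-Type: application/json" \
#   -d "${REQUEST_BODY}" \
#   "${URL}"
```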