Managed Service for Apache Spark boot disks

You can select a standard, SSD, or balanced Persistent Disk, or a Google Cloud Hyperdisk balanced volume, as the boot disk for Managed Service for Apache Spark cluster nodes.

Boot disk type options:

  • You can select a standard, SSD, or balanced Persistent Disk as the boot disk for master, primary worker, and secondary worker cluster nodes.

  • You can select hyperdisk balanced as the boot disk for master and primary worker nodes. Note that Managed Service for Apache Spark automatically sets the secondary worker boot disk type to hyperdisk balanced when the primary worker boot disk type is set to hyperdisk balanced.

The default persistent boot disk type for Managed Service for Apache Spark cluster master and primary worker nodes is standard (pd-standard). If the VM machine type supports only hyperdisk balanced as the boot disk, the default boot disk is hyperdisk balanced (hyperdisk-balanced). The default persistent boot disk type for cluster secondary worker nodes is the primary worker node persistent boot disk type.
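The default rules above can be sketched as a small shell helper. This is an illustration only, not part of the gcloud CLI: the function names and the "true"/"false" argument (whether the machine type supports only hyperdisk balanced boot disks) are hypothetical.

```shell
# Illustrative only: mirrors the documented defaults; it does not call any API.

# default_primary_boot_disk_type HYPERDISK_ONLY
#   HYPERDISK_ONLY is a hypothetical "true"/"false" value saying whether the
#   chosen machine type supports only hyperdisk balanced boot disks.
#   The same default applies to master and primary worker nodes.
default_primary_boot_disk_type() {
  if [ "$1" = "true" ]; then
    echo "hyperdisk-balanced"
  else
    echo "pd-standard"
  fi
}

# effective_boot_disk_type REQUESTED FALLBACK
#   Prints REQUESTED when it is non-empty, otherwise FALLBACK.
effective_boot_disk_type() {
  if [ -n "$1" ]; then echo "$1"; else echo "$2"; fi
}

# Example: no types requested, machine type accepts Persistent Disk.
primary="$(effective_boot_disk_type "" "$(default_primary_boot_disk_type false)")"
# Secondary workers default to the primary worker boot disk type.
secondary="$(effective_boot_disk_type "" "$primary")"
echo "primary=$primary secondary=$secondary"
```

With no boot disk types requested and a machine type that accepts Persistent Disk, both values resolve to pd-standard.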

Select persistent boot disk types for cluster nodes

You can select the persistent boot disk type when you create a cluster using the Google Cloud console, Google Cloud CLI, or Dataproc API.

Google Cloud console

To select cluster node boot disk types:

  1. Open the Managed Service for Apache Spark Create a cluster page.
  2. Click Additional configuration to expand that section.
  3. Click Primary workers and Secondary workers to confirm or change the default settings.
  4. By default, driver (master) node settings are the same as primary worker settings. Under Additional configuration, you can click Driver node to clear the Default driver node to the same as primary worker checkbox, then specify driver node settings.

gcloud CLI

You can create a cluster and select cluster node boot disk types using the gcloud dataproc clusters create command with the --master-boot-disk-type, --worker-boot-disk-type, and --secondary-worker-boot-disk-type flags.

Example:
gcloud dataproc clusters create CLUSTER_NAME \
    --region=REGION \
    --master-boot-disk-type=pd-ssd \
    --worker-boot-disk-type=hyperdisk-balanced \
    other args ...

REST API

You can set a value of pd-standard, pd-ssd, pd-balanced, or hyperdisk-balanced in the InstanceGroupConfig.DiskConfig.bootDiskType field in the masterConfig, workerConfig, and secondaryWorkerConfig as part of a cluster.create API request.
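As a rough sketch of such a request, the following writes a minimal request body to a temporary file. PROJECT_ID, REGION, and CLUSTER_NAME are placeholders, the disk types are example choices, and the curl invocation is an assumed way to send the request that is defined here but not run.

```shell
# Write a request body for clusters.create. PROJECT_ID and CLUSTER_NAME are
# placeholders to replace; the disk types are example choices.
body_file="$(mktemp)"
cat > "$body_file" <<'EOF'
{
  "projectId": "PROJECT_ID",
  "clusterName": "CLUSTER_NAME",
  "config": {
    "masterConfig": { "diskConfig": { "bootDiskType": "pd-ssd" } },
    "workerConfig": { "diskConfig": { "bootDiskType": "pd-balanced" } },
    "secondaryWorkerConfig": { "diskConfig": { "bootDiskType": "pd-balanced" } }
  }
}
EOF

# Assumed invocation (defined but not called here): send the body with an
# access token obtained from the gcloud CLI.
create_cluster() {
  curl -X POST \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json" \
    -d @"$body_file" \
    "https://dataproc.googleapis.com/v1/projects/PROJECT_ID/regions/REGION/clusters"
}
```

Other fields that a real cluster needs, such as machine types and instance counts, are omitted to keep the focus on bootDiskType.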

Hyperdisk settings

When you create a cluster with a hyperdisk balanced volume as the boot disk for a Managed Service for Apache Spark cluster node, you can set the provisioned IOPS and throughput.

Google Cloud console

To configure provisioned IOPS and throughput:

  1. Open the Managed Service for Apache Spark Create a cluster page.
  2. Click Additional configuration to expand that section.
  3. Click Driver node or Primary workers.
  4. In the panel that opens, configure the provisioned IOPS and throughput settings, then click Save.

gcloud CLI

You can set provisioned IOPS and provisioned throughput for cluster nodes that use hyperdisk-balanced boot disks by running the gcloud dataproc clusters create command with the --master-boot-disk-provisioned-iops, --worker-boot-disk-provisioned-iops, --master-boot-disk-provisioned-throughput, and --worker-boot-disk-provisioned-throughput flags.

Example:
  gcloud dataproc clusters create CLUSTER_NAME \
      --region=REGION \
      --master-boot-disk-type=hyperdisk-balanced \
      --master-boot-disk-provisioned-iops=MASTER_BOOT_DISK_IOPS \
      --master-boot-disk-provisioned-throughput=MASTER_BOOT_DISK_THROUGHPUT \
      --worker-boot-disk-type=hyperdisk-balanced \
      --worker-boot-disk-provisioned-iops=WORKER_BOOT_DISK_IOPS \
      --worker-boot-disk-provisioned-throughput=WORKER_BOOT_DISK_THROUGHPUT \
      other args ...
  

REST API

You can set provisioned IOPS and provisioned throughput for cluster nodes with hyperdisk balanced boot disks using the InstanceGroupConfig.DiskConfig.bootDiskProvisionedIops and InstanceGroupConfig.DiskConfig.bootDiskProvisionedThroughput fields in the masterConfig and workerConfig as part of a cluster.create API request.
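A minimal sketch of how these fields might sit in the config portion of a request body follows. The disk_config_json helper is hypothetical, and the numeric values are arbitrary illustrations rather than recommended or validated settings.

```shell
# disk_config_json IOPS THROUGHPUT
#   Hypothetical helper: prints a diskConfig object for a hyperdisk balanced
#   boot disk with the given provisioned IOPS and throughput.
disk_config_json() {
  printf '{ "bootDiskType": "hyperdisk-balanced", "bootDiskProvisionedIops": %s, "bootDiskProvisionedThroughput": %s }' "$1" "$2"
}

# Assemble the config portion of a cluster.create request body.
# The numbers below are arbitrary example values.
printf '{ "config": { "masterConfig": { "diskConfig": %s }, "workerConfig": { "diskConfig": %s } } }\n' \
  "$(disk_config_json 3000 140)" \
  "$(disk_config_json 4000 200)"
```

Keeping the diskConfig fragment in one helper means the master and worker settings differ only in the two numbers passed in.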