Attach GPUs to clusters

Managed Service for Apache Spark lets you attach graphics processing units (GPUs) to the master and worker Compute Engine nodes in a Managed Service for Apache Spark cluster. You can use these GPUs to accelerate specific workloads on your instances, such as machine learning and data processing.

For more information about what you can do with GPUs and what types of GPU hardware are available, read GPUs on Compute Engine.

Before you begin

  • GPUs require special drivers and software. These items are pre-installed in the Managed Service for Apache Spark -ml images (using the -ml images is recommended); if you use a different image, you can install them manually.
  • Read about GPU pricing on Compute Engine to understand the cost to use GPUs in your instances.
  • Read about restrictions for instances with GPUs to learn how these instances function differently from non-GPU instances.
  • Check the quotas page for your project to ensure that you have sufficient GPU quota (NVIDIA_T4_GPUS, NVIDIA_P100_GPUS, or NVIDIA_V100_GPUS) available in your project. If GPUs are not listed on the quotas page or you require additional GPU quota, request a quota increase.
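For example, one way to inspect GPU quota for a region from the command line is with gcloud. The following sketch uses an illustrative region name; substitute your own region and adjust the grep pattern to the quota metric you need:

```shell
# List NVIDIA GPU quota metrics, limits, and usage for a region
# (us-central1 is illustrative; substitute your region).
gcloud compute regions describe us-central1 \
    --format="flattened(quotas[])" | grep -B1 -A1 NVIDIA
```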

Types of GPUs

Managed Service for Apache Spark nodes support the following GPU types. You must specify the GPU type when you attach GPUs to your Managed Service for Apache Spark cluster.

  • nvidia-tesla-l4 - NVIDIA® Tesla® L4
  • nvidia-tesla-a100 - NVIDIA® Tesla® A100
  • nvidia-tesla-p100 - NVIDIA® Tesla® P100
  • nvidia-tesla-v100 - NVIDIA® Tesla® V100
  • nvidia-tesla-p4 - NVIDIA® Tesla® P4
  • nvidia-tesla-t4 - NVIDIA® Tesla® T4
  • nvidia-rtx-pro-6000 - NVIDIA® RTX® PRO 6000
  • nvidia-tesla-p100-vws - NVIDIA® Tesla® P100 Virtual Workstations
  • nvidia-tesla-p4-vws - NVIDIA® Tesla® P4 Virtual Workstations
  • nvidia-tesla-t4-vws - NVIDIA® Tesla® T4 Virtual Workstations

Attach GPUs to a cluster

To attach GPUs to a Managed Service for Apache Spark cluster, when you create the cluster you must either specify a -ml image (recommended) or use an initialization action to install GPU drivers. The following examples specify the 2.3-ml-ubuntu image when creating a cluster.

Google Cloud CLI

To attach GPUs to the master, primary worker, and secondary worker nodes in a Managed Service for Apache Spark cluster, create the cluster using the gcloud dataproc clusters create --master-accelerator, --worker-accelerator, and --secondary-worker-accelerator flags. These flags take the following values:

  • The type of GPU to attach to a node
  • The number of GPUs to attach to the node

The type of GPU is required and the number of GPUs is optional (the default is 1 GPU).

Example:

gcloud dataproc clusters create cluster-name \
    --image-version=2.3-ml-ubuntu \
    --region=region \
    --master-accelerator type=nvidia-tesla-t4 \
    --worker-accelerator type=nvidia-tesla-t4,count=4 \
    --secondary-worker-accelerator type=nvidia-tesla-t4,count=4 \
    ... other flags

REST API

To attach GPUs to the master, primary worker, and secondary worker nodes in a Managed Service for Apache Spark cluster, fill in the InstanceGroupConfig.AcceleratorConfig acceleratorTypeUri and acceleratorCount fields as part of the clusters.create API request. These fields take the following values:

  • The type of GPU to attach to a node
  • The number of GPUs to attach to the node
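For example, the accelerator portion of a clusters.create request body might look like the following sketch (the cluster name and accelerator values are illustrative):

```json
{
  "clusterName": "cluster-name",
  "config": {
    "masterConfig": {
      "accelerators": [
        { "acceleratorTypeUri": "nvidia-tesla-t4", "acceleratorCount": 1 }
      ]
    },
    "workerConfig": {
      "accelerators": [
        { "acceleratorTypeUri": "nvidia-tesla-t4", "acceleratorCount": 4 }
      ]
    },
    "secondaryWorkerConfig": {
      "accelerators": [
        { "acceleratorTypeUri": "nvidia-tesla-t4", "acceleratorCount": 4 }
      ]
    }
  }
}
```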

Console

To attach GPUs to the master, primary worker, and secondary worker nodes in a Managed Service for Apache Spark cluster, perform the following steps:

  1. Open the Create a Managed Service for Apache Spark cluster on Compute Engine page in the Google Cloud console.
  2. Select the Configure nodes panel.
  3. In the Master node, Worker nodes, and Secondary worker nodes sections, under CPU platform and GPU > GPUs, specify the number of GPUs and the GPU type for the nodes.

Install GPU drivers

GPU drivers are required to utilize GPUs attached to Managed Service for Apache Spark nodes. As an alternative to using the GPU drivers pre-installed in the Managed Service for Apache Spark -ml images, you can use an initialization action to install GPU drivers when you create a cluster.

Verify GPU driver installation

You can verify GPU driver installation on a cluster by using SSH to connect to the cluster's master node, and then running the following command:

nvidia-smi

If the driver is functioning properly, the output displays the driver version and GPU statistics (see Verifying the GPU driver install).

Spark configuration

When you submit a job to Spark, you can use the spark.executorEnv runtime environment configuration property with the LD_PRELOAD environment variable to preload needed libraries.

Example:

gcloud dataproc jobs submit spark --cluster=CLUSTER_NAME \
  --region=REGION \
  --class=org.apache.spark.examples.SparkPi \
  --jars=file:///usr/lib/spark/examples/jars/spark-examples.jar \
  --properties=spark.executorEnv.LD_PRELOAD=libnvblas.so,spark.task.resource.gpu.amount=1,spark.executor.resource.gpu.amount=1,spark.executor.resource.gpu.discoveryScript=/usr/lib/spark/scripts/gpu/getGpusResources.sh
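To confirm from a running cluster that Spark's GPU resource scheduling is in effect, you can inspect the resources assigned to tasks in spark-shell. The following is a minimal sketch that assumes a cluster and job properties configured as shown above; the "gpu" resource name corresponds to the spark.executor.resource.gpu.* properties:

```scala
import org.apache.spark.TaskContext

// Run a couple of tasks and report the GPU addresses Spark assigned to each.
sc.parallelize(1 to 2, 2).map { _ =>
  val gpus = TaskContext.get().resources().get("gpu")
  gpus.map(_.addresses.mkString(",")).getOrElse("no gpu assigned")
}.collect().foreach(println)
```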

Example GPU job

You can test GPUs on Managed Service for Apache Spark by running any of the following jobs, which benefit from GPU acceleration:

  1. Run one of the Spark ML examples.
  2. Use spark-shell to run the following matrix computation example:
import org.apache.spark.mllib.linalg._
import org.apache.spark.mllib.linalg.distributed._
import java.util.Random

// Build an nBlocks x nBlocks BlockMatrix of random rowsPerBlock x rowsPerBlock blocks.
def makeRandomSquareBlockMatrix(rowsPerBlock: Int, nBlocks: Int): BlockMatrix = {
  val range = sc.parallelize(1 to nBlocks)
  val indices = range.cartesian(range)
  new BlockMatrix(
      indices.map(
          ij => (ij, Matrices.rand(rowsPerBlock, rowsPerBlock, new Random()))),
      rowsPerBlock, rowsPerBlock, 0, 0)
}

val N = 1024 * 4
val n = 2
val mat1 = makeRandomSquareBlockMatrix(N, n)
val mat2 = makeRandomSquareBlockMatrix(N, n)
val mat3 = mat1.multiply(mat2)
mat3.blocks.persist.count  // Force evaluation of the multiplication.
println("Processing complete!")

What's Next