This document describes how to create a Managed Service for Apache Spark zero-scale cluster.
Managed Service for Apache Spark zero-scale clusters provide a cost-effective way to use Managed Service for Apache Spark clusters. Unlike standard Managed Service for Apache Spark clusters that require at least two primary workers, Managed Service for Apache Spark zero-scale clusters use only secondary workers that can be scaled down to zero.
Managed Service for Apache Spark zero-scale clusters are ideal for use as long-running clusters that experience idle periods, such as a cluster that hosts a Jupyter notebook. They provide improved resource utilization through the use of zero-scale autoscaling policies.
Characteristics and limitations
A Managed Service for Apache Spark zero-scale cluster shares similarities with a standard cluster, but has the following unique characteristics and limitations:
- Requires image version `2.2.53` or later.
- Supports only secondary workers, not primary workers.
- Includes services such as YARN, but doesn't support the HDFS file system.
  - To use Cloud Storage as the default file system, set the `core:fs.defaultFS` cluster property to a Cloud Storage bucket location (`gs://BUCKET_NAME`).
  - If you disable a component during cluster creation, also disable HDFS.
- Can't be converted to or from a standard cluster.
- Requires an autoscaling policy for `ZERO_SCALE` cluster types.
- Requires selecting flexible VMs as the machine type.
- Doesn't support the Oozie component.
- Can't be created from the Google Cloud console.
Optional: Configure an autoscaling policy
You can configure an autoscaling policy to define secondary worker scaling for a zero-scale cluster. When doing so, note the following:
- Set the cluster type to `ZERO_SCALE`.
- Apply the autoscaling policy to the secondary worker configuration only.
For more information, see Create an autoscaling policy.
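As a sketch, the steps above can be expressed as a policy file imported with the gcloud CLI. The policy ID, instance bounds, scaling factors, and region below are illustrative values, and the exact placement of the `clusterType` field is an assumption to confirm against the autoscaling policy reference:

```shell
# Illustrative zero-scale autoscaling policy. All values are examples;
# the clusterType field placement is an assumption, not a confirmed schema.
cat > zero-scale-policy.yaml <<'EOF'
clusterType: ZERO_SCALE
secondaryWorkerConfig:
  minInstances: 0        # permits scaling down to zero workers
  maxInstances: 10
basicAlgorithm:
  yarnConfig:
    scaleUpFactor: 0.5
    scaleDownFactor: 1.0
    gracefulDecommissionTimeout: 3600s
EOF

# Import the policy so it can be referenced by ID at cluster creation.
gcloud dataproc autoscaling-policies import zero-scale-policy \
    --source=zero-scale-policy.yaml \
    --region=us-central1
```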
Create a Managed Service for Apache Spark zero-scale cluster
Create a zero-scale cluster using the gcloud CLI or the Dataproc API.
gcloud
Run the `gcloud dataproc clusters create` command locally in a terminal window or in Cloud Shell.
gcloud dataproc clusters create CLUSTER_NAME \
--region=REGION \
--cluster-type=zero-scale \
--autoscaling-policy=AUTOSCALING_POLICY \
--properties=core:fs.defaultFS=gs://BUCKET_NAME \
--secondary-worker-machine-types="type=MACHINE_TYPE1[,type=MACHINE_TYPE2...][,rank=RANK]"
...other args
Replace the following:
- CLUSTER_NAME: name of the Managed Service for Apache Spark zero-scale cluster.
- REGION: an available Compute Engine region.
- AUTOSCALING_POLICY: the ID or resource URI of the autoscaling policy.
- BUCKET_NAME: name of your Cloud Storage bucket.
- MACHINE_TYPE: a specific Compute Engine machine type, such as `n1-standard-4` or `e2-standard-8`.
- RANK: defines the priority of a list of machine types.
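Filled in with illustrative values, the command might look like the following. The cluster name, region, policy ID, bucket, and machine types are hypothetical placeholders:

```shell
# Example invocation with placeholder values substituted.
# "zero-scale-policy", "my-bucket", and the machine types are illustrative.
gcloud dataproc clusters create my-zero-scale-cluster \
    --region=us-central1 \
    --cluster-type=zero-scale \
    --autoscaling-policy=zero-scale-policy \
    --properties=core:fs.defaultFS=gs://my-bucket \
    --secondary-worker-machine-types="type=n1-standard-4,type=e2-standard-8,rank=0"
```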
REST
Create a zero-scale cluster using a Managed Service for Apache Spark REST API clusters.create request:
- Set `ClusterConfig.ClusterType` for the `secondaryWorkerConfig` to `ZERO_SCALE`.
- Set `AutoscalingConfig.policyUri` with the `ZERO_SCALE` autoscaling policy ID.
- Add the `core:fs.defaultFS:gs://BUCKET_NAME` `SoftwareConfig.property`. Replace BUCKET_NAME with the name of your Cloud Storage bucket.
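A minimal sketch of such a request, sent with curl, might look like the following. PROJECT_ID, the region, the cluster and policy names, and the bucket are placeholders, and the exact position of the `clusterType` field in the request body is an assumption to verify against the API reference:

```shell
# Sketch of a clusters.create request. All names are placeholders, and the
# clusterType field location is an assumption, not a confirmed schema.
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  "https://dataproc.googleapis.com/v1/projects/PROJECT_ID/regions/us-central1/clusters" \
  -d '{
    "clusterName": "my-zero-scale-cluster",
    "config": {
      "clusterType": "ZERO_SCALE",
      "secondaryWorkerConfig": {
        "numInstances": 0
      },
      "autoscalingConfig": {
        "policyUri": "projects/PROJECT_ID/regions/us-central1/autoscalingPolicies/zero-scale-policy"
      },
      "softwareConfig": {
        "properties": {
          "core:fs.defaultFS": "gs://my-bucket"
        }
      }
    }
  }'
```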
What's next
- Learn more about Managed Service for Apache Spark autoscaling.