Managed Service for Apache Spark optional Docker component

To run Docker containers on your Managed Service for Apache Spark cluster nodes, enable the Docker optional component during cluster creation. This document explains how to install and configure the Docker component on Managed Service for Apache Spark.

To learn more about other available optional components in Managed Service for Apache Spark, see Available optional components.

How the Docker component works

When you enable the Managed Service for Apache Spark Docker component, it installs a Docker daemon on each cluster node. It also sets up a Linux user and group, both named "docker", on each node to run the Docker daemon. Additionally, the component creates a "docker" systemd service that runs the dockerd daemon. Use this systemd service to manage the lifecycle of the Docker service.
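For example, after connecting to a cluster node over SSH, you can manage the daemon through this systemd service with standard systemctl commands (a sketch of routine usage, not commands specific to this component):

```shell
# Check whether the Docker daemon is running on this node.
sudo systemctl status docker

# Restart the daemon, for example after changing its configuration.
sudo systemctl restart docker

# Confirm the daemon responds.
sudo docker info
```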

Install the component

Install the component when you create a Managed Service for Apache Spark cluster. The Docker component can be installed on clusters created with Managed Service for Apache Spark image version 1.5 or later.

See Supported Managed Service for Apache Spark versions for the component version included in each Managed Service for Apache Spark image release.

gcloud command

To create a Managed Service for Apache Spark cluster that includes the Docker component, use the gcloud dataproc clusters create cluster-name command with the --optional-components flag.

gcloud dataproc clusters create cluster-name \
    --optional-components=DOCKER \
    --region=region \
    --image-version=1.5 \
    ... other flags

REST API

You can specify the Docker component through the Managed Service for Apache Spark API by including SoftwareConfig.Component in a clusters.create request.
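As a sketch, the relevant portion of a clusters.create request body lists DOCKER under softwareConfig.optionalComponents (cluster name and image version here are placeholders):

```json
{
  "clusterName": "cluster-name",
  "config": {
    "softwareConfig": {
      "imageVersion": "1.5",
      "optionalComponents": ["DOCKER"]
    }
  }
}
```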

Console

  1. Enable the component.
    • In the Google Cloud console, open the Managed Service for Apache Spark Create a cluster page. The Set up cluster panel is selected.
    • In the Components section:
      • Under Optional components, select Docker and other optional components to install on your cluster.

Enable Docker on YARN

See Customize your Spark job runtime environment with Docker on YARN to use a customized Docker image with YARN.
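For illustration only, Docker on YARN is typically enabled per job through the YARN container runtime environment variables (YARN_CONTAINER_RUNTIME_TYPE and YARN_CONTAINER_RUNTIME_DOCKER_IMAGE are standard Hadoop YARN settings; the job class, jar path, and image path below are placeholder assumptions, not values from this guide):

```shell
gcloud dataproc jobs submit spark \
    --cluster=cluster-name \
    --region=region \
    --class=org.apache.spark.examples.SparkPi \
    --jars=file:///usr/lib/spark/examples/jars/spark-examples.jar \
    --properties="spark.executorEnv.YARN_CONTAINER_RUNTIME_TYPE=docker,spark.executorEnv.YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=gcr.io/project-id/image:tag"
```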

Docker Logging

By default, the Managed Service for Apache Spark Docker component configures Docker to write logs to Cloud Logging by setting the gcplogs logging driver. For details, see Viewing your logs.
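In Docker terms, this corresponds to setting the daemon's default logging driver. A minimal /etc/docker/daemon.json sketch of the equivalent setting is shown below; the component applies this configuration for you, so you do not need to create this file yourself:

```json
{
  "log-driver": "gcplogs"
}
```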

Docker Registry

The Managed Service for Apache Spark Docker component configures Docker to use Container Registry in addition to the default Docker registries. Docker uses the Docker credential helper to authenticate with Container Registry.
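Because the credential helper is already configured, you can pull private images from Container Registry directly on cluster nodes. A sketch (the project ID and image name are placeholders):

```shell
# Pull a private image from Container Registry; the configured
# Docker credential helper supplies the authentication token.
sudo docker pull gcr.io/project-id/my-image:latest
```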

Use the Docker component on a Kerberos cluster

You can install the Docker optional component on a cluster created with Kerberos security enabled.
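For example, you can combine the component with Kerberos at cluster creation time. A sketch, following the cluster creation command shown earlier (--enable-kerberos is the standard gcloud flag for Kerberos; additional Kerberos configuration flags are omitted):

```shell
gcloud dataproc clusters create cluster-name \
    --optional-components=DOCKER \
    --enable-kerberos \
    --region=region \
    --image-version=1.5 \
    ... other flags
```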