Managed Service for Apache Spark optional Pig component

When you create a Managed Service for Apache Spark cluster, you can install additional components such as Apache Pig by using the Optional components feature. This page describes the Pig component, an open source platform for analyzing large data sets.

Install the component

Install the component when you create a Managed Service for Apache Spark cluster.

Apache Pig is an optional component in Managed Service for Apache Spark 2.3 and later image versions.

See Supported Managed Service for Apache Spark versions for component versions included in the latest Managed Service for Apache Spark image releases.

Google Cloud console

  1. In the Google Cloud console, open the Create cluster page.
  2. Click Additional configuration to expand that section.
  3. Edit Optional components.
  4. In the panel that opens, select the checkbox for Pig, then click Save.

gcloud CLI

To create a Managed Service for Apache Spark cluster that includes the Pig component, run the gcloud dataproc clusters create command with the --optional-components=PIG flag and an image version of 2.3 or later.

gcloud dataproc clusters create CLUSTER_NAME \
    --region=REGION \
    --optional-components=PIG \
    --image-version=2.3 \
    ... other flags

REST API

To include the Pig component through the Dataproc API, specify it using SoftwareConfig.Component as part of a clusters.create request.
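
As a rough illustration of such a request, the following sketch writes a minimal clusters.create request body to a file and prints the URL it would be posted to. The project ID, region, cluster name, and the request.json file name are hypothetical placeholders. The PIG enum value is assumed to mirror the value passed to the gcloud --optional-components flag, so check the SoftwareConfig.Component reference before relying on it. The curl call is left commented out because it needs an authenticated gcloud installation.

```shell
#!/bin/sh
# Hypothetical placeholder values; replace with your own.
PROJECT_ID="my-project"
REGION="us-central1"
CLUSTER_NAME="my-pig-cluster"

# Write a minimal clusters.create request body.
# Assumption: PIG is the SoftwareConfig.Component enum value,
# matching the value used with the gcloud --optional-components flag.
cat > request.json <<EOF
{
  "clusterName": "${CLUSTER_NAME}",
  "config": {
    "softwareConfig": {
      "imageVersion": "2.3",
      "optionalComponents": ["PIG"]
    }
  }
}
EOF

# Print the body and the target URL for review before sending anything.
cat request.json
echo "POST https://dataproc.googleapis.com/v1/projects/${PROJECT_ID}/regions/${REGION}/clusters"

# To send the request (requires an authenticated gcloud installation):
# curl -X POST \
#   -H "Authorization: Bearer $(gcloud auth print-access-token)" \
#   -H "Content-Type: application/json" \
#   -d @request.json \
#   "https://dataproc.googleapis.com/v1/projects/${PROJECT_ID}/regions/${REGION}/clusters"
```

Writing the body to a file first makes it easy to inspect or version the request before it is sent, and the same file can be reused with any HTTP client.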