You can use the Optional components feature to install additional components, such as Apache Pig, when you create a Managed Service for Apache Spark cluster. This page describes the Pig component, an open source platform for analyzing large data sets.
Install the component
Install the component when you create a Managed Service for Apache Spark cluster.
Apache Pig is available as an optional component in Managed Service for Apache Spark image versions 2.3 and later.
See Supported Managed Service for Apache Spark versions for component versions included in the latest Managed Service for Apache Spark image releases.
gcloud
To create a Managed Service for Apache Spark cluster that includes the Pig component, use the gcloud dataproc clusters create CLUSTER_NAME command with the --optional-components flag and image version 2.3 or later.
gcloud dataproc clusters create CLUSTER_NAME \
    --region=REGION \
    --optional-components=PIG \
    --image-version=2.3 \
    ... other flags
REST API
The Pig component can be specified through the Managed Service for Apache Spark API using SoftwareConfig.Component as part of a clusters.create request.
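As a rough illustration of such a request, the sketch below writes a minimal clusters.create body that lists PIG under softwareConfig.optionalComponents. The project, region, and cluster names are hypothetical placeholders, and the endpoint URL is an assumption based on the Dataproc v1 REST API; check the API reference for the exact endpoint and any additional required fields before relying on it. The request is only sent if you set SEND_REQUEST=1.

```shell
# Hypothetical placeholder values; replace with your own.
PROJECT_ID="my-project"
REGION="us-central1"
CLUSTER_NAME="my-cluster"

# Write the clusters.create request body. softwareConfig.optionalComponents
# takes Component enum values; PIG selects the Pig component.
cat > request.json <<EOF
{
  "clusterName": "${CLUSTER_NAME}",
  "config": {
    "softwareConfig": {
      "imageVersion": "2.3",
      "optionalComponents": ["PIG"]
    }
  }
}
EOF

# Assumed endpoint (Dataproc v1 REST API); verify against the API reference.
URL="https://dataproc.googleapis.com/v1/projects/${PROJECT_ID}/regions/${REGION}/clusters"

# By default, only print the target. Set SEND_REQUEST=1 to send the request.
if [ "${SEND_REQUEST:-0}" = "1" ]; then
  curl -X POST \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json" \
    -d @request.json \
    "${URL}"
else
  echo "Would POST request.json to ${URL}"
fi
```

Keeping the body in a separate file makes it easy to review the component list before sending and to add other optional components to the same array later.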
Console
Enable the component:
- In the Google Cloud console, open the Managed Service for Apache Spark Create a cluster page. The Set up cluster panel is selected.
- In the Components section, under Optional components, select Pig and any other optional components to install on your cluster.