You can install additional components like Solr when you create a Managed Service for Apache Spark cluster using the Optional components feature. This page describes the Solr component.
The Apache Solr component is an open source enterprise search platform. The Solr server and
Web UI are available on port 8983 on the cluster's master node(s).
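As a minimal sketch, the Solr address on a cluster can be derived from the cluster name. The helper name solr_base_url is hypothetical, and the sketch assumes a single-master cluster whose master node host name is the cluster name followed by -m. Clusters with multiple master nodes use different host names.

```shell
# Print the Solr base URL for a cluster's master node.
# Assumption: single-master cluster, master host name is "<cluster-name>-m".
# solr_base_url is a hypothetical helper, not part of any CLI.
solr_base_url() {
  printf 'http://%s-m:8983/solr\n' "$1"
}

# Example with the placeholder cluster name used on this page.
solr_base_url cluster-name
```

From a machine that can resolve and reach the master node (for example, an SSH session on the cluster), a request such as curl "$(solr_base_url cluster-name)/admin/info/system?wt=json" should return Solr system information if the server is running.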
Persisting Solr files: By default, Solr writes and reads the index and
transaction log files in HDFS, which is deleted when the cluster is deleted.
To persist Solr files beyond the life of the cluster, use a Cloud Storage path
as the Solr home directory by setting the dataproc:solr.gcs.path
cluster property when you install the component.
Install the component
Install the component when you create a Managed Service for Apache Spark cluster. Components can be added to clusters created with Managed Service for Apache Spark version 1.3 and later.
See Supported Managed Service for Apache Spark versions for the component version included in each Managed Service for Apache Spark image release.
Google Cloud console
- In the Google Cloud console, open the Create cluster page.
- Click Additional configuration to expand that section.
- Edit Optional components.
- In the panel that opens, select the checkbox for Solr, then click Save.
gcloud CLI
To create a Managed Service for Apache Spark cluster that includes the Solr component,
use the
gcloud dataproc clusters create cluster-name
command with the --optional-components flag. Optionally, use the --properties
flag to set a Cloud Storage path as the Solr home directory.
gcloud dataproc clusters create cluster-name \
    --region=region \
    --optional-components=SOLR \
    --enable-component-gateway \
    ... other flags
Add the --properties="dataproc:solr.gcs.path=gs://bucket-name/"
cluster property to the gcloud dataproc clusters create
command to set a Cloud Storage bucket where Solr documents will be stored
(the Solr home directory).
REST API
The Solr component can be specified through the Dataproc API using SoftwareConfig.Component as part of a clusters.create request.
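The following sketch shows one way the body of a clusters.create request could look with Solr enabled. The helper name solr_cluster_body is hypothetical, and cluster-name and gs://bucket-name/ are placeholders. The sketch assumes the v1 API field names softwareConfig.optionalComponents, softwareConfig.properties, and endpointConfig.enableHttpPortAccess; a real request usually needs further cluster configuration.

```shell
# Print a clusters.create request body that enables the Solr component.
# Arguments: $1 = cluster name, $2 = Cloud Storage path for the Solr home directory.
# solr_cluster_body is a hypothetical helper; field names assume the v1 API.
solr_cluster_body() {
  cat <<EOF
{
  "clusterName": "$1",
  "config": {
    "softwareConfig": {
      "optionalComponents": ["SOLR"],
      "properties": {
        "dataproc:solr.gcs.path": "$2"
      }
    },
    "endpointConfig": {
      "enableHttpPortAccess": true
    }
  }
}
EOF
}

# Example with the placeholder values used on this page.
solr_cluster_body cluster-name gs://bucket-name/
```

The printed JSON can then be sent as the body of a POST request to the clusters collection for your project and region, for example with curl and an access token from gcloud auth print-access-token. The properties entry is optional and mirrors the --properties flag of the gcloud CLI.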