You can install additional components, such as Trino, when you create a Managed Service for Apache Spark cluster by using the Optional components feature. This page describes how to install the Trino component on a Managed Service for Apache Spark cluster.
Trino is an open source distributed SQL query engine. The Trino server and
Web UI are available by default on port 8060 (or port 7778 if Kerberos is
enabled) on the cluster's first master node.
By default, Trino on Managed Service for Apache Spark is configured to work with the Hive, BigQuery,
Memory, TPCH, and TPCDS connectors.
After creating a cluster with the Trino component, you can run queries:
- from a local terminal with the `gcloud dataproc jobs submit trino` command
- from a terminal window on the cluster's first master node using the
  `trino` CLI (command-line interface). See Use Trino with Managed Service for Apache Spark.
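As a sketch, a query can be submitted from a local terminal like the following, where `CLUSTER_NAME` and `REGION` are placeholders for your own values and the `--execute` query is an arbitrary illustration:

```shell
# Submit an ad hoc Trino query to the cluster from a local terminal.
# CLUSTER_NAME and REGION are placeholders for your cluster name and region.
gcloud dataproc jobs submit trino \
    --cluster=CLUSTER_NAME \
    --region=REGION \
    --execute="SHOW CATALOGS;"
```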
Install the component
Install the component when you create a Managed Service for Apache Spark cluster.
See Supported Managed Service for Apache Spark versions for the component version included in each Managed Service for Apache Spark image release.
Console
- In the Google Cloud console, go to the Managed Service for Apache Spark
Create a cluster page.
The Set up cluster panel is selected.
- In the Components section:
- In Optional components, select Trino and other optional components to install on your cluster.
- Under Component Gateway, select Enable component gateway (see Viewing and Accessing Component Gateway URLs).
gcloud CLI
To create a Managed Service for Apache Spark cluster that includes the Trino component,
use the
gcloud dataproc clusters create
command with the --optional-components flag.
gcloud dataproc clusters create CLUSTER_NAME \
--optional-components=TRINO \
  --region=REGION \
--enable-component-gateway \
... other flags
- CLUSTER_NAME: The name of the cluster.
- REGION: A Compute Engine region where the cluster will be located.
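After the cluster is created, you can confirm that the component was installed by inspecting the cluster's software configuration. This is a sketch assuming illustrative placeholder values (`my-trino-cluster`, `us-central1`):

```shell
# Confirm that TRINO appears in the cluster's optional components.
# my-trino-cluster and us-central1 are illustrative placeholders.
gcloud dataproc clusters describe my-trino-cluster \
    --region=us-central1 \
    --format="value(config.softwareConfig.optionalComponents)"
```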
Configuring properties
Add the `--properties` flag to the
`gcloud dataproc clusters create` command to set
`trino`, `trino-jvm`, and `trino-catalog`
configuration properties.

- Application properties: Use cluster properties with the `trino:` prefix to configure Trino application properties. For example, `--properties="trino:join-distribution-type=AUTOMATIC"`.
- JVM configuration properties: Use cluster properties with the `trino-jvm:` prefix to configure JVM properties for the Trino coordinator and worker Java processes. For example, `--properties="trino-jvm:XX:+HeapDumpOnOutOfMemoryError"`.
- Creating new catalogs and adding catalog properties: Use `trino-catalog:catalog-name.property-name` to configure Trino catalogs.

Example: The following `--properties` flag can be used with the `gcloud dataproc clusters create` command to create a Trino cluster with a "prodhive" Hive catalog. A `prodhive.properties` file will be created under `/usr/lib/trino/etc/catalog/` to enable the prodhive catalog.

`--properties="trino-catalog:prodhive.connector.name=hive,trino-catalog:prodhive.hive.metastore.uri=thrift://localhost:9000"`
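Based on the `trino-catalog:catalog-name.property-name` mapping described above, the generated `/usr/lib/trino/etc/catalog/prodhive.properties` file would be expected to contain the corresponding key-value pairs. This is a sketch inferred from the example `--properties` flag, not verified output:

```
connector.name=hive
hive.metastore.uri=thrift://localhost:9000
```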
REST API
The Trino component can be specified through the Managed Service for Apache Spark API using `SoftwareConfig.Component` as part of a `clusters.create` request.
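As a sketch, the relevant fragment of a `clusters.create` request body might look like the following, where `my-trino-cluster` is an illustrative placeholder and `endpointConfig.enableHttpPortAccess` enables Component Gateway:

```json
{
  "clusterName": "my-trino-cluster",
  "config": {
    "softwareConfig": {
      "optionalComponents": ["TRINO"]
    },
    "endpointConfig": {
      "enableHttpPortAccess": true
    }
  }
}
```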