Managed Service for Apache Spark optional Presto component

You can install additional components like Presto when you create a Managed Service for Apache Spark cluster using the Optional components feature. This page describes how to install the optional Presto component on a Managed Service for Apache Spark cluster.

Presto (Trino) is an open source distributed SQL query engine. The Presto server and Web UI are available by default on port 8060 (or port 7778 if Kerberos is enabled) on the cluster's first master node.

By default, Presto on Managed Service for Apache Spark is configured to work with Hive, BigQuery, Memory, TPCH and TPCDS connectors.

After creating a cluster with the Presto component, you can run Presto queries on the cluster.
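For example, one way to issue a query is with the presto command-line client, which is installed with the component. This is a sketch, assuming you have connected to the cluster's first master node over SSH; the catalog and schema names are illustrative.

```shell
# Run on the cluster's first master node. The "hive" catalog and
# "default" schema shown here are illustrative.
presto --catalog hive --schema default --execute "SHOW TABLES;"
```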

Install the component

Install the component when you create a Managed Service for Apache Spark cluster. Components can be added to clusters created with Managed Service for Apache Spark version 1.3 and later.

See Supported Managed Service for Apache Spark versions for the component version included in each Managed Service for Apache Spark image release.

Google Cloud CLI command

To create a Managed Service for Apache Spark cluster that includes the Presto component, use the gcloud dataproc clusters create cluster-name command with the --optional-components flag.

gcloud dataproc clusters create cluster-name \
    --optional-components=PRESTO \
    --region=region \
    --enable-component-gateway \
    ... other flags
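As a concrete sketch with hypothetical values substituted for the placeholders (the cluster name and region below are examples, not defaults), the composed command is echoed here so its flags can be inspected before running it:

```shell
# Hypothetical values -- substitute your own cluster name and region.
CLUSTER="my-presto-cluster"
REGION="us-central1"

# Compose the cluster-creation command; echoed for inspection rather
# than executed.
CMD="gcloud dataproc clusters create ${CLUSTER} \
  --optional-components=PRESTO \
  --region=${REGION} \
  --enable-component-gateway"

echo "$CMD"
```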

Configuring properties

Add the --properties flag to the gcloud dataproc clusters create command to set presto, presto-jvm, and presto-catalog configuration properties.

  • Application properties: Use cluster properties with the presto: prefix to configure Presto application properties—for example, --properties="presto:join-distribution-type=AUTOMATIC".
  • JVM configuration properties: Use cluster properties with the presto-jvm: prefix to configure JVM properties for Presto coordinator and worker Java processes—for example, --properties="presto-jvm:-XX:+HeapDumpOnOutOfMemoryError".
  • Creating new catalogs and adding catalog properties: Use presto-catalog:catalog-name.property-name to configure Presto catalogs.

    Example: The following `properties` flag can be used with the `gcloud dataproc clusters create` command to create a Presto cluster with a "prodhive" Hive catalog. A prodhive.properties file will be created under /usr/lib/presto/etc/catalog/ to enable the prodhive catalog.

    --properties="presto-catalog:prodhive.connector.name=hive-hadoop2,presto-catalog:prodhive.hive.metastore.uri=thrift://localhost:9083"
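To make the mapping concrete, the sketch below reconstructs the catalog file that the flag above would produce. This is an illustration only: the component writes the file for you, and the snippet writes to a temporary directory rather than /usr/lib/presto/etc/catalog/.

```shell
# Hypothetical reconstruction of the generated prodhive.properties file.
# Each presto-catalog:prodhive.* property becomes one line in the file,
# with the "presto-catalog:prodhive." prefix stripped.
mkdir -p /tmp/presto-catalog
cat > /tmp/presto-catalog/prodhive.properties <<'EOF'
connector.name=hive-hadoop2
hive.metastore.uri=thrift://localhost:9083
EOF

cat /tmp/presto-catalog/prodhive.properties
```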

REST API

The Presto component can be specified through the Managed Service for Apache Spark API using SoftwareConfig.Component as part of a clusters.create request.
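As a sketch, a clusters.create request body that enables the component and the component gateway might look like the following (the cluster name is a hypothetical example):

```json
{
  "clusterName": "my-presto-cluster",
  "config": {
    "softwareConfig": {
      "optionalComponents": ["PRESTO"]
    },
    "endpointConfig": {
      "enableHttpPortAccess": true
    }
  }
}
```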

Console

    1. Enable the component and component gateway.
      • In the Google Cloud console, open the Managed Service for Apache Spark Create a cluster page. The Set up cluster panel is selected.
      • In the Components section:
          • Under Optional components, select the Presto component.
          • Under Component gateway, select Enable component gateway.