Cluster region

You specify a Compute Engine region, such as "us-east1" or "europe-west1", when you create a Managed Service for Apache Spark cluster. Managed Service for Apache Spark isolates cluster resources, such as VM instances, Cloud Storage, and metadata storage, within a zone in the specified region.

You can optionally specify a zone within the cluster region, such as "us-east1-a" or "europe-west1-b", when you create a cluster. If you don't specify a zone, Managed Service for Apache Spark Auto Zone Placement chooses a zone within your specified cluster region to locate cluster resources.
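For example, with the gcloud CLI you can pass the --zone flag alongside --region (the cluster name, region, and zone shown here are placeholder values):

```
gcloud dataproc clusters create CLUSTER_NAME \
    --region=us-east1 \
    --zone=us-east1-a
```

Omit the --zone flag to let Auto Zone Placement choose the zone for you.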

The regional namespace corresponds to the /regions/REGION segment of Managed Service for Apache Spark resource URIs (see, for example, the cluster networkUri).

Region names

Region names follow a standard naming convention based on Compute Engine regions. For example, the name for the Central US region is us-central1, and the name of the Western Europe region is europe-west1. Run the gcloud compute regions list command to see a listing of available regions.

Location and regional endpoints

Google Cloud APIs can support locational and regional endpoints:

  • Locational endpoints ensure that in-transit data remains in the specified location when accessed through private connectivity.

    Format: {location}-{service}.googleapis.com

    Example: us-central1-dataproc.googleapis.com

  • Regional endpoints ensure that in-transit data remains in the specified location when accessed through either private connectivity or the public internet.

    Format: {service}.{location}.rep.googleapis.com

    Example: dataproc.us-central1.rep.googleapis.com
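The two endpoint formats above can be illustrated with a small sketch; the helper function names below are hypothetical and exist only to demonstrate how the `{location}` and `{service}` placeholders are filled in:

```python
def locational_endpoint(location: str, service: str) -> str:
    """Builds a locational endpoint: {location}-{service}.googleapis.com"""
    return f"{location}-{service}.googleapis.com"


def regional_endpoint(location: str, service: str) -> str:
    """Builds a regional endpoint: {service}.{location}.rep.googleapis.com"""
    return f"{service}.{location}.rep.googleapis.com"


print(locational_endpoint("us-central1", "dataproc"))
# us-central1-dataproc.googleapis.com
print(regional_endpoint("us-central1", "dataproc"))
# dataproc.us-central1.rep.googleapis.com
```

Note that the location identifier appears as a prefix in the locational format but as a subdomain (with the `rep` label) in the regional format.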

The default Managed Service for Apache Spark endpoint is the locational endpoint. See the Managed Service for Apache Spark release notes for announcements about support for regional endpoints.

Create a cluster

gcloud CLI

When you create a cluster, specify a region using the required --region flag.

gcloud dataproc clusters create CLUSTER_NAME \
    --region=REGION \
    other args ...

REST API

Use the REGION URL parameter in a clusters.create request to specify the cluster region.
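As a sketch, the region appears as a path segment in the clusters.create request URL; the helper function below is hypothetical and only illustrates how the URL is assembled from placeholder project and region values:

```python
def clusters_create_url(project_id: str, region: str) -> str:
    """Builds the clusters.create request URL with the region path segment."""
    return (
        "https://dataproc.googleapis.com/v1/"
        f"projects/{project_id}/regions/{region}/clusters"
    )


print(clusters_create_url("my-project", "us-central1"))
# https://dataproc.googleapis.com/v1/projects/my-project/regions/us-central1/clusters
```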

gRPC

Set the client transport address to the locational endpoint using the following pattern:

REGION-dataproc.googleapis.com

Python (google-cloud-python) example:

from google.cloud import dataproc_v1
from google.cloud.dataproc_v1.gapic.transports import cluster_controller_grpc_transport

# Point the client transport at the locational endpoint.
transport = cluster_controller_grpc_transport.ClusterControllerGrpcTransport(
    address='us-central1-dataproc.googleapis.com:443')
client = dataproc_v1.ClusterControllerClient(transport=transport)

project_id = 'my-project'
region = 'us-central1'
cluster = {...}  # Cluster configuration omitted.

# Create the cluster in the specified region.
operation = client.create_cluster(project_id, region, cluster)

Java (google-cloud-java) example:

// Point the client settings at the locational endpoint.
ClusterControllerSettings settings =
    ClusterControllerSettings.newBuilder()
        .setEndpoint("us-central1-dataproc.googleapis.com:443")
        .build();
try (ClusterControllerClient clusterControllerClient = ClusterControllerClient.create(settings)) {
  String projectId = "my-project";
  String region = "us-central1";
  Cluster cluster = Cluster.newBuilder().build();
  // Create the cluster in the specified region.
  Cluster response =
      clusterControllerClient.createClusterAsync(projectId, region, cluster).get();
}

Console

In the Google Cloud console, open the Managed Service for Apache Spark Create a cluster page, then specify a region in the Location section of the Set up cluster panel.

What's next