Cluster region

You specify a Compute Engine region, such as "us-east1" or "europe-west1", when you create a Dataproc cluster. Dataproc isolates cluster resources, such as VM instances and Cloud Storage and metadata storage, within the specified region.

You can optionally specify a zone within the cluster region, such as "us-east1-a" or "europe-west1-b", when you create a cluster. If you don't specify a zone, Dataproc Auto Zone Placement chooses a zone within the specified region for your cluster resources.
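For illustration, the following gcloud CLI sketch (the cluster name and region are placeholders) shows both options: pass --zone to pin the cluster to a zone, or omit it to let Auto Zone Placement choose one.

# Explicitly place cluster VMs in a specific zone within the region.
gcloud dataproc clusters create example-cluster \
    --region=us-east1 \
    --zone=us-east1-b

# Omit --zone so that Auto Zone Placement selects a zone in us-east1.
gcloud dataproc clusters create example-cluster \
    --region=us-east1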

The regional namespace corresponds to the /regions/REGION segment of Dataproc resource URIs (see, for example, the cluster networkUri).
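For example, a Dataproc cluster resource path contains the regional namespace in its /regions/REGION segment:

projects/PROJECT_ID/regions/REGION/clusters/CLUSTER_NAME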

Region names

Region names follow a standard naming convention based on Compute Engine regions. For example, the name for the Central US region is us-central1, and the name of the Western Europe region is europe-west1. Run the gcloud compute regions list command to see a listing of available regions.
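For example, the following command lists the regions available to your project:

gcloud compute regions list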

Location and regional endpoints

Google Cloud APIs can provide support for locational and regional endpoints:

  • Locational endpoints ensure that in-transit data remains in the specified location when accessed through private connectivity.

    Format: {location}-{service}.googleapis.com

    Example: us-central1-dataproc.googleapis.com

  • Regional endpoints ensure that in-transit data remains in the specified location when accessed through either private connectivity or the public internet.

    Format: {service}.{location}.rep.googleapis.com

    Example: dataproc.us-central1.rep.googleapis.com

The default Dataproc endpoint is the locational endpoint. See the Dataproc release notes for announcements about Dataproc support for regional endpoints.
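As a sketch only, and assuming your gcloud CLI version supports per-API endpoint overrides and that a regional endpoint is available for your region, you could route gcloud Dataproc requests through the us-central1 regional endpoint as follows:

# Send gcloud Dataproc requests to the us-central1 regional endpoint.
gcloud config set api_endpoint_overrides/dataproc \
    https://dataproc.us-central1.rep.googleapis.com/

# Revert to the default (locational) endpoint.
gcloud config unset api_endpoint_overrides/dataproc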

Create a cluster

gcloud CLI

When you create a cluster, specify a region using the required --region flag.

gcloud dataproc clusters create CLUSTER_NAME \
    --region=REGION \
    other args ...

REST API

Use the REGION URL parameter in a clusters.create request to specify the cluster region.
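For example, a minimal clusters.create request with curl might look like the following sketch, where PROJECT_ID, REGION, and CLUSTER_NAME are placeholders and the request body is abbreviated (defaults apply to omitted cluster config fields):

curl -X POST \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json" \
    -d '{"clusterName": "CLUSTER_NAME", "config": {}}' \
    "https://dataproc.googleapis.com/v1/projects/PROJECT_ID/regions/REGION/clusters"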

gRPC

Set the client transport address to the locational endpoint using the following pattern:

REGION-dataproc.googleapis.com

Python (google-cloud-python) example:

from google.cloud import dataproc_v1
from google.cloud.dataproc_v1.gapic.transports import cluster_controller_grpc_transport

# Create a gRPC transport that targets the us-central1 locational endpoint.
transport = cluster_controller_grpc_transport.ClusterControllerGrpcTransport(
    address='us-central1-dataproc.googleapis.com:443')
client = dataproc_v1.ClusterControllerClient(transport=transport)

project_id = 'my-project'
region = 'us-central1'
cluster = {...}  # cluster configuration

Java (google-cloud-java) example:

ClusterControllerSettings settings =
    ClusterControllerSettings.newBuilder()
        .setEndpoint("us-central1-dataproc.googleapis.com:443")
        .build();
try (ClusterControllerClient clusterControllerClient = ClusterControllerClient.create(settings)) {
  String projectId = "my-project";
  String region = "us-central1";
  Cluster cluster = Cluster.newBuilder().build();
  Cluster response =
      clusterControllerClient.createClusterAsync(projectId, region, cluster).get();
}

Console

Specify a Dataproc region in the Location section of the Set up cluster panel on the Dataproc Create a cluster page in the Google Cloud console.

What's next