This document describes how to create a Dataplex Universal Catalog lake. You can create a lake in any of the regions that support Dataplex Universal Catalog.
Before you begin
- Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
- In the Google Cloud console, on the project selector page, select or create a Google Cloud project.
  Roles required to select or create a project
  - Select a project: Selecting a project doesn't require a specific IAM role. You can select any project that you've been granted a role on.
  - Create a project: To create a project, you need the Project Creator role (roles/resourcemanager.projectCreator), which contains the resourcemanager.projects.create permission. Learn how to grant roles.
- Verify that billing is enabled for your Google Cloud project.
- Enable the Dataplex, Dataproc, Dataproc Metastore, BigQuery, and Cloud Storage APIs.
  Roles required to enable APIs
  To enable APIs, you need the Service Usage Admin IAM role (roles/serviceusage.serviceUsageAdmin), which contains the serviceusage.services.enable permission. Learn how to grant roles.
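If you prefer the command line to the console for enabling the APIs, the following is a minimal sketch using `gcloud services enable`. The service names listed are assumptions about how the five APIs named above map to service endpoints, and `my-project` is a hypothetical project ID. Confirm both against your own project before running. The script only prints the command unless you set `DRY_RUN=0`.

```shell
# Hypothetical project ID; replace with your own.
PROJECT_ID="${PROJECT_ID:-my-project}"

# Assumed service names for the Dataplex, Dataproc, Dataproc Metastore,
# BigQuery, and Cloud Storage APIs.
APIS="dataplex.googleapis.com dataproc.googleapis.com metastore.googleapis.com bigquery.googleapis.com storage.googleapis.com"

# Build the command as a string so it can be reviewed before it runs.
ENABLE_CMD="gcloud services enable $APIS --project $PROJECT_ID"

# Print by default; set DRY_RUN=0 to actually enable the APIs.
if [ "${DRY_RUN:-1}" = "1" ]; then
  echo "$ENABLE_CMD"
else
  $ENABLE_CMD  # unquoted on purpose so the string splits into arguments
fi
```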
Access control
To create and manage your lake, make sure you have the predefined roles
roles/dataplex.admin or roles/dataplex.editor granted. For more information, see grant a single role.
To attach a Cloud Storage bucket from another project to your lake, grant the Dataplex Universal Catalog service account an administrator role on the bucket by running the following command:
gcloud dataplex lakes authorize \
  --project PROJECT_ID_OF_LAKE \
  --storage-bucket-resource BUCKET_NAME
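One way to grant one of the predefined roles mentioned above is with `gcloud projects add-iam-policy-binding`. The sketch below assumes a hypothetical user (`alice@example.com`) and project ID (`my-project`), and picks `roles/dataplex.editor` for illustration. It prints the command unless `DRY_RUN=0` is set.

```shell
# Hypothetical values; replace with your own project and principal.
PROJECT_ID="${PROJECT_ID:-my-project}"
MEMBER="${MEMBER:-user:alice@example.com}"

# Either roles/dataplex.admin or roles/dataplex.editor allows lake management.
ROLE="${ROLE:-roles/dataplex.editor}"

GRANT_CMD="gcloud projects add-iam-policy-binding $PROJECT_ID --member=$MEMBER --role=$ROLE"

# Print by default; set DRY_RUN=0 to apply the binding.
if [ "${DRY_RUN:-1}" = "1" ]; then
  echo "$GRANT_CMD"
else
  $GRANT_CMD  # unquoted on purpose so the string splits into arguments
fi
```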
Create a metastore
You can access Dataplex Universal Catalog metadata using Hive Metastore in Spark queries by associating a Dataproc Metastore service instance with your Dataplex Universal Catalog lake. You need to have a gRPC-enabled Dataproc Metastore (version 3.1.2 or higher) associated with the Dataplex Universal Catalog lake.
Create a Dataproc Metastore service.
Configure the Dataproc Metastore service instance to expose a gRPC endpoint (instead of the default Thrift Metastore endpoint):
curl -X PATCH \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  "https://metastore.googleapis.com/v1beta/projects/PROJECT_ID/locations/LOCATION/services/SERVICE_ID?updateMask=hiveMetastoreConfig.endpointProtocol" \
  -d '{"hiveMetastoreConfig": {"endpointProtocol": "GRPC"}}'
View the gRPC endpoint:
gcloud metastore services describe SERVICE_ID \
  --project PROJECT_ID \
  --location LOCATION \
  --format "value(endpointUri)"
Create a lake
Console
In the Google Cloud console, go to the Dataplex Universal Catalog Lakes page.
Click Create.
Enter a Display name.
The lake ID is automatically generated for you. If you prefer, you can provide your own ID. See Resource naming convention.
Optional: Enter a Description.
Specify the Region in which to create the lake.
For lakes created in a given region (for example, us-central1), you can attach both single-region (us-central1) data and multi-region (us multi-region) data depending on the zone settings.
Optional: Add labels to your lake.
Optional: In the Metastore section, click the Metastore service menu, and select the Dataproc Metastore service that you created earlier (see Create a metastore).
Click Create.
gcloud
To create a lake, use the gcloud dataplex lakes create command:
gcloud dataplex lakes create LAKE \
  --location=LOCATION \
  --labels=k1=v1,k2=v2,k3=v3 \
  --metastore-service=METASTORE_SERVICE
Replace the following:
- LAKE: name of the new lake
- LOCATION: refers to a Google Cloud region
- k1=v1,k2=v2,k3=v3: labels used (if any)
- METASTORE_SERVICE: the Dataproc Metastore service, if created
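After the create command returns, you may want to confirm that the lake exists. The following sketch uses `gcloud dataplex lakes describe` and requests only the `state` field of the lake resource. The lake name `my-lake` and region `us-central1` are hypothetical placeholders. The command is printed unless `DRY_RUN=0` is set.

```shell
# Hypothetical values; use the LAKE and LOCATION you passed to the create command.
LAKE="${LAKE:-my-lake}"
LOCATION="${LOCATION:-us-central1}"

# Ask only for the lake's state field instead of the full resource.
DESCRIBE_CMD="gcloud dataplex lakes describe $LAKE --location=$LOCATION --format=value(state)"

# Print by default; set DRY_RUN=0 to query the lake.
if [ "${DRY_RUN:-1}" = "1" ]; then
  echo "$DESCRIBE_CMD"
else
  $DESCRIBE_CMD  # unquoted on purpose so the string splits into arguments
fi
```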
REST
To create a lake, use the lakes.create method.
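As an illustration of a lakes.create call, the sketch below builds the request URL and a JSON body, then sends them with curl in the same style as the metastore request earlier on this page. All values (`my-project`, `us-central1`, `my-lake`, `my-metastore`, the display name, and the label) are hypothetical placeholders, and the `metastore` field is only needed if you created a Dataproc Metastore service. Check the field names against the lakes.create reference before relying on this. The request is printed rather than sent unless `DRY_RUN=0` is set.

```shell
# Hypothetical values; replace with your own.
PROJECT_ID="${PROJECT_ID:-my-project}"
LOCATION="${LOCATION:-us-central1}"
LAKE_ID="${LAKE_ID:-my-lake}"
METASTORE_SERVICE="${METASTORE_SERVICE:-my-metastore}"

# The lake ID is passed as a query parameter; the lake itself is the request body.
URL="https://dataplex.googleapis.com/v1/projects/$PROJECT_ID/locations/$LOCATION/lakes?lakeId=$LAKE_ID"

BODY=$(cat <<EOF
{
  "displayName": "My lake",
  "description": "Example lake",
  "labels": {"k1": "v1"},
  "metastore": {
    "service": "projects/$PROJECT_ID/locations/$LOCATION/services/$METASTORE_SERVICE"
  }
}
EOF
)

# Sends the request; the access token is fetched only when this function runs.
create_lake() {
  curl -X POST \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json" \
    -d "$BODY" \
    "$URL"
}

# Print by default; set DRY_RUN=0 to send the request.
if [ "${DRY_RUN:-1}" = "1" ]; then
  echo "POST $URL"
  echo "$BODY"
else
  create_lake
fi
```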
What's next
- Learn how to add zones to a lake.
- Learn how to attach assets to a zone.
- Learn how to secure your lake.
- Learn how to manage your lake.