This document describes how to create a Dataplex Universal Catalog lake. You can create a lake in any region that supports Dataplex Universal Catalog.
Before you begin
- Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
- In the Google Cloud console, on the project selector page, select or create a Google Cloud project.
  Roles required to select or create a project:
  - Select a project: Selecting a project doesn't require a specific IAM role; you can select any project that you've been granted a role on.
  - Create a project: To create a project, you need the Project Creator role (roles/resourcemanager.projectCreator), which contains the resourcemanager.projects.create permission. Learn how to grant roles.
- Verify that billing is enabled for your Google Cloud project.
- Enable the Dataplex, Dataproc, Dataproc Metastore, BigQuery, and Cloud Storage APIs.
  Roles required to enable APIs: To enable APIs, you need the Service Usage Admin IAM role (roles/serviceusage.serviceUsageAdmin), which contains the serviceusage.services.enable permission. Learn how to grant roles.
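The APIs listed above can also be enabled from the command line. The following is a minimal sketch, assuming the gcloud CLI is installed and authenticated. The service names in `REQUIRED_APIS`, the `enable_apis` helper, and the `RUN` switch are illustrative assumptions and are not part of this guide; confirm the service names for your project before running it.

```shell
#!/bin/sh
# Sketch: enable the APIs this guide lists by using the gcloud CLI.
# Assumption: these service names correspond to Dataplex, Dataproc,
# Dataproc Metastore, BigQuery, and Cloud Storage.
REQUIRED_APIS="dataplex.googleapis.com dataproc.googleapis.com metastore.googleapis.com bigquery.googleapis.com storage.googleapis.com"

# enable_apis PROJECT_ID
# Dry run by default: prints the command. Set RUN=1 to execute it.
enable_apis() {
  project="$1"
  if [ "${RUN:-0}" = "1" ]; then
    # Word splitting of $REQUIRED_APIS is intentional: one argument per API.
    gcloud services enable $REQUIRED_APIS --project="$project"
  else
    echo "gcloud services enable $REQUIRED_APIS --project=$project"
  fi
}

# PROJECT_ID is a placeholder; replace it with your project ID.
enable_apis "${PROJECT_ID:-PROJECT_ID}"
```

Running the script as is only prints the command, so you can review it first. Run it with `RUN=1` to enable the APIs.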
Access control
- To create and manage your lake, make sure that you have been granted the predefined role roles/dataplex.admin or roles/dataplex.editor. For more information, see Grant a single role.
- To attach Cloud Storage buckets from other projects to your lake, grant the Dataplex Universal Catalog service account an administrator role on the bucket by running the following command:

      gcloud alpha dataplex lakes authorize \
          --project PROJECT_ID_OF_LAKE \
          --storage-bucket-resource BUCKET_NAME

Create a metastore
You can access Dataplex Universal Catalog metadata by using Hive Metastore in Spark queries. To do so, associate a Dataproc Metastore service instance with your Dataplex Universal Catalog lake. The lake must have a gRPC-enabled Dataproc Metastore (version 3.1.2 or later) associated with it.
- Set up a Dataproc Metastore service instance that exposes a gRPC endpoint (instead of the default Thrift Metastore endpoint):

      curl -X PATCH \
          -H "Authorization: Bearer $(gcloud auth print-access-token)" \
          -H "Content-Type: application/json" \
          "https://metastore.googleapis.com/v1beta/projects/PROJECT_ID/locations/LOCATION/services/SERVICE_ID?updateMask=hiveMetastoreConfig.endpointProtocol" \
          -d '{"hiveMetastoreConfig": {"endpointProtocol": "GRPC"}}'

- View the gRPC endpoint:

      gcloud metastore services describe SERVICE_ID \
          --project PROJECT_ID \
          --location LOCATION \
          --format "value(endpointUri)"

Create a lake
Console
- In the Google Cloud console, go to the Dataplex Universal Catalog Lakes page.
- Click Create.
- Enter a display name.
- The lake ID is generated automatically. If you prefer, you can provide your own ID. See Resource naming convention.
- Optional: Enter a description.
- Specify the Region in which to create the lake.
  - For a lake created in a given region (for example, us-central1), you can attach both single-region (us-central1) data and multi-region (us multi-region) data, depending on the zone settings.
- Optional: Add labels to your lake.
- Optional: In the Metastore section, click the Metastore service menu, and then select the service that you created in the Before you begin section.
- Click Create.
gcloud
To create a lake, use the gcloud alpha dataplex lakes create command:

    gcloud alpha dataplex lakes create LAKE \
        --location=LOCATION \
        --labels=k1=v1,k2=v2,k3=v3 \
        --metastore-service=METASTORE_SERVICE

Replace the following:
- LAKE: the name of the new lake
- LOCATION: the Google Cloud region
- k1=v1,k2=v2,k3=v3: the labels to use (if any)
- METASTORE_SERVICE: the Dataproc Metastore service (if you created one)
REST
To create a lake, use the lakes.create method.
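As an illustration, a lakes.create call might be assembled as in the following sketch. It assumes the v1 endpoint `https://dataplex.googleapis.com/v1`, a `lakeId` query parameter, and a request body that carries only `displayName`; check these against the API reference before relying on them. The helper names (`lake_create_url`, `lake_create_body`, `create_lake`) and the `SEND_REQUEST` switch are hypothetical and exist only for this example.

```shell
#!/bin/sh
# Sketch: assemble a lakes.create REST request.
# Assumption: the v1 endpoint and the lakeId query parameter shown here.

# lake_create_url PROJECT_ID LOCATION LAKE_ID
lake_create_url() {
  printf 'https://dataplex.googleapis.com/v1/projects/%s/locations/%s/lakes?lakeId=%s\n' \
    "$1" "$2" "$3"
}

# lake_create_body DISPLAY_NAME
# Assumes the display name contains no double quotes or backslashes.
lake_create_body() {
  printf '{"displayName": "%s"}\n' "$1"
}

# create_lake PROJECT_ID LOCATION LAKE_ID DISPLAY_NAME
# Dry run by default: prints the request. Set SEND_REQUEST=1 to send it.
create_lake() {
  url="$(lake_create_url "$1" "$2" "$3")"
  body="$(lake_create_body "$4")"
  if [ "${SEND_REQUEST:-0}" = "1" ]; then
    curl -X POST \
      -H "Authorization: Bearer $(gcloud auth print-access-token)" \
      -H "Content-Type: application/json" \
      -d "$body" \
      "$url"
  else
    echo "POST $url"
    echo "$body"
  fi
}

# PROJECT_ID, LOCATION, and LAKE are placeholders, as in the gcloud example.
create_lake PROJECT_ID LOCATION LAKE "My lake"
```

By default the script prints the request instead of sending it, which makes it easy to inspect the URL and body first. Labels or a metastore association would go in the same request body; their field names are not shown here because this document does not list them.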