Create a Dataplex Universal Catalog lake

This document describes how to create a Dataplex Universal Catalog lake. You can create a lake in any region that supports Dataplex Universal Catalog.

Before you begin

  1. Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
  2. In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

    Roles required to select or create a project

    • Select a project: Selecting a project doesn't require a specific IAM role—you can select any project that you've been granted a role on.
    • Create a project: To create a project, you need the Project Creator role (roles/resourcemanager.projectCreator), which contains the resourcemanager.projects.create permission. Learn how to grant roles.

    Go to project selector

  3. Verify that billing is enabled for your Google Cloud project.

  4. Enable the Dataplex, Dataproc, Dataproc Metastore, BigQuery, and Cloud Storage APIs.

    Roles required to enable APIs

    To enable APIs, you need the Service Usage Admin IAM role (roles/serviceusage.serviceUsageAdmin), which contains the serviceusage.services.enable permission. Learn how to grant roles.

    Enable the APIs

Access control

    1. To create and manage your lake, make sure that you have been granted the predefined role roles/dataplex.admin or roles/dataplex.editor. For more information, see Grant a single role.

    2. To attach Cloud Storage buckets from another project to your lake, grant the Dataplex Universal Catalog service account an administrator role on the bucket by running the following command:

      gcloud dataplex lakes authorize \
      --project PROJECT_ID_OF_LAKE \
      --storage-bucket-resource BUCKET_NAME
      

Create a metastore

    You can access Dataplex Universal Catalog metadata by using Hive Metastore in Spark queries. To do so, associate a Dataproc Metastore service instance with your Dataplex Universal Catalog lake. The lake must be associated with a gRPC-enabled Dataproc Metastore (version 3.1.2 or higher).

    1. Create a Dataproc Metastore service.

    2. Configure the Dataproc Metastore service instance to expose a gRPC endpoint (instead of the default Thrift Metastore endpoint):

      curl -X PATCH \
      -H "Authorization: Bearer $(gcloud auth print-access-token)" \
      -H "Content-Type: application/json" \
      "https://metastore.googleapis.com/v1beta/projects/PROJECT_ID/locations/LOCATION/services/SERVICE_ID?updateMask=hiveMetastoreConfig.endpointProtocol" \
      -d '{"hiveMetastoreConfig": {"endpointProtocol": "GRPC"}}'
      
    3. View the gRPC endpoint:

      gcloud metastore services describe SERVICE_ID \
        --project PROJECT_ID \
        --location LOCATION \
        --format "value(endpointUri)"
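
The following is a minimal sketch of how steps 1 and 2 might be combined into a single command. It assumes that your gcloud version supports the `--endpoint-protocol` and `--hive-metastore-version` flags of `gcloud metastore services create`. The function name `create_grpc_metastore` and the `DRY_RUN` variable are hypothetical helpers introduced for illustration, not part of any Google Cloud tool.

```shell
# Sketch: create a Dataproc Metastore service that exposes a gRPC endpoint.
# Arguments: SERVICE_ID PROJECT_ID LOCATION
# Set DRY_RUN=1 to print the command instead of running it (hypothetical helper).
create_grpc_metastore() {
  # Rebuild the positional parameters as the full gcloud command line.
  set -- gcloud metastore services create "$1" \
    --project "$2" \
    --location "$3" \
    --hive-metastore-version 3.1.2 \
    --endpoint-protocol grpc
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}
```

For example, running `DRY_RUN=1 create_grpc_metastore SERVICE_ID PROJECT_ID LOCATION` prints the command so that you can review it before creating the service.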
      

Create a lake

    Console

    1. In the Google Cloud console, go to the Dataplex Universal Catalog Lakes page.

      Go to Lakes

    2. Click Create.

    3. Enter a Display name.

    4. The lake ID is generated automatically. If you prefer, you can provide your own ID. See Resource naming conventions.

    5. Optional: Enter a Description.

    6. Specify the Region in which to create the lake.

      For lakes created in a given region (for example, us-central1), you can attach both single-region data (us-central1) and multi-region data (us multi-region), depending on the zone settings.

    7. Optional: Add labels to your lake.

    8. Optional: In the Metastore section, click the Metastore service menu, and then select the Dataproc Metastore service that you created earlier.

    9. Click Create.
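
If you provide your own lake ID, it can help to check it locally before you submit it. The following sketch assumes these naming rules: lowercase letters, numbers, and hyphens only; starts with a letter; ends with a letter or number; 1 to 63 characters long. Confirm these rules against the resource naming conventions before relying on it. The function name `is_valid_lake_id` is hypothetical.

```shell
# Sketch: check a candidate lake ID against the ASSUMED naming rules:
# lowercase letters, digits, and hyphens; starts with a letter;
# ends with a letter or digit; 1 to 63 characters in total.
is_valid_lake_id() {
  printf '%s' "$1" | grep -Eq '^[a-z]([a-z0-9-]{0,61}[a-z0-9])?$'
}

# Example usage: report whether an ID passes the check.
check_lake_id() {
  if is_valid_lake_id "$1"; then
    printf '%s: ok\n' "$1"
  else
    printf '%s: invalid\n' "$1"
  fi
}
```

Calling `check_lake_id sales-lake-01` reports the ID as ok, while an ID such as `Sales_Lake` is reported as invalid because it contains an uppercase letter and an underscore.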

    gcloud

    To create a lake, use the gcloud dataplex lakes create command:

    gcloud dataplex lakes create LAKE \
     --location=LOCATION \
     --labels=k1=v1,k2=v2,k3=v3 \
     --metastore-service=METASTORE_SERVICE
    

    Replace the following:

    • LAKE: the name of the new lake
    • LOCATION: the Google Cloud region for the lake
    • k1=v1,k2=v2,k3=v3: the labels to apply, if any
    • METASTORE_SERVICE: the Dataproc Metastore service, if you created one

    REST

    To create a lake, use the lakes.create method.
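
As a rough sketch, a lakes.create request could be sent with curl in the same style as the metastore request earlier on this page. The sketch assumes the v1 endpoint `https://dataplex.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/lakes` with a `lakeId` query parameter, and a request body that uses the `displayName` and `metastore.service` fields. The function names are hypothetical, and the body builder doesn't escape special characters in its inputs.

```shell
# Sketch: build the lakes.create request URL (ASSUMED v1 endpoint).
# Arguments: PROJECT_ID LOCATION LAKE_ID
lake_create_url() {
  printf 'https://dataplex.googleapis.com/v1/projects/%s/locations/%s/lakes?lakeId=%s\n' "$1" "$2" "$3"
}

# Sketch: build a minimal request body. Inputs are inserted as-is,
# so they must not contain quotes or backslashes.
# Arguments: DISPLAY_NAME METASTORE_SERVICE_RESOURCE_NAME
lake_create_body() {
  printf '{"displayName": "%s", "metastore": {"service": "%s"}}\n' "$1" "$2"
}

# Send the request. Requires curl and an authenticated gcloud installation.
# Arguments: PROJECT_ID LOCATION LAKE_ID DISPLAY_NAME METASTORE_SERVICE_RESOURCE_NAME
create_lake_rest() {
  curl -X POST \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json" \
    "$(lake_create_url "$1" "$2" "$3")" \
    -d "$(lake_create_body "$4" "$5")"
}
```

Here the metastore service is passed as a full resource name of the form projects/PROJECT_ID/locations/LOCATION/services/SERVICE_ID. If you didn't create a metastore, the `metastore` field would be left out of the body.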

What's next