Build a data mesh

You can use Dataplex Universal Catalog to build a data mesh architecture. This quickstart shows you how to use Dataplex Universal Catalog features, such as a lake, zones, and assets, to build a data mesh.

A data mesh is an organizational and technical approach that decentralizes data ownership among domain data owners. These owners provide their data as a product in a standard way and facilitate communication among different parts of the organization so that datasets can be distributed across different locations. Learn more about data mesh architectures.

Create a domain

  1. In the Google Cloud console, go to the Dataplex Universal Catalog Lakes page.

  2. Click Create to create a new lake. This lake acts as a domain in your data mesh.

  3. In the Display name field, enter My data mesh.

  4. For Region, select us-central1.

  5. Select the Dataproc Metastore service that you created and configured earlier as the associated metastore.

  6. Click Create.
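The console steps above correspond to a single lakes.create call in the Dataplex REST API. The following is a minimal sketch that only builds the request URL and JSON body and sends nothing. The project ID `my-project`, the lake ID `my-data-mesh`, and the metastore service name `my-metastore` are hypothetical placeholders, and the helper function is illustrative rather than part of any Google library.

```python
import json
from urllib.parse import urlencode

DATAPLEX_API = "https://dataplex.googleapis.com/v1"


def build_create_lake_request(project_id, location, lake_id, display_name,
                              metastore_service=None):
    """Return (url, body) for a lakes.create call. Nothing is sent."""
    parent = f"projects/{project_id}/locations/{location}"
    url = f"{DATAPLEX_API}/{parent}/lakes?{urlencode({'lakeId': lake_id})}"
    body = {"displayName": display_name}
    if metastore_service:
        # Full resource name of the associated Dataproc Metastore service.
        body["metastore"] = {"service": metastore_service}
    return url, body


lake_url, lake_body = build_create_lake_request(
    project_id="my-project",   # hypothetical project ID
    location="us-central1",    # region selected in step 4
    lake_id="my-data-mesh",    # hypothetical ID; the console generates its own
    display_name="My data mesh",
    metastore_service=(
        "projects/my-project/locations/us-central1/services/my-metastore"
    ),                         # hypothetical metastore service
)
print("POST", lake_url)
print(json.dumps(lake_body, indent=2))
```

To actually create the lake, the request would need to be sent with an authenticated HTTP client or replaced by the equivalent client library call.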

Create zones in your lake

After you create a domain by creating a Dataplex Universal Catalog lake, use zones to host managed data contracts and individual teams within the domain. There are two types of zones:

  • Raw zones are typically used to store data in any format from external sources in Cloud Storage. Raw zones are useful for data that requires further processing before it's ready for consumption.

  • Curated zones are used for structured data in Cloud Storage that must conform to certain file formats and be organized in a Hive-compatible directory layout. They are most useful for data that's ready for consumption and analysis.

Each domain (for example, sales, customers, products) should have at least a raw zone and a curated zone.

Additional zones are used to manage data contracts between teams or to provide a more granular breakdown for teams within a given domain, for example, inventory management within the products domain. Data owners can manage and access the data within their domain.
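The rule that each domain should have at least a raw zone and a curated zone can be checked against a planned layout before you create anything. The following sketch is purely illustrative: the zone IDs and the `missing_zone_types` helper are hypothetical, and the layout is a plain dictionary rather than data read from Dataplex Universal Catalog.

```python
REQUIRED_ZONE_TYPES = {"RAW", "CURATED"}


def missing_zone_types(domain_zones):
    """Map each domain to the required zone types that it lacks."""
    gaps = {}
    for domain, zones in domain_zones.items():
        present = {zone_type for _zone_id, zone_type in zones}
        missing = REQUIRED_ZONE_TYPES - present
        if missing:
            gaps[domain] = sorted(missing)
    return gaps


# Hypothetical plan: (zone ID, zone type) pairs for each domain.
planned_mesh = {
    "sales": [("sales-raw", "RAW"), ("sales-curated", "CURATED")],
    "customers": [("customers-raw", "RAW")],
    "products": [
        ("products-raw", "RAW"),
        ("products-curated", "CURATED"),
        ("inventory-management", "RAW"),  # additional, more granular zone
    ],
}

zone_gaps = missing_zone_types(planned_mesh)
print(zone_gaps)  # {'customers': ['CURATED']}
```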

  1. In the Google Cloud console, navigate to the Dataplex Universal Catalog Manage view.

  2. Click the name of the lake (My data mesh) you want to add a zone to.

  3. In the Zones tab, click Add Zone.

  4. In the Display name field, enter My sub domain. Dataplex Universal Catalog automatically generates an ID for your zone.

  5. For Type, select Raw zone.

  6. Click Create.
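The zone created in the steps above can also be described as a zones.create request against the lake. This sketch builds the request without sending it. It assumes a hypothetical project `my-project` and lake ID `my-data-mesh`, and it assumes single-region data because the lake is in us-central1. The `slugify` helper is a hypothetical way to derive an ID from the display name; the ID that the console generates may differ.

```python
import json
import re
from urllib.parse import urlencode

DATAPLEX_API = "https://dataplex.googleapis.com/v1"


def slugify(display_name):
    """Hypothetical helper: lowercase a display name and hyphenate it."""
    return re.sub(r"[^a-z0-9]+", "-", display_name.lower()).strip("-")


def build_create_zone_request(project_id, location, lake_id, display_name,
                              zone_type="RAW"):
    """Return (url, body) for a zones.create call. Nothing is sent."""
    if zone_type not in ("RAW", "CURATED"):
        raise ValueError(f"unknown zone type: {zone_type}")
    parent = f"projects/{project_id}/locations/{location}/lakes/{lake_id}"
    query = urlencode({"zoneId": slugify(display_name)})
    body = {
        "displayName": display_name,
        "type": zone_type,
        # Assumption: single-region data, matching the us-central1 lake.
        "resourceSpec": {"locationType": "SINGLE_REGION"},
    }
    return f"{DATAPLEX_API}/{parent}/zones?{query}", body


zone_url, zone_body = build_create_zone_request(
    project_id="my-project",  # hypothetical project ID
    location="us-central1",
    lake_id="my-data-mesh",   # hypothetical lake ID
    display_name="My sub domain",
    zone_type="RAW",          # type selected in step 5
)
print("POST", zone_url)
print(json.dumps(zone_body, indent=2))
```

Passing `zone_type="CURATED"` produces the request for a curated zone in the same lake.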

Attach assets to your zones

Attach data assets to your zone. A data asset is a storage resource that contains your data. It can be a Cloud Storage bucket or a BigQuery dataset. This is the final step in creating your data mesh architecture.

  1. In the Dataplex Universal Catalog Manage view, click the lake you created (My data mesh).

  2. In the Zones tab, click the zone (My sub domain) that you want to add the asset to.

  3. In the Assets tab, click Add assets.

  4. Click Add an Asset.

  5. For Type, select Cloud Storage bucket.

  6. In the Display name field, enter Data mesh asset. Dataplex Universal Catalog automatically generates an asset ID for you.

  7. In the Bucket field, click Browse.

    1. Select your bucket from the list.
    2. Click Select.

  8. Click Done, and then click Continue.

  9. Click Continue to accept the default Advanced settings.

  10. Click Submit.
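Attaching an asset maps to an assets.create request under the zone. The sketch below builds that request for either a Cloud Storage bucket or a BigQuery dataset and sends nothing. The project, lake, zone, asset, and bucket names are hypothetical placeholders, and the Advanced settings that the console applies by default are not modeled here.

```python
import json
from urllib.parse import urlencode

DATAPLEX_API = "https://dataplex.googleapis.com/v1"

# Resource name pattern and API resource type for each kind of asset.
RESOURCE_KINDS = {
    "bucket": ("projects/{project}/buckets/{name}", "STORAGE_BUCKET"),
    "dataset": ("projects/{project}/datasets/{name}", "BIGQUERY_DATASET"),
}


def build_create_asset_request(zone_path, asset_id, display_name,
                               project_id, kind, resource):
    """Return (url, body) for an assets.create call. Nothing is sent."""
    pattern, resource_type = RESOURCE_KINDS[kind]
    body = {
        "displayName": display_name,
        "resourceSpec": {
            "name": pattern.format(project=project_id, name=resource),
            "type": resource_type,
        },
    }
    query = urlencode({"assetId": asset_id})
    return f"{DATAPLEX_API}/{zone_path}/assets?{query}", body


asset_url, asset_body = build_create_asset_request(
    # Hypothetical zone path built from placeholder IDs.
    zone_path=("projects/my-project/locations/us-central1"
               "/lakes/my-data-mesh/zones/my-sub-domain"),
    asset_id="data-mesh-asset",  # hypothetical; the console generates its own
    display_name="Data mesh asset",
    project_id="my-project",     # hypothetical project ID
    kind="bucket",               # Cloud Storage bucket, as in step 5
    resource="my-bucket",        # hypothetical bucket name
)
print("POST", asset_url)
print(json.dumps(asset_body, indent=2))
```

Using `kind="dataset"` with a dataset ID produces the corresponding request for a BigQuery dataset.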