Build foundational data governance

You've probably asked questions like "What does this column name mean?", "Who owns this broken dataset?", or "Is this table approved for use?" Some data catalogs use unstructured tags to add this information, but tags quickly become outdated or inconsistent. Knowledge Catalog (formerly Dataplex Universal Catalog) avoids this issue by letting you attach structured, schema-driven metadata and clear business definitions directly to your data assets. This approach helps you build programmatic governance at scale.

This tutorial shows you how to get started with data governance in Knowledge Catalog. It is designed for data engineers, database administrators, and data architects, and it walks through manual UI steps to help you build a strong mental model before you automate these workflows. It clarifies how entries, business glossaries, aspect types, and aspects relate to one another. By the end, you'll know how to make your data discoverable and trustworthy.

Objectives

In this tutorial, you learn how to:

  • Create a single source of truth for your business terms with a business glossary.
  • Structure and organize your metadata with aspect types.
  • Attach metadata to your assets with aspects.
  • Find assets in Knowledge Catalog Search by querying this structured metadata.

Before you begin

Complete the following setup tasks before you start the tutorial.

Set up your environment

This tutorial uses Cloud Shell, a command-line environment that runs in the cloud.

  1. From the Google Cloud console, click Activate Cloud Shell in the top right toolbar. It takes a few moments to provision and connect to the environment.

  2. In Cloud Shell, set the PROJECT_ID and LOCATION variables so that the commands in this tutorial target your Google Cloud project and region.

    export PROJECT_ID=$(gcloud config get-value project)
    gcloud config set project $PROJECT_ID
    export LOCATION="us-central1"
    
  3. Enable the necessary Google Cloud services.

    gcloud services enable \
      dataplex.googleapis.com \
      bigquery.googleapis.com \
      datacatalog.googleapis.com
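
If a later step fails with a permission or API error, it can help to confirm that all three services are enabled. The following is a minimal sketch, not part of the original tutorial: the missing_services helper is hypothetical, and it compares the required services against whatever list of enabled services it is given. The gcloud call runs only when gcloud is available.

```shell
# Services that this tutorial enables.
REQUIRED_SERVICES="dataplex.googleapis.com bigquery.googleapis.com datacatalog.googleapis.com"

# Hypothetical helper: print each required service that is absent from the
# newline-separated list of enabled services passed as $1.
missing_services() {
  for svc in $REQUIRED_SERVICES; do
    printf '%s\n' "$1" | grep -qxF "$svc" || echo "$svc"
  done
}

# Query the project's enabled services only when gcloud is installed.
if command -v gcloud >/dev/null 2>&1; then
  ENABLED="$(gcloud services list --enabled --format='value(config.name)')"
  missing_services "$ENABLED"
fi
```

No output from missing_services means that every required service appears in the list.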
    

Create a BigQuery dataset and prepare sample data

Use the following commands to create a BigQuery dataset and load sample transactions from a CSV file into a table. After you create the table, Knowledge Catalog automatically discovers it and creates an entry for it in the catalog.

Think of an entry as Knowledge Catalog's representation of a data asset. It's like a record in the catalog that you can attach governance metadata to. Instead of governing the BigQuery table directly, you govern its entry in Knowledge Catalog.

# Create the BigQuery dataset in the region stored in $LOCATION (us-central1)
bq --location="$LOCATION" mk --dataset \
    --description "Retail data for governance tutorial" \
    "$PROJECT_ID:retail_data"

# Create a temporary CSV file with the sample data
echo "transaction_id,user_email,gmv,transaction_date
1001,test@example.com,150.50,2025-08-28
1002,user@example.com,75.00,2025-08-28" > /tmp/transactions.csv

# Load the data from the temporary CSV file into a BigQuery table
bq load \
    --source_format=CSV \
    --autodetect \
    retail_data.transactions \
    /tmp/transactions.csv

# (Optional) Clean up the temporary file
rm /tmp/transactions.csv

Run a SELECT query to verify your setup:

bq query --nouse_legacy_sql "SELECT * FROM retail_data.transactions"

Example output:

+----------------+------------------+-------+------------------+
| transaction_id |    user_email    |  gmv  | transaction_date |
+----------------+------------------+-------+------------------+
|           1001 | test@example.com | 150.5 |       2025-08-28 |
|           1002 | user@example.com |  75.0 |       2025-08-28 |
+----------------+------------------+-------+------------------+
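
Once the table exists, its catalog entry is addressed by a resource name rather than by the table name. The sketch below composes that name for the sample table. It assumes the naming convention for BigQuery system entries (the @bigquery entry group, and an entry ID derived from the table's resource path). The entry_name helper and the my-project placeholder are hypothetical, so compare the result with the name shown on the entry's details page.

```shell
# Hypothetical helper: compose the catalog entry name for a BigQuery table.
# Assumption: BigQuery entries live in the system entry group "@bigquery" and
# use the table's resource path as the entry ID.
# Arguments: $1 project ID, $2 location, $3 dataset, $4 table
entry_name() {
  printf 'projects/%s/locations/%s/entryGroups/@bigquery/entries/bigquery.googleapis.com/projects/%s/datasets/%s/tables/%s\n' \
    "$1" "$2" "$1" "$3" "$4"
}

# Fall back to placeholder values when the tutorial variables are not set.
ENTRY_NAME="$(entry_name "${PROJECT_ID:-my-project}" "${LOCATION:-us-central1}" retail_data transactions)"
echo "$ENTRY_NAME"
```

Governance metadata that you add later in this tutorial is attached to this entry, not to the BigQuery table itself.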

Establish common terms with a business glossary

Good governance relies on clear definitions. For example, a developer shouldn't have to guess whether a column named gmv means Gross Merchandise Value, or whether that value includes taxes or returns. A business glossary solves this by creating a single source of truth that decouples business definitions from technical details. This ensures that terms like Gross Merchandise Value mean the same thing to everyone, from the Sales team to Finance.

Follow these steps to create a glossary and define your first term:

  1. In the Google Cloud console, go to the Knowledge Catalog Glossaries page.

    Go to Glossaries

  2. Click Create Business Glossary.

  3. Enter the following details:

    • Display name: Retail Business Glossary
    • Location: us-central1 (Iowa)
  4. Click Create.

  5. Click Create Category.

  6. Name the category Sales Metrics, and click Create.

  7. Select the Sales Metrics category and click Add term.

  8. Name the term Gross Merchandise Value and click Create.

  9. Click the Gross Merchandise Value term to open its details page.

  10. Click Add next to Overview, and then enter the following description: The total value of merchandise sold over a given period of time before the deduction of any fees or expenses. This is a key indicator of e-commerce business growth.

  11. Click Save.

You have now created a glossary term that you can link to data assets across your organization.

Define technical metadata with an aspect type

If you need to track who owns a particular data asset, key-value tags aren't enough. You don't want one table tagged owner:bob and another contact:alice@example.com. You want a structured schema that makes owner information a required, consistently named field, ideally in a valid email format.

To meet this need, Knowledge Catalog supports aspect types. An aspect type is like a blueprint for your metadata that lets you set clear rules and required fields. This ensures that any metadata you add later stays organized.

  1. In the Google Cloud console, go to the Knowledge Catalog Aspect types tab on the Metadata types page.

    Go to Aspect types

  2. On the Custom tab, click Create.

  3. Enter the following details:

    • Display name: Data Asset Governance
    • Location: us-central1 (Iowa)
  4. In the Template section, click Add field to create the following three fields:

    • Field 1:

      • Display name: Data Steward
      • Type: Text
      • Is Required: Select the checkbox.
      • Text type: Plain text
    • Field 2 (click Add field):

      • Display name: Data Sensitivity
      • Type: Enum
      • Is Required: Leave the checkbox cleared.
      • Values: Add Public, Internal, and Confidential.
    • Field 3 (click Add field):

      • Display name: Last Review Date
      • Type: Date and time
      • Is Required: Leave the checkbox cleared.
  5. Click Save.

You now have an aspect type for governance-related metadata fields like data steward, sensitivity level, and review date. In the next section, you apply this schema to a table entry by attaching an aspect with specific values for these fields.
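
When you later automate this step, the same schema can be written as a metadata template file. The following is a sketch under stated assumptions rather than a definitive equivalent of the console steps. The aspect type ID (data-asset-governance) and the field IDs are hypothetical, because the console derives its own IDs from the display names. The template layout follows the record, enum, and datetime field types of the Dataplex metadata template format as the author of this sketch understands it. The script only prints the gcloud command, so running it has no side effects.

```shell
set -eu

# Write the template to a temporary file.
TEMPLATE_FILE="$(mktemp)"
cat > "$TEMPLATE_FILE" <<'EOF'
{
  "name": "data-asset-governance",
  "type": "record",
  "recordFields": [
    {
      "name": "data-steward",
      "type": "string",
      "index": 1,
      "annotations": {"displayName": "Data Steward"},
      "constraints": {"required": true}
    },
    {
      "name": "data-sensitivity",
      "type": "enum",
      "index": 2,
      "annotations": {"displayName": "Data Sensitivity"},
      "enumValues": [
        {"name": "Public", "index": 1},
        {"name": "Internal", "index": 2},
        {"name": "Confidential", "index": 3}
      ]
    },
    {
      "name": "last-review-date",
      "type": "datetime",
      "index": 3,
      "annotations": {"displayName": "Last Review Date"}
    }
  ]
}
EOF

# Print the create command for review instead of running it.
# The aspect type ID below is an assumed value.
echo "gcloud dataplex aspect-types create data-asset-governance" \
  "--location=${LOCATION:-us-central1}" \
  "--metadata-template-file-name=${TEMPLATE_FILE}"
```

Only Data Steward carries a required constraint, which mirrors the Is Required checkbox in the console steps.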

Enrich an entry with governance metadata

Column names are often abbreviated or ambiguous. Linking a column to a term in your business glossary provides a clear and consistent definition. In this step, you enrich the entry for the retail_data.transactions table by linking the Gross Merchandise Value term to a column named gmv and using your aspect type to attach an aspect to the table entry.

To link the gmv column in retail_data.transactions to your Gross Merchandise Value term, follow these steps:

  1. In the Google Cloud console, go to the Knowledge Catalog Search page.

    Go to Search

  2. Click Filters to open the Filters panel.

  3. For Scope, select Current Project.

  4. Search for retail_data.transactions and click the returned transactions table.

  5. Click the Schema tab.

  6. Select the checkbox next to the gmv column, and click Add business term.

  7. Select Gross Merchandise Value.

Attach an aspect to the table entry

In addition to linking business terms to columns, you can attach an aspect to a table entry to capture table-level governance metadata, such as data ownership and sensitivity.

An aspect is an instance of an aspect type, containing specific values for metadata fields. When you attach an aspect to an entry, Knowledge Catalog checks the information you provide against the schema defined in the aspect type to ensure consistency.

To define ownership and sensitivity for the retail_data.transactions table, attach the Data Asset Governance aspect:

  1. On the Details tab of the retail_data.transactions entry page, click Add next to Optional aspects.
  2. Select Data Asset Governance from the list.
  3. Enter values in the fields:

    • Data Steward: finance-team@example.com
    • Data Sensitivity: Select Internal.
    • Last Review Date: Select today's date.
  4. Click Save.

You've now set up a solid foundation for data governance in Knowledge Catalog.
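
The aspect that you attached in the console can also be expressed as a small file of values keyed by aspect type. The sketch below writes such a file and prints a matching gcloud command for review. Several names are assumptions: the aspect type ID and field IDs (check them on the aspect type's details page), the PROJECT.LOCATION.ASPECT_TYPE_ID key format, the @bigquery entry group, and the my-project placeholder. Nothing is sent to the catalog.

```shell
set -eu

PROJECT="${PROJECT_ID:-my-project}"      # placeholder when PROJECT_ID is unset
LOC="${LOCATION:-us-central1}"
ASPECT_TYPE_ID="data-asset-governance"   # assumed ID, not confirmed by the console steps
REVIEW_DATE="$(date -u +%Y-%m-%dT%H:%M:%SZ)"

# Values for the three fields of the Data Asset Governance aspect type.
ASPECTS_FILE="$(mktemp)"
cat > "$ASPECTS_FILE" <<EOF
{
  "${PROJECT}.${LOC}.${ASPECT_TYPE_ID}": {
    "data": {
      "data-steward": "finance-team@example.com",
      "data-sensitivity": "Internal",
      "last-review-date": "${REVIEW_DATE}"
    }
  }
}
EOF
cat "$ASPECTS_FILE"

# Assumed entry ID for the sample table in the @bigquery entry group.
ENTRY_ID="bigquery.googleapis.com/projects/${PROJECT}/datasets/retail_data/tables/transactions"

# Print the update command for review instead of running it.
echo "gcloud dataplex entries update \"${ENTRY_ID}\"" \
  "--entry-group=@bigquery --location=${LOC} --update-aspects=${ASPECTS_FILE}"
```

Because the values are checked against the aspect type's schema, a sensitivity value outside Public, Internal, and Confidential would be expected to fail validation.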

Search for entries using enriched metadata

You've enriched the retail_data.transactions entry by linking a column to a business term and attaching an aspect. Now you can use Knowledge Catalog Search to find entries based on these business contexts. For example, you can find all assets with a specific sensitivity level, or search for your glossary term to discover the underlying tables.

  1. In the Google Cloud console, go to the Knowledge Catalog Search page.

    Go to Search

  2. Click Filters to open the Filters panel.

  3. For Scope, select Current Project.

  4. In the search bar, enter the following query: Find tables where the Data Asset Governance aspect has Internal sensitivity

  5. Verify that the retail_data.transactions table appears in the list of results.

  6. Clear the search bar and enter the following query: Find tables with the Gross Merchandise Value term attached

  7. Verify that the retail_data.transactions table appears again. It matches because its gmv column is directly linked to this business term.
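
The natural-language searches above also have a structured counterpart that is easier to reuse in scripts. The sketch below only builds a query string. It assumes that the search syntax accepts an aspect: qualifier of the form aspect:PROJECT.LOCATION.ASPECT_TYPE_ID.FIELD_ID=VALUE, and it reuses hypothetical IDs (data-asset-governance, data-sensitivity) that may differ from the ones the console generated. Treat the output as a starting point and check it against the search syntax reference.

```shell
# Hypothetical helper: build a structured query that matches an aspect field value.
# Arguments: $1 project ID, $2 location, $3 aspect type ID, $4 field ID, $5 value
# Assumption: the "aspect:" qualifier takes a dotted path followed by =VALUE.
aspect_query() {
  printf 'aspect:%s.%s.%s.%s=%s\n' "$1" "$2" "$3" "$4" "$5"
}

# Query for entries whose sensitivity is Internal.
aspect_query "${PROJECT_ID:-my-project}" "${LOCATION:-us-central1}" \
  data-asset-governance data-sensitivity Internal
```

You can paste the printed string into the search bar in place of the first natural-language query.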

Clean up

To avoid incurring charges, delete the resources that you created in this tutorial.

Delete the sample dataset

To delete the sample BigQuery dataset and all its tables, use the following command. This action is irreversible.

# Re-run this export if your Cloud Shell session timed out
export PROJECT_ID=$(gcloud config get-value project)

# The -r flag deletes all tables in the dataset, and -f skips the confirmation
# prompt, so check the dataset name before you run this command
bq rm -r -f --dataset "$PROJECT_ID:retail_data"

Delete Knowledge Catalog artifacts

  1. In the Google Cloud console, go to the Knowledge Catalog Aspect types tab on the Metadata types page.

    Go to Aspect types

  2. Select the Data Asset Governance aspect type and click Delete.

  3. In the Google Cloud console, go to the Knowledge Catalog Glossaries page.

    Go to Glossaries

  4. Select the Gross Merchandise Value term and click Delete.

  5. Select the Sales Metrics category and click Delete.

  6. Select the Retail Business Glossary and click Delete.

What's next