Create data products

This document is intended for data product owners who want to create and configure data products in Knowledge Catalog (formerly Dataplex Universal Catalog).

For more information about the architecture and key concepts of data products, see About data products.

Before you begin

Before you create data products, complete the following prerequisites.

Set up Gemini in BigQuery

Setting up Gemini in BigQuery is an optional but highly recommended step before you create your first data product.

By default, creating a data product requires you to manually enter business descriptions, technical definitions, and onboarding documentation for your assets. When you enable Gemini, Knowledge Catalog uses AI assistance to automatically analyze your schemas and data scan results, and auto-generates the following:

  • Business documentation: It creates a documentation template and writes a clear description of the data product and each of its data assets.
  • Insights and sample queries: It constructs ready-to-use sample queries based on the asset's schema layout, allowing data consumers to immediately start querying the product upon approval.

If you choose not to set up Gemini, you can skip this section. However, you will need to manually write all asset metadata and query templates during creation.

For more information, see Set up Gemini in BigQuery.

Enable APIs

Enable the Dataplex and BigQuery APIs.

Roles required to enable APIs

To enable APIs, you need the Service Usage Admin IAM role (roles/serviceusage.serviceUsageAdmin), which contains the serviceusage.services.enable permission. Learn how to grant roles.
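If you prefer to enable the APIs programmatically, the following Python sketch builds a request to the Service Usage API's `services:batchEnable` method. It assumes the service names `dataplex.googleapis.com` and `bigquery.googleapis.com` and an access token that you obtain separately (for example, from `gcloud auth print-access-token`). It only constructs the request; it doesn't send it.

```python
import json
import urllib.request

# Assumed service names for the Dataplex and BigQuery APIs.
REQUIRED_SERVICES = ["dataplex.googleapis.com", "bigquery.googleapis.com"]


def build_batch_enable_request(project_id: str, token: str) -> urllib.request.Request:
    """Build, but don't send, a Service Usage services:batchEnable request."""
    url = (
        "https://serviceusage.googleapis.com/v1/"
        f"projects/{project_id}/services:batchEnable"
    )
    body = json.dumps({"serviceIds": REQUIRED_SERVICES}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


if __name__ == "__main__":
    req = build_batch_enable_request("my-project", "ACCESS_TOKEN")
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

Passing the request to `urllib.request.urlopen()` sends it; the caller still needs the Service Usage Admin role described above.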


Create data assets

Ensure that your data assets (for example, BigQuery datasets, tables, and views) are created and populated.

For more information about creating data assets, see the following documents:

Configure identities

Identify or create the Google Groups or service accounts that you want to configure in your data product.

Required roles

This section outlines the minimum IAM roles required for the following primary personas:

  • Data product owners: users who create, configure, and manage data products and their associated assets

  • Data product consumers: users who search for, view, and request access to published data products

Required roles for data product owners

To get the permissions that you need to create and manage data products, ask your administrator to grant you the following IAM roles on the project:

  • Full permissions to create, update, delete, manage permissions, and approve or reject access requests for data products: Dataplex Data Products Admin (roles/dataplex.dataProductsAdmin)
  • Update and manage permissions, and approve or reject access requests for data products: Dataplex Data Products Editor (roles/dataplex.dataProductsEditor)
  • Add metadata aspects (such as schema, overview, contacts, and queries): Dataplex Entry and EntryLink Owner (roles/dataplex.entryOwner)
  • Search for and add assets: Dataplex Catalog Viewer (roles/dataplex.catalogViewer)
  • Edit system aspect types (such as overview, contact, contract, and queries): Dataplex Catalog Editor (roles/dataplex.catalogEditor)
  • Create or retrieve insights data scans for automated documentation and insights generation: Dataplex DataScan Admin (roles/dataplex.dataScanAdmin)

For more information about granting roles, see Manage access to projects, folders, and organizations.

These predefined roles contain the permissions required to create and manage data products. The exact permissions that are required are listed in the following Required permissions section:

Required permissions

The following permissions are required to create and manage data products:

  • Create a data product: dataplex.dataProducts.create
  • List data products in a project: dataplex.dataProducts.list
  • Get or view a data product: dataplex.dataProducts.get
  • Edit an existing data product: dataplex.dataProducts.update
  • Delete a data product: dataplex.dataProducts.delete
  • Approve a data product access request: dataplex.dataProducts.approve
  • Search for a data product using Knowledge Catalog:
    • dataplex.dataProducts.get
    • dataplex.projects.search
  • Create a data product access request: dataplex.dataProducts.get
  • Create a data asset: dataplex.dataAssets.create
  • List data assets within a data product: dataplex.dataAssets.list
  • Get a data asset: dataplex.dataAssets.get
  • Edit an existing data asset: dataplex.dataAssets.update
  • Delete a data asset: dataplex.dataAssets.delete
  • Create a data scan: dataplex.datascans.create
  • List all data scans: dataplex.datascans.list
  • Get a data scan: dataplex.datascans.get
  • Run a data scan: dataplex.datascans.run
  • Edit the overview system aspect type: dataplex.entryGroups.useOverviewAspect
  • Edit the refresh cadence system aspect type: dataplex.entryGroups.useRefreshCadenceAspect
  • Edit the queries system aspect type: dataplex.entryGroups.useQueriesAspect

You might also be able to get these permissions with custom roles or other predefined roles.
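To check which of these permissions you already have, you can call the Resource Manager `projects.testIamPermissions` method, which returns the subset of the requested permissions that the caller holds. The following Python sketch covers only the local bookkeeping around that call: the task names are hypothetical labels, the permission strings are copied from the list above, and the `granted` argument is assumed to be the `permissions` array from the method's response.

```python
from __future__ import annotations

# Hypothetical task labels mapped to permissions from the Required permissions list.
REQUIRED_PERMISSIONS = {
    "create_data_product": ["dataplex.dataProducts.create"],
    "approve_access_request": ["dataplex.dataProducts.approve"],
    "search_data_product": ["dataplex.dataProducts.get", "dataplex.projects.search"],
    "add_data_asset": ["dataplex.dataAssets.create"],
    "run_data_scan": ["dataplex.datascans.run"],
}


def iam_check_body(task: str) -> dict:
    """Request body for projects.testIamPermissions covering one task."""
    return {"permissions": list(REQUIRED_PERMISSIONS[task])}


def missing_permissions(task: str, granted: list[str]) -> list[str]:
    """Permissions needed for `task` that are absent from `granted`.

    `granted` is assumed to be the "permissions" array returned by
    testIamPermissions; an empty array means none of them are held.
    """
    held = set(granted)
    return [p for p in REQUIRED_PERMISSIONS[task] if p not in held]


if __name__ == "__main__":
    print(iam_check_body("search_data_product"))
    print(missing_permissions("search_data_product", ["dataplex.dataProducts.get"]))
```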

Required roles for data product consumers

To let data product consumers search for, view, and request access to your data product, you, as the data product owner, must make it discoverable. To do this, grant the data product consumers the following IAM roles on the data product:

  • Search for data products and request access to them: Dataplex Data Product Consumer (roles/dataplex.dataProductsConsumer) and Dataplex Catalog Viewer (roles/dataplex.catalogViewer)
  • Read-only access to view data product definitions and metadata: Dataplex Data Product Viewer (roles/dataplex.dataProductsViewer)

Create and configure a data product

Creating a data product involves the following high-level tasks:

  1. Create a data product

    This mandatory initial step requires defining core details such as a unique data product name, description, region where the data product is created, and contact details.

  2. Optional: Add assets

    In this phase, you select assets to include in the data product. A key constraint is that assets must reside in the same region as the data product itself. You can add up to 10 assets at a time, with a total maximum of 50 assets allowed per data product.

    For the list of supported assets, see Assets supported.

  3. Optional: Configure access groups and asset permissions

    In this optional phase, you simplify access control by creating access groups. These access groups act as user-friendly aliases (for example, Analyst or Reader) for underlying Google Groups and service accounts. You then assign permissions by selecting a specific IAM role and mapping it to an access group for a specific asset.

  4. Optional: Add contract and aspect details

    In this phase, you enhance governance and data discoverability by attaching metadata frameworks. You can add a contract to formally communicate your data refresh cadence, specifying parameters such as refresh frequency, timing, and variance thresholds. You can also attach custom aspects to provide additional business or technical metadata for your data product.

  5. Optional: Add additional details

    In this final phase, you add rich text documentation, such as user onboarding guides, business definitions, and sample queries, to help consumers interact with the data product immediately upon approval.
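The asset constraints in step 2 (same region as the data product, up to 10 assets per add operation, and at most 50 assets per data product) can be checked before you start adding assets. The following Python sketch is one way to do that. It assumes you already know each asset's region and how many assets the data product contains, and it compares region names as case-insensitive plain strings, which is an assumption rather than documented behavior.

```python
from __future__ import annotations

MAX_ASSETS_PER_BATCH = 10    # "up to 10 assets at a time"
MAX_ASSETS_PER_PRODUCT = 50  # "a total maximum of 50 assets"


def plan_asset_batches(
    product_region: str,
    existing_count: int,
    new_assets: list[tuple[str, str]],
) -> list[list[str]]:
    """Split (resource_name, region) pairs into batches of at most 10 names.

    Raises ValueError if any asset is outside the data product's region or if
    adding the assets would exceed the 50-asset limit.
    """
    outside = [
        name for name, region in new_assets
        if region.lower() != product_region.lower()
    ]
    if outside:
        raise ValueError(f"assets not in {product_region}: {outside}")

    total = existing_count + len(new_assets)
    if total > MAX_ASSETS_PER_PRODUCT:
        raise ValueError(f"{total} assets exceeds the limit of {MAX_ASSETS_PER_PRODUCT}")

    names = [name for name, _ in new_assets]
    return [
        names[i:i + MAX_ASSETS_PER_BATCH]
        for i in range(0, len(names), MAX_ASSETS_PER_BATCH)
    ]


if __name__ == "__main__":
    assets = [(f"table_{i}", "us-central1") for i in range(23)]
    for batch in plan_asset_batches("us-central1", 0, assets):
        print(len(batch), batch[0], "...")
```

With 23 assets and an empty data product, this yields batches of 10, 10, and 3.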

To create and configure a data product, complete the steps in the following sections:

Create a data product

Console

  1. In the Google Cloud console, go to the Knowledge Catalog Data products page.

    Go to Data products

  2. Click Create.

  3. In the Create data products pane, enter the following details:

    • Data product name: Enter a unique name for your data product.
    • Data product ID: This is an auto-generated unique identifier. You can edit this field.
    • Project ID: This is a unique identifier of the project where the data product is created. Browse and select the project.
    • Region: Select the region or multi-region where the data product is created.
    • Data product icon: Browse and select an icon to visually identify the data product. This is optional.
    • Description: Enter a brief description of the data product.
    • Contacts: Provide the point of contact information for governance and approval workflows:

      • Data product owner(s) email address: Enter the email addresses of the data product owners.
      • Data product approver(s) email address: Enter the email addresses of the designated approvers responsible for signing off on access requests or modifications.
    • Labels: Add key-value labels to organize your resources. This is optional.

  4. Click Create data product.

REST

To create a data product, use the dataProducts.create method.

For example, send the following POST request:

curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
-d '{"display_name": "DISPLAY_NAME", "owner_emails": ["EMAIL_IDs"], "access_approval_config": { "approver_emails": ["APPROVER_EMAIL_IDs"]} }' \
"https://dataplex.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/dataProducts?data_product_id=DATA_PRODUCT_ID"

Replace the following:

  • DISPLAY_NAME: a user-friendly name for your data product
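  • EMAIL_IDs: the email addresses of the data product owners. For multiple owners, enclose each address in double quotes and separate them with commas, for example, "owner_emails": ["user1@example.com", "user2@example.com"]
  • APPROVER_EMAIL_IDs: the email addresses of the designated approvers responsible for signing off on access requests or modifications, formatted the same way as EMAIL_IDs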
  • PROJECT_ID: the ID of your Google Cloud project
  • LOCATION: the region in which you want to create the data product
  • DATA_PRODUCT_ID: a unique ID for your data product
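The following Python sketch builds the same dataProducts.create request with the standard library, which avoids hand-quoting JSON in the shell. It assumes that owners and approvers are passed as comma-separated strings, matching the placeholders above, and that an access token is obtained separately. The helper names `split_emails` and `build_create_request` are hypothetical, and the request is only constructed, not sent.

```python
from __future__ import annotations

import json
import urllib.parse
import urllib.request

API_ROOT = "https://dataplex.googleapis.com/v1"


def split_emails(value: str) -> list[str]:
    """Turn "a@example.com, b@example.com" into a list of trimmed addresses."""
    return [email.strip() for email in value.split(",") if email.strip()]


def build_create_request(
    project_id: str,
    location: str,
    data_product_id: str,
    display_name: str,
    owner_emails: str,
    approver_emails: str,
    token: str,
) -> urllib.request.Request:
    """Build, but don't send, the dataProducts.create POST request."""
    query = urllib.parse.urlencode({"data_product_id": data_product_id})
    url = f"{API_ROOT}/projects/{project_id}/locations/{location}/dataProducts?{query}"
    body = {
        "display_name": display_name,
        "owner_emails": split_emails(owner_emails),
        "access_approval_config": {"approver_emails": split_emails(approver_emails)},
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
```

Sending it is then a single `urllib.request.urlopen(req)` call.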

Terraform

To create a data product, use the google_dataplex_data_product resource.

resource "google_dataplex_data_product" "example_product" {
project         = "PROJECT_ID"
location        = "LOCATION"
data_product_id = "DATA_PRODUCT_ID"
display_name    = "DISPLAY_NAME"
description     = "DESCRIPTION"
owner_emails    = ["EMAIL_IDs"]

provider = google-beta
}

Replace the following:

  • PROJECT_ID: the ID of your Google Cloud project
  • LOCATION: the region in which you want to create the data product
  • DATA_PRODUCT_ID: a unique ID for your data product
  • DISPLAY_NAME: a user-friendly name for your data product
  • DESCRIPTION: a brief description of the data product
  • EMAIL_IDs: the email addresses of the data product owners. For multiple owners, enclose each address in double quotes and separate them with commas, for example, owner_emails = ["user1@example.com", "user2@example.com"]

Optional: Add assets

You can add various data assets, such as BigQuery tables, views, datasets, and models, to your data product. For the list of supported assets, see Assets supported.

Console

  1. In the Add assets pane, click +Add.

  2. Search for and select the assets that you want to add to your data product. The assets you select must reside in the same region as the data product.

    If you have the necessary permissions, you can view an asset's metadata by clicking the asset.

  3. To refine the search results, use Filters.

  4. After you select the assets, click Add.

  5. Click Continue.

REST

To add a data asset to your data product, use the dataAssets.create method.

For example, send the following POST request:

curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
-d '{"resource": "RESOURCE_NAME"}' \
"https://dataplex.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/dataProducts/DATA_PRODUCT_ID/dataAssets?data_asset_id=DATA_ASSET_ID"

Replace the following:

  • RESOURCE_NAME: the full resource name of the data asset (for example, //bigquery.googleapis.com/projects/PROJECT_ID/datasets/DATASET_ID/tables/TABLE_ID)
  • PROJECT_ID: the ID of your Google Cloud project
  • LOCATION: the region where the data product exists
  • DATA_PRODUCT_ID: the ID of the data product
  • DATA_ASSET_ID: a unique ID for this data asset within the data product
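Because RESOURCE_NAME must be the asset's full resource name, it can help to generate it instead of typing it. The following Python sketch assumes the asset is a BigQuery table and builds the resource name in the format shown above, together with the URL and JSON body for dataAssets.create. The helper names are hypothetical.

```python
from __future__ import annotations

import json
import urllib.parse


def bigquery_table_resource(project_id: str, dataset_id: str, table_id: str) -> str:
    """Full resource name of a BigQuery table, as used for RESOURCE_NAME."""
    return (
        f"//bigquery.googleapis.com/projects/{project_id}"
        f"/datasets/{dataset_id}/tables/{table_id}"
    )


def create_data_asset_call(
    project_id: str,
    location: str,
    data_product_id: str,
    data_asset_id: str,
    resource_name: str,
) -> tuple[str, str]:
    """Return (url, json_body) for a dataAssets.create POST request."""
    query = urllib.parse.urlencode({"data_asset_id": data_asset_id})
    url = (
        "https://dataplex.googleapis.com/v1/"
        f"projects/{project_id}/locations/{location}"
        f"/dataProducts/{data_product_id}/dataAssets?{query}"
    )
    return url, json.dumps({"resource": resource_name})


if __name__ == "__main__":
    resource = bigquery_table_resource("my-project", "sales_ds", "orders")
    print(*create_data_asset_call("my-project", "us-central1", "sales", "orders", resource), sep="\n")
```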

Terraform

To add a data asset to your data product, use the google_dataplex_data_product_data_asset resource.

resource "google_dataplex_data_product_data_asset" "example_asset" {
project         = "PROJECT_ID"
location        = "LOCATION"
data_product_id = "DATA_PRODUCT_ID"
data_asset_id   = "DATA_ASSET_ID"
resource        = "RESOURCE_NAME"

provider = google-beta
}

Replace the following:

  • PROJECT_ID: the ID of your Google Cloud project
  • LOCATION: the region where the data product exists
  • DATA_PRODUCT_ID: the ID of the data product
  • DATA_ASSET_ID: a unique ID for this data asset within the data product
  • RESOURCE_NAME: the full resource name of the data asset (for example, //bigquery.googleapis.com/projects/PROJECT_ID/datasets/DATASET_ID/tables/TABLE_ID)

Optional: Configure access groups and asset permissions

In the Configure access groups and asset permissions pane, you can create access groups and assign permissions to assets.

Configure access groups

Console

  1. Click Add access group.

  2. In the Access group name field, enter a name for the access group. For example, Analyst.

  3. In the Access group description field, enter a description for the access group.

  4. In the Access group identifier field, enter the email address of a Google Group that you want to assign to this access group.

    Data product consumers who request access for themselves are added as members to the mapped Google Group.

    For more information about creating Google Groups, see Create and manage Google Groups in the Google Cloud console.

  5. In the Access group service account field, enter the email address of a service account that you want to assign to this access group.

    Data product consumers who request access for their service accounts are granted the Service Account Token Creator (roles/iam.serviceAccountTokenCreator) IAM role to impersonate the data producer service account mapped to the access group.

    For more information about creating service accounts, see Create service accounts.

  6. Click Done.

  7. To add another access group, click Add access group and repeat the steps.

    You can add a maximum of three access groups per data product.

  8. Click Save.

REST

To configure an access group for the data product, use the dataProducts.patch method.

For example, send the following PATCH request:

curl -X PATCH \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
-d '{"access_groups": ACCESS_GROUPS_MAP}' \
"https://dataplex.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/dataProducts/DATA_PRODUCT_ID?update_mask=access_groups"

Replace the following:

  • ACCESS_GROUPS_MAP: a JSON object representing a map where each key is an access group ID and the value is an AccessGroup object. For example:

    {
      "analyst": {
        "id": "analyst",
        "display_name": "Analyst access group",
        "description": "Access group for analysts",
        "principal": {
          "google_group": "analyst-team@example.com",
          "service_account": "analyst-svc@example-project.iam.gserviceaccount.com"
        }
      }
    }
    
  • PROJECT_ID: the ID of your Google Cloud project

  • LOCATION: the region where the data product exists

  • DATA_PRODUCT_ID: the ID of your data product
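In ACCESS_GROUPS_MAP, each key is an access group ID and each value is an AccessGroup object. The following Python sketch assembles that map and checks, before you send it, that the keys are unique and that there are no more than three groups (the per-product maximum noted in the console steps). Requiring at least one principal per group is an assumption of this sketch, and the helper names are hypothetical.

```python
from __future__ import annotations

import json

MAX_ACCESS_GROUPS = 3  # "a maximum of three access groups per data product"


def access_group(
    group_id: str,
    display_name: str,
    description: str,
    google_group: str | None = None,
    service_account: str | None = None,
) -> dict:
    """Build one AccessGroup object using the field names from the example."""
    principal = {}
    if google_group:
        principal["google_group"] = google_group
    if service_account:
        principal["service_account"] = service_account
    if not principal:  # assumption: a group without a principal is a mistake
        raise ValueError(f"access group {group_id!r} has no principal")
    return {
        "id": group_id,
        "display_name": display_name,
        "description": description,
        "principal": principal,
    }


def access_groups_patch_body(groups: list[dict]) -> str:
    """JSON body for dataProducts.patch with update_mask=access_groups."""
    if len(groups) > MAX_ACCESS_GROUPS:
        raise ValueError(f"at most {MAX_ACCESS_GROUPS} access groups are allowed")
    groups_by_id: dict[str, dict] = {}
    for group in groups:
        if group["id"] in groups_by_id:
            raise ValueError(f"duplicate access group id {group['id']!r}")
        groups_by_id[group["id"]] = group  # map key equals the group's id
    return json.dumps({"access_groups": groups_by_id}, indent=2)
```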

Terraform

To define access groups for your data product, use the access_groups nested block within the google_dataplex_data_product resource.

For example, use the following configuration:

resource "google_dataplex_data_product" "example_data_product" {
  project         = "PROJECT_ID"
  location        = "LOCATION"
  data_product_id = "DATA_PRODUCT_ID"
  display_name    = "DISPLAY_NAME"
  owner_emails    = ["EMAIL_IDs"]

  access_groups {
    id           = "analyst" # Internal identifier for configuration
    group_id     = "analyst" # Unique identifier of the access group, should be same as the 'id'
    display_name = "Business Analyst"
    description  = "Access group for regional analysts"
    principal {
      google_group = "analyst-team@example.com"
    }
  }

  provider = google-beta
}

Replace the following:

  • PROJECT_ID: the ID of your Google Cloud project
  • LOCATION: the region where the data product exists
  • DATA_PRODUCT_ID: a unique ID for the data product
  • DISPLAY_NAME: a user-friendly name for your data product
  • EMAIL_IDs: the email addresses of the data product owners. For multiple owners, enclose each address in double quotes and separate them with commas, for example, owner_emails = ["user1@example.com", "user2@example.com"]

Configure asset permissions

After you configure access groups, you can configure permissions for the assets in the data product.

Console

  1. In the Asset permissions section, select the asset for which you want to configure permissions. You can select and configure permissions for up to 10 assets at a time.

  2. Click Configure permissions.

  3. In the Select access group field, select an access group.

  4. In the Assign IAM role field, select an IAM role that you want to assign to the access group.

    For example, if your asset is a BigQuery table named Sales and you assign the BigQuery Metadata Viewer role to the Analyst access group, data product consumers who are members of the Analyst access group get BigQuery Metadata Viewer permissions on the Sales table.

    You can add multiple roles to an asset.

  5. Click Configure. The asset now shows its assigned permissions.

  6. To configure permissions for other assets, repeat the steps.

  7. Click Continue.

REST

To configure permissions for the assets in the data product, use the dataAssets.patch method.

For example, send the following PATCH request:

curl -X PATCH \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
-d '{"access_group_configs": ACCESS_GROUP_CONFIGS_MAP}' \
"https://dataplex.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/dataProducts/DATA_PRODUCT_ID/dataAssets/DATA_ASSET_ID?update_mask=access_group_configs"

Replace the following:

  • ACCESS_GROUP_CONFIGS_MAP: a JSON object representing a map where each key is an access group ID and the value is an AccessGroupConfig object. For example:

    {
      "analyst": {
        "iam_roles": ["roles/bigquery.dataViewer"]
      }
    }
    
  • PROJECT_ID: the ID of your Google Cloud project

  • LOCATION: the region where the data product exists

  • DATA_PRODUCT_ID: the ID of your data product

  • DATA_ASSET_ID: the ID of the asset for which you want to configure permissions
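Each key in ACCESS_GROUP_CONFIGS_MAP must name an access group that already exists on the data product. The following Python sketch builds the map from a simple group-to-roles dictionary, rejects unknown group IDs, and drops duplicate roles. The set of defined group IDs is an input you supply (for example, from your own configuration), and the function name is hypothetical.

```python
from __future__ import annotations

import json


def access_group_configs_body(
    role_grants: dict[str, list[str]],
    defined_groups: set[str],
) -> str:
    """JSON body for dataAssets.patch with update_mask=access_group_configs."""
    unknown = sorted(set(role_grants) - defined_groups)
    if unknown:
        raise ValueError(f"access groups not defined on the data product: {unknown}")
    configs = {
        group: {"iam_roles": list(dict.fromkeys(roles))}  # dedupe, keep order
        for group, roles in role_grants.items()
    }
    return json.dumps({"access_group_configs": configs})


if __name__ == "__main__":
    print(access_group_configs_body(
        {"analyst": ["roles/bigquery.dataViewer", "roles/bigquery.metadataViewer"]},
        defined_groups={"analyst"},
    ))
```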

Terraform

Assign IAM roles to your access groups for specific assets using the access_group_configs block in the google_dataplex_data_product_data_asset resource.

For example, use the following configuration:

resource "google_dataplex_data_product_data_asset" "example_data_asset" {
project         = "PROJECT_ID"
location        = "LOCATION"
data_product_id = "DATA_PRODUCT_ID"
data_asset_id   = "DATA_ASSET_ID"
resource        = "RESOURCE_NAME"

access_group_configs {
  access_group = "analyst" # Must match the 'id' defined in google_dataplex_data_product
  iam_roles    = ["roles/bigquery.dataViewer"]
}

provider = google-beta
}

Replace the following:

  • PROJECT_ID: the ID of your Google Cloud project
  • LOCATION: the region where the data product exists
  • DATA_PRODUCT_ID: the ID of the data product
  • DATA_ASSET_ID: a unique ID for this data asset within the data product
  • RESOURCE_NAME: the full resource name of the data asset (for example, //bigquery.googleapis.com/projects/PROJECT_ID/datasets/DATASET_ID/tables/TABLE_ID)

Optional: Add contract and aspect details

You can add contracts and aspects for a data product.

Add a contract

To establish a foundation of trust between data producers and consumers, you can attach a contract to your data product. By specifying parameters such as refresh time and thresholds, you provide consumers with the necessary context to understand when the data is updated and whether it meets their specific business requirements.

Console

  1. In the Add contract and aspect details pane, click Add contract.

  2. In the Select contract field, select Refresh cadence.

  3. In the Frequency field, select an agreed-upon schedule for how often data is updated or delivered, ensuring a predictable flow from data producer to data consumer. For example, Weekly.

  4. In the Refresh time field, enter the latest acceptable time by which data updated at its source becomes available to the consumer. For example, 23:00 PST.

  5. In the Threshold (in minutes) field, enter a measurable limit in minutes for the acceptable delay in data delivery. For example, enter 30 to set a threshold of 30 minutes.

  6. Optional: In the Cron schedule field, enter a cron expression that defines the schedule for data generation and delivery in the format: MINUTE HOUR DAY_OF_MONTH MONTH DAY_OF_WEEK

    The following are the accepted values:

    • MINUTE: 0-59
    • HOUR: 0-23
    • DAY_OF_MONTH: 1-31
    • MONTH: 1-12 or JAN-DEC
    • DAY_OF_WEEK: 0-6 or SUN-SAT

    For example, 0 8 * * 1-5 runs at 8:00 AM on weekdays (Monday-Friday).

  7. Click Save.
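Before you save a cron schedule, you might want to check it locally. The following Python sketch validates the five-field format described in step 6 against the accepted values, treating MONTH as 1-12. It also accepts `*`, comma-separated lists, ranges, and `/` steps, which are common cron conventions assumed here rather than confirmed for this field. It checks syntax and ranges only; it doesn't compute run times.

```python
from __future__ import annotations

_MONTHS = ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
           "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"]
_DAYS = ["SUN", "MON", "TUE", "WED", "THU", "FRI", "SAT"]

# (field name, minimum, maximum, name aliases) in the documented field order.
FIELDS = [
    ("MINUTE", 0, 59, {}),
    ("HOUR", 0, 23, {}),
    ("DAY_OF_MONTH", 1, 31, {}),
    ("MONTH", 1, 12, {m: i for i, m in enumerate(_MONTHS, start=1)}),
    ("DAY_OF_WEEK", 0, 6, {d: i for i, d in enumerate(_DAYS)}),
]


def _value(token: str, aliases: dict[str, int]) -> int:
    """Convert a number or a name such as JAN or MON to an integer."""
    token = token.upper()
    if token in aliases:
        return aliases[token]
    if not token.isdigit():
        raise ValueError(f"not a number or name: {token!r}")
    return int(token)


def validate_cron(expr: str) -> None:
    """Raise ValueError unless expr is a valid five-field cron expression."""
    parts = expr.split()
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(parts)}")
    for part, (name, low, high, aliases) in zip(parts, FIELDS):
        for item in part.split(","):
            base, sep, step = item.partition("/")
            if sep and (not step.isdigit() or int(step) == 0):
                raise ValueError(f"{name}: invalid step in {item!r}")
            if base == "*":
                continue
            if "-" in base:
                start, end = base.split("-", 1)
                values = [_value(start, aliases), _value(end, aliases)]
                if values[0] > values[1]:
                    raise ValueError(f"{name}: reversed range {base!r}")
            else:
                values = [_value(base, aliases)]
            for v in values:
                if not low <= v <= high:
                    raise ValueError(f"{name}: {v} is outside {low}-{high}")
```

For example, `validate_cron("0 8 * * 1-5")` returns without error, while `validate_cron("0 8 * 13 *")` raises `ValueError` because 13 is not a valid month.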

REST

Contracts are modeled as aspects on the data product. To add a Refresh Cadence contract for a data product, use the entries.patch method.

For example, send the following PATCH request:

curl -X PATCH \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
-d \
'{
  "aspects": {
    "dataplex-types.global.refresh-cadence": {
      "aspectType": "projects/dataplex-types/locations/global/aspectTypes/refresh-cadence",
      "data": {
        "frequency": "REFRESH_FREQUENCY"
      }
    }
  }
}' \
"https://dataplex.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/entryGroups/@dataplex/entries/projects/DATA_PRODUCT_PROJECT_NUMBER/locations/DATA_PRODUCT_LOCATION/dataProducts/DATA_PRODUCT_ID?updateMask=aspects"

Replace the following:

  • REFRESH_FREQUENCY: the agreed-upon schedule for how often data is updated or delivered, ensuring a predictable flow from data producer to data consumer. For example: Weekly
  • PROJECT_ID: the ID of your Google Cloud project where the API call is being made
  • LOCATION: the region of the Knowledge Catalog service endpoint you are calling (for example, us-central1)
  • DATA_PRODUCT_PROJECT_NUMBER: the project number where the data product resource is located
  • DATA_PRODUCT_LOCATION: the location of the data product resource
  • DATA_PRODUCT_ID: the ID of your data product
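The entry URL in this request nests the data product's own resource name inside the `@dataplex` entry group path, which is easy to mistype. The following Python sketch assembles that URL and the refresh cadence aspect body from their parts. The helper names are hypothetical, and the frequency string is passed through unchanged because the accepted values aren't listed here.

```python
from __future__ import annotations

import json

API_ROOT = "https://dataplex.googleapis.com/v1"


def data_product_entry_url(
    project_id: str,
    location: str,
    dp_project_number: str,
    dp_location: str,
    dp_id: str,
) -> str:
    """URL of the @dataplex entry that represents a data product."""
    entry_id = (
        f"projects/{dp_project_number}/locations/{dp_location}/dataProducts/{dp_id}"
    )
    return (
        f"{API_ROOT}/projects/{project_id}/locations/{location}"
        f"/entryGroups/@dataplex/entries/{entry_id}?updateMask=aspects"
    )


def refresh_cadence_body(frequency: str) -> str:
    """entries.patch body that sets the refresh cadence contract."""
    return json.dumps({
        "aspects": {
            "dataplex-types.global.refresh-cadence": {
                "aspectType": (
                    "projects/dataplex-types/locations/global/aspectTypes/refresh-cadence"
                ),
                "data": {"frequency": frequency},
            }
        }
    })
```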

Terraform

Contracts are modeled as aspects on the data product. To manage a contract, you must manage the underlying Knowledge Catalog entry. Because Terraform doesn't automatically discover existing aspects, you must first import the google_dataplex_entry.

To import the entry, use the following command:

terraform import google_dataplex_entry.data_product_metadata "projects/DATA_PRODUCT_PROJECT_NUMBER/locations/LOCATION/entryGroups/@dataplex/entries/projects/DATA_PRODUCT_PROJECT_NUMBER/locations/LOCATION/dataProducts/DATA_PRODUCT_ID"

Terraform configuration:

resource "google_dataplex_entry" "data_product_metadata" {
project        = "DATA_PRODUCT_PROJECT_NUMBER"
location       = "LOCATION"
entry_group_id = "@dataplex"
entry_id       = "projects/DATA_PRODUCT_PROJECT_NUMBER/locations/LOCATION/dataProducts/DATA_PRODUCT_ID"
entry_type     = "projects/655216118709/locations/global/entryTypes/data-product"

aspects {
  aspect_key = "655216118709.global.refresh-cadence"
  aspect {
    data = jsonencode({
      frequency = "REFRESH_FREQUENCY"
    })
  }
}

provider = google-beta
}

Replace the following:

  • DATA_PRODUCT_PROJECT_NUMBER: the project number where the data product resource is located
  • LOCATION: the region of the Knowledge Catalog service endpoint you are calling (for example, us-central1)
  • DATA_PRODUCT_ID: the ID of your data product
  • REFRESH_FREQUENCY: the agreed-upon schedule for how often data is updated or delivered, ensuring a predictable flow from data producer to data consumer. For example: Weekly

For general information on the import process, refer to the Terraform import documentation.

Add aspects

Use aspects to enrich your data product with structured, reusable metadata. These templates provide a standardized way for data producers to communicate the quality and fitness of a data product, improving governance and helping consumers determine if the product meets their business needs.

To add aspects for the data product, follow these steps:

Console

  1. In the Add contract and aspect details pane, click + Add aspect.

  2. In the Select aspect type field, search for and select an aspect type from the list. For example, Geo context.

  3. Click Save.

REST

To add aspects for a data product, use the entries.patch method.

For example, send the following PATCH request:

curl -X PATCH \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
-d \
'{
  "aspects": {
    "ASPECT_PROJECT_ID.ASPECT_LOCATION.ASPECT_NAME": {
      "aspectType": "projects/ASPECT_PROJECT_ID/locations/ASPECT_LOCATION/aspectTypes/ASPECT_NAME",
      "data": {}
    }
  }
}' \
"https://dataplex.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/entryGroups/@dataplex/entries/projects/DATA_PRODUCT_PROJECT_NUMBER/locations/DATA_PRODUCT_LOCATION/dataProducts/DATA_PRODUCT_ID?updateMask=aspects"

Replace the following:

  • ASPECT_PROJECT_ID: the ID of the Google Cloud project where the aspect type is defined
  • ASPECT_LOCATION: the location where the aspect type is defined (for example, us-central1 or global)
  • ASPECT_NAME: the name of the aspect type that you want to attach to the data product
  • PROJECT_ID: the ID of your Google Cloud project where the API call is being made
  • LOCATION: the region of the Knowledge Catalog service endpoint you are calling (for example, us-central1)
  • DATA_PRODUCT_PROJECT_NUMBER: the project number where the data product resource is located
  • DATA_PRODUCT_LOCATION: the location of the data product resource
  • DATA_PRODUCT_ID: the ID of your data product
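The key in the `aspects` map is derived from the aspect type's resource name: projects/ASPECT_PROJECT_ID/locations/ASPECT_LOCATION/aspectTypes/ASPECT_NAME becomes ASPECT_PROJECT_ID.ASPECT_LOCATION.ASPECT_NAME. The following Python sketch performs that derivation and wraps the result in an `aspects` payload. The `geo-context` aspect type and its `region` field in the usage example are hypothetical.

```python
from __future__ import annotations

import re

_ASPECT_TYPE = re.compile(
    r"^projects/([^/]+)/locations/([^/]+)/aspectTypes/([^/]+)$"
)


def aspect_key(aspect_type: str) -> str:
    """Map projects/P/locations/L/aspectTypes/N to the key P.L.N."""
    match = _ASPECT_TYPE.match(aspect_type)
    if match is None:
        raise ValueError(f"not an aspect type resource name: {aspect_type!r}")
    return ".".join(match.groups())


def aspects_patch(aspect_type: str, data: dict | None = None) -> dict:
    """entries.patch payload (sent with updateMask=aspects) for one aspect."""
    return {
        "aspects": {
            aspect_key(aspect_type): {
                "aspectType": aspect_type,
                "data": data if data is not None else {},
            }
        }
    }


if __name__ == "__main__":
    # Hypothetical aspect type and field, for illustration only.
    print(aspects_patch(
        "projects/my-project/locations/us-central1/aspectTypes/geo-context",
        {"region": "EMEA"},
    ))
```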

Terraform

To manage aspects, you must manage the underlying Knowledge Catalog entry. Because Terraform doesn't automatically discover existing aspects, you must first import the google_dataplex_entry.

To import the entry, use the following command:

terraform import google_dataplex_entry.data_product_metadata "projects/DATA_PRODUCT_PROJECT_NUMBER/locations/LOCATION/entryGroups/@dataplex/entries/projects/DATA_PRODUCT_PROJECT_NUMBER/locations/LOCATION/dataProducts/DATA_PRODUCT_ID"

Terraform configuration:

resource "google_dataplex_entry" "data_product_metadata" {
project        = "DATA_PRODUCT_PROJECT_NUMBER"
location       = "LOCATION"
entry_group_id = "@dataplex"
entry_id       = "projects/DATA_PRODUCT_PROJECT_NUMBER/locations/LOCATION/dataProducts/DATA_PRODUCT_ID"
entry_type     = "projects/655216118709/locations/global/entryTypes/data-product"

aspects {
  aspect_key = "ASPECT_PROJECT_NUMBER.ASPECT_LOCATION.ASPECT_NAME"
  aspect {
    data = jsonencode({})
  }
}

provider = google-beta
}

Replace the following:

  • DATA_PRODUCT_PROJECT_NUMBER: the project number where the data product resource is located
  • LOCATION: the region of the Knowledge Catalog service endpoint you are calling (for example, us-central1)
  • DATA_PRODUCT_ID: the ID of your data product
  • ASPECT_PROJECT_NUMBER: the number of the Google Cloud project where the aspect type is defined
  • ASPECT_LOCATION: the location where the aspect type is defined (for example, us-central1 or global)
  • ASPECT_NAME: the name of the aspect type that you want to attach to the data product

For general information on the import process, refer to the Terraform import documentation.

Optional: Add additional details

You can add documentation and sample queries for your data product to provide essential context, business logic descriptions, and user guides. In Knowledge Catalog, documentation is managed through the overview system aspect.

You can manually create this documentation or use Knowledge Catalog data insights to automatically generate it.

Manually add documentation and sample queries

Console

To add documentation for your data product, follow these steps:

  1. In the Add additional details pane, click Edit next to Documentation.

  2. Enter the content in the rich-text editor.

  3. Click Save.

To add sample queries for your data product, follow these steps:

  1. In the Add additional details pane, click Add queries in the Query recommendation section.

  2. Type the sample queries.

  3. Click Save.

The newly created data product appears on the Knowledge Catalog Data products page.

REST

Documentation is modeled as aspects on the data product. To add documentation, use the entries.patch method.

For example, send the following PATCH request:

curl -X PATCH \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
-d \
'{
  "aspects": {
    "dataplex-types.global.overview": {
      "aspectType": "projects/dataplex-types/locations/global/aspectTypes/overview",
      "data": {
        "content": "DOCUMENTATION"
      }
    }
  }
}' \
"https://dataplex.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/entryGroups/@dataplex/entries/projects/DATA_PRODUCT_PROJECT_NUMBER/locations/DATA_PRODUCT_LOCATION/dataProducts/DATA_PRODUCT_ID?updateMask=aspects"

Replace the following:

  • PROJECT_ID: the ID of your Google Cloud project where the API call is being made
  • LOCATION: the region of the Knowledge Catalog service endpoint you are calling (for example, us-central1)
  • DATA_PRODUCT_PROJECT_NUMBER: the project number where the data product resource is located
  • DATA_PRODUCT_LOCATION: the location of the data product resource
  • DATA_PRODUCT_ID: the ID of your data product
  • DOCUMENTATION: the content that you want to attach to the data product
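DOCUMENTATION is embedded in a single-quoted shell string, so an apostrophe in the text ends the string early, and literal double quotes or line breaks must be JSON-escaped. One way to avoid this is to generate the body with a JSON serializer and send it from a file with curl's `-d @FILE` syntax. The following Python sketch does that; the helper names and the file name `overview.json` are only examples.

```python
from __future__ import annotations

import json
from pathlib import Path


def overview_body(documentation: str) -> str:
    """Serialize the overview aspect; json.dumps escapes quotes and newlines."""
    return json.dumps({
        "aspects": {
            "dataplex-types.global.overview": {
                "aspectType": "projects/dataplex-types/locations/global/aspectTypes/overview",
                "data": {"content": documentation},
            }
        }
    })


def write_body(path: str, documentation: str) -> Path:
    """Write the request body to a file so curl can read it with -d @FILE."""
    target = Path(path)
    target.write_text(overview_body(documentation), encoding="utf-8")
    return target


if __name__ == "__main__":
    write_body("overview.json", "It's the \"sales\" product.\nRefreshed weekly.")
```

You can then replace the `-d` argument in the request with `-d @overview.json`.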

Terraform

Documentation is modeled as aspects on the data product. To manage documentation, you must manage the underlying Knowledge Catalog entry. Because Terraform doesn't automatically discover existing aspects, you must first import the google_dataplex_entry.

To import the entry, use the following command:

terraform import google_dataplex_entry.data_product_metadata "projects/DATA_PRODUCT_PROJECT_NUMBER/locations/LOCATION/entryGroups/@dataplex/entries/projects/DATA_PRODUCT_PROJECT_NUMBER/locations/LOCATION/dataProducts/DATA_PRODUCT_ID"

Terraform configuration:

resource "google_dataplex_entry" "data_product_metadata" {
project        = "DATA_PRODUCT_PROJECT_NUMBER"
location       = "LOCATION"
entry_group_id = "@dataplex"
entry_id       = "projects/DATA_PRODUCT_PROJECT_NUMBER/locations/LOCATION/dataProducts/DATA_PRODUCT_ID"
entry_type     = "projects/655216118709/locations/global/entryTypes/data-product"

aspects {
  aspect_key = "655216118709.global.overview"
  aspect {
    data = jsonencode({
      content = "DOCUMENTATION"
    })
  }
}

provider = google-beta
}

Replace the following:

  • DATA_PRODUCT_PROJECT_NUMBER: the project number where the data product resource is located
  • LOCATION: the region of the Knowledge Catalog service endpoint you are calling (for example, us-central1)
  • DATA_PRODUCT_ID: the ID of your data product
  • DOCUMENTATION: the content that you want to attach to the data product

For general information on the import process, refer to the Terraform import documentation.

Generate automated documentation and sample queries using data insights

Before you generate documentation and sample queries using Gemini, complete the following prerequisites:

  1. Enable the Gemini for Google Cloud API in the project where you create the data product.

  2. Grant insight-specific user roles: Ask your administrator to grant your identity the following roles and permissions on the data product project:

    • Generate and manage data insights: Dataplex DataScan Editor (roles/dataplex.dataScanEditor) or Dataplex DataScan Administrator (roles/dataplex.dataScanAdmin) on the project where the data product resides
    • View generated insights: Dataplex DataScan DataViewer (roles/dataplex.dataScanDataViewer) on the project where the data product resides
  3. Configure cross-project service agent permissions. If your underlying data assets reside in a Google Cloud project different from your data product project, you must grant the Knowledge Catalog service agent (P4SA) access to those assets:

    1. To generate or retrieve the service agent identifier for your data product project, run the following Google Cloud CLI command:

      gcloud beta services identity create --service=dataplex.googleapis.com --project=DATA_PRODUCT_PROJECT_ID
      

      Replace DATA_PRODUCT_PROJECT_ID with the Google Cloud project ID where your data product resides.

    2. In each external project where your assets reside, grant the data product project's service agent the following roles:

      • BigQuery Data Editor (roles/bigquery.dataEditor) on the underlying tables and datasets

      • BigQuery Studio Admin (roles/bigquery.studioAdmin) on the asset project
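The prerequisites above can be sketched as the following Bash functions. This sketch assumes three things: that cloudaicompanion.googleapis.com is the service name of the Gemini for Google Cloud API, that the service agent has the usual service-PROJECT_NUMBER@gcp-sa-dataplex.iam.gserviceaccount.com form (confirm it against the output of the gcloud beta services identity create command), and that the Data Editor grant is made per table with bq add-iam-policy-binding. The function names are hypothetical, and dataset-level grants are not scripted here.

```shell
#!/usr/bin/env bash
# Sketch of the prerequisites. Every argument is a value you supply;
# the function names are hypothetical.

# Step 1: enable the Gemini for Google Cloud API in the data product project.
enable_gemini_api() {
  local dp_project_id="$1"
  gcloud services enable cloudaicompanion.googleapis.com --project="$dp_project_id"
}

# Step 2: grant a user the insight roles on the data product project.
# Swap in roles/dataplex.dataScanAdmin for the Editor role if required.
grant_insight_roles() {
  local dp_project_id="$1" user_email="$2" role
  for role in roles/dataplex.dataScanEditor roles/dataplex.dataScanDataViewer; do
    gcloud projects add-iam-policy-binding "$dp_project_id" \
      --member="user:${user_email}" --role="$role" || return 1
  done
}

# Step 3: the Knowledge Catalog service agent of the data product project
# (assumed form; verify it against gcloud beta services identity create).
dataplex_service_agent() {
  local dp_project_number="$1"
  echo "service-${dp_project_number}@gcp-sa-dataplex.iam.gserviceaccount.com"
}

# Step 3, once per asset project: BigQuery Studio Admin on that project.
grant_asset_project_role() {
  local dp_project_number="$1" asset_project_id="$2" agent
  agent=$(dataplex_service_agent "$dp_project_number")
  gcloud projects add-iam-policy-binding "$asset_project_id" \
    --member="serviceAccount:${agent}" --role="roles/bigquery.studioAdmin"
}

# Step 3, once per underlying table: BigQuery Data Editor on that table.
# table_ref uses the bq form PROJECT:DATASET.TABLE.
grant_table_role() {
  local dp_project_number="$1" table_ref="$2" agent
  agent=$(dataplex_service_agent "$dp_project_number")
  bq add-iam-policy-binding --member="serviceAccount:${agent}" \
    --role="roles/bigquery.dataEditor" "$table_ref"
}
```

Keeping the service agent lookup in its own function means that, if your service agent has a different form, only dataplex_service_agent needs to change.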

To generate documentation and sample queries for your data product using data insights, complete the following steps:

Console

  1. In the Add additional details pane, on the Generate insights with Gemini bar, click Generate.

    Wait a few minutes for the insight generation process to complete.

  2. To review the generated content, click View.

  3. Evaluate the generated content:

    • If the content is accurate, click Save. This populates the rich-text editor with a predefined documentation template and adds sample queries to the Insights section.

    • If the content doesn't meet expectations, click Discard.

  4. Click Save to finalize.

REST

To automatically generate, retrieve, and apply documentation and insights using the API, execute the following series of Knowledge Catalog DataScans API calls.

  1. Generate automated documentation.

    To trigger the automated documentation generation, create a DATA_DOCUMENTATION type data scan by sending a POST request to the dataScans endpoint:

    curl -X POST \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json" \
    -d '{
      "data": {
        "resource": "DATA_PRODUCT_RESOURCE_NAME"
      },
      "executionSpec": {
        "trigger": {
          "oneTime": {
            "ttl_after_scan_completion": "TTL"
          }
        }
      },
      "type": "DATA_DOCUMENTATION",
      "dataDocumentationSpec": {}
    }' \
    "https://dataplex.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/dataScans?data_scan_id=DATA_SCAN_ID"
    

    Replace the following:

    • DATA_PRODUCT_RESOURCE_NAME: the full resource name of the target data product to scan.
    • TTL: the duration in seconds after which the scan resource should be automatically deleted (for example, 3600 for one hour). If not specified, the default value is 24 hours. The maximum allowed value is 365 days (31536000 seconds).
    • PROJECT_ID: the ID of your Google Cloud project
    • LOCATION: the region where the data scan runs
    • DATA_SCAN_ID: a unique ID you provide for this scan
  2. Retrieve the generated documentation.

    After the data scan job completes, retrieve the generated documentation and query insights by sending a GET request with the view=full parameter:

    curl -X GET \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json" \
    "https://dataplex.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/dataScans/DATA_SCAN_ID?view=full"
    
  3. Save the generated queries to the data product.

    Extract the generated SQL snippets from the data scan output in the previous step, and attach them to your data product entry by updating its queries aspect through a PATCH request:

    curl -X PATCH \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json" \
    -d '{
      "aspects": {
        "dataplex-types.global.queries": {
          "aspectType": "projects/dataplex-types/locations/global/aspectTypes/queries",
          "data": {
            "queries": [
              {
                "description": "QUERY_DESCRIPTION",
                "sql": "SQL_STATEMENT",
                "source": "USER"
              }
            ]
          }
        }
      }
    }' \
    "https://dataplex.googleapis.com/v1/projects/CATALOG_PROJECT_ID/locations/CATALOG_LOCATION/entryGroups/@dataplex/entries/projects/DATA_PRODUCT_PROJECT_NUMBER/locations/DATA_PRODUCT_LOCATION/dataProducts/DATA_PRODUCT_ID?updateMask=aspects"
    

    Replace the following:

    • QUERY_DESCRIPTION: a description explaining what the recommended sample query accomplishes

    • SQL_STATEMENT: the literal text of the generated SQL sample query

    • CATALOG_PROJECT_ID: the ID of the Google Cloud project where you are making the API call

    • CATALOG_LOCATION: the regional endpoint for the Knowledge Catalog service (for example, us-central1)

    • DATA_PRODUCT_PROJECT_NUMBER: the project number where the data product resource is hosted

    • DATA_PRODUCT_LOCATION: the location of your data product resource

    • DATA_PRODUCT_ID: the ID of your data product
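Step 2 assumes that the data scan job has already finished. The following Bash sketch polls the scan's jobs collection (the dataScans/DATA_SCAN_ID/jobs endpoint) until a job reports SUCCEEDED, FAILED, or CANCELLED. It reads the state field with grep and sed rather than a JSON parser, and assumes that a one-time scan has a single job. wait_for_scan is a hypothetical helper, and its poll interval and attempt limit are arbitrary defaults.

```shell
#!/usr/bin/env bash
# Hypothetical helper: polls the jobs of a one-time data scan until a job
# reports a terminal state. The interval and attempt limit are arbitrary defaults.
wait_for_scan() {
  local project_id="$1" location="$2" scan_id="$3"
  local interval="${4:-30}" max_attempts="${5:-40}"
  local url="https://dataplex.googleapis.com/v1/projects/${project_id}/locations/${location}/dataScans/${scan_id}/jobs"
  local attempt=1 state
  while [ "$attempt" -le "$max_attempts" ]; do
    # Read the first "state" field with grep and sed to avoid a jq dependency.
    # A one-time scan normally has a single job, so the first match is enough.
    state=$(curl -s -H "Authorization: Bearer $(gcloud auth print-access-token)" "$url" \
      | grep -o '"state": *"[A-Z_]*"' | head -n 1 | sed -E 's/.*"([A-Z_]+)"$/\1/')
    case "$state" in
      SUCCEEDED) echo "scan ${scan_id} succeeded"; return 0 ;;
      FAILED|CANCELLED) echo "scan ${scan_id} ended in state ${state}" >&2; return 1 ;;
    esac
    sleep "$interval"
    attempt=$((attempt + 1))
  done
  echo "timed out waiting for scan ${scan_id}" >&2
  return 1
}
```

For example, run wait_for_scan PROJECT_ID LOCATION DATA_SCAN_ID and send the GET request from step 2 only after it returns successfully.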

What's next