This document is intended for data product owners who want to create and configure data products in Knowledge Catalog (formerly Dataplex Universal Catalog).
For more information about the architecture and key concepts of data products, see About data products.
Before you begin
Before you create data products, complete the following prerequisites.
Set up Gemini in BigQuery
Setting up Gemini in BigQuery is an optional but highly recommended step before you create your first data product.
By default, creating a data product requires you to manually enter business descriptions, technical definitions, and onboarding documentation for your assets. When you enable Gemini, Knowledge Catalog uses AI assistance to automatically analyze your schemas and data scan results, and auto-generates the following:
- Business documentation: Creates a documentation template and writes a clear description of the data product and each of its data assets.
- Insights and sample queries: Constructs ready-to-use sample queries based on the asset's schema layout, so that data consumers can start querying the product immediately upon approval.
If you choose not to set up Gemini, you can skip this section. However, you must then manually write all asset metadata and query templates during creation.
For more information, see Set up Gemini in BigQuery.
Enable APIs
Enable the Dataplex and BigQuery APIs.
Roles required to enable APIs
To enable APIs, you need the Service Usage Admin IAM role (roles/serviceusage.serviceUsageAdmin), which contains the serviceusage.services.enable permission. Learn how to grant roles.
Create data assets
Ensure that your data assets (for example, BigQuery datasets, tables, and views) are created and populated.
For more information about creating data assets, see the following documents:
- Create BigQuery datasets
- Create and use BigQuery tables
- Create logical views in BigQuery
- Create materialized views in BigQuery
Configure identities
Identify or create the Google Groups or service accounts that you want to configure in your data product.
Required roles
This section outlines the minimum IAM roles required for the following primary personas:
- Data product owners: users who create, configure, and manage data products and their associated assets
- Data product consumers: users who search for, view, and request access to published data products
Required roles for data product owners
To get the permissions that you need to create and manage data products, ask your administrator to grant you the following IAM roles on the project:
- Full permissions to create, update, delete, manage permissions, and approve or reject access requests for data products: Dataplex Data Products Admin (roles/dataplex.dataProductsAdmin)
- Update and manage permissions, and approve or reject access requests for data products: Dataplex Data Products Editor (roles/dataplex.dataProductsEditor)
- Add metadata aspects (such as schema, overview, contacts, and queries): Dataplex Entry and EntryLink Owner (roles/dataplex.entryOwner)
- Search for and add assets: Dataplex Catalog Viewer (roles/dataplex.catalogViewer)
- Edit system aspect types (such as overview, contact, contract, and queries): Dataplex Catalog Editor (roles/dataplex.catalogEditor)
- Create or retrieve insights data scans for automated documentation and insights generation: Dataplex DataScan Admin (roles/dataplex.dataScanAdmin)
For more information about granting roles, see Manage access to projects, folders, and organizations.
These predefined roles contain the permissions required to create and manage data products. To see the exact permissions that are required, expand the Required permissions section:
Required permissions
The following permissions are required to create and manage data products:
- Create a data product: dataplex.dataProducts.create
- List data products in a project: dataplex.dataProducts.list
- Get or view data product: dataplex.dataProducts.get
- Edit an existing data product: dataplex.dataProducts.update
- Delete data product: dataplex.dataProducts.delete
- Approve data product access request: dataplex.dataProducts.approve
- Search for a data product using Knowledge Catalog: dataplex.dataProducts.get and dataplex.projects.search
- Create data product access request: dataplex.dataProducts.get
- Create a data asset: dataplex.dataAssets.create
- List data assets within a data product: dataplex.dataAssets.list
- Get data asset: dataplex.dataAssets.get
- Edit an existing data asset: dataplex.dataAssets.update
- Delete data asset: dataplex.dataAssets.delete
- Create a data scan: dataplex.datascans.create
- List all data scans: dataplex.datascans.list
- Get a data scan: dataplex.datascans.get
- Run a data scan: dataplex.datascans.run
- Edit the overview system aspect type: dataplex.entryGroups.useOverviewAspect
- Edit the refresh cadence system aspect type: dataplex.entryGroups.useRefreshCadenceAspect
- Edit the queries system aspect type: dataplex.entryGroups.useQueriesAspect
You might also be able to get these permissions with custom roles or other predefined roles.
Required roles for data product consumers
For data product consumers to search for, view, and request access to data products, you, as the data product owner, must make the data product discoverable. To do this, grant the data product consumers the following IAM roles on the data product:
- Search for data products and request access to them: Dataplex Data Product Consumer (roles/dataplex.dataProductsConsumer) and Dataplex Catalog Viewer (roles/dataplex.catalogViewer)
- Read-only access to view data product definitions and metadata: Dataplex Data Product Viewer (roles/dataplex.dataProductsViewer)
Create and configure a data product
Creating a data product involves the following high-level tasks:
Create a data product
This mandatory initial step requires defining core details such as a unique data product name, description, region where the data product is created, and contact details.
Optional: Add assets
In this phase, you select assets to include in the data product. A key constraint is that assets must reside in the same region as the data product itself. You can add up to 10 assets at a time, with a total maximum of 50 assets allowed per data product.
For the list of supported assets, see Assets supported.
Optional: Configure access groups and asset permissions
In this optional phase, you simplify access control by creating access groups. These access groups act as user-friendly aliases (for example, Analyst or Reader) for underlying Google Groups and service accounts. You then assign permissions by selecting a specific IAM role and mapping it to an access group for a specific asset.
Optional: Add contract and aspect details
In this phase, you enhance governance and data discoverability by attaching metadata frameworks. You can add a contract to formally communicate your data refresh cadence, specifying parameters such as refresh frequency, timing, and variance thresholds. You can also attach custom aspects to provide additional business or technical metadata for your data product.
Optional: Add additional details
In this final phase, you add rich text documentation, such as user onboarding guides, business definitions, and sample queries, to help consumers interact with the data product immediately upon approval.
To create and configure a data product, complete the steps in the following sections:
Create a data product
Console
In the Google Cloud console, go to the Knowledge Catalog Data products page.
Click Create.
In the Create data products pane, enter the following details:
- Data product name: Enter a unique name for your data product.
- Data product ID: This is an auto-generated unique identifier. You can edit this field.
- Project ID: This is a unique identifier of the project where the data product is created. Browse and select the project.
- Region: Select the region or multi-region where the data product is created.
- Data product icon: Browse and select an icon to visually identify the data product. This is optional.
- Description: Enter a brief description of the data product.
Contacts: Provide the point of contact information for governance and approval workflows:
- Data product owner(s) email address: Enter the email address of the data product owners.
- Data product approver(s) email address: Enter the email address of the designated approvers responsible for signing off on access requests or modifications.
Labels: Add key-value labels to organize your resources. This is optional.
Click Create data product.
REST
To create a data product, use the
dataProducts.create
method.
For example, send the following POST request:
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
-d '{"display_name": "DISPLAY_NAME", "owner_emails": ["EMAIL_IDs"], "access_approval_config": { "approver_emails": ["APPROVER_EMAIL_IDs"]} }' \
"https://dataplex.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/dataProducts?data_product_id=DATA_PRODUCT_ID"
Replace the following:
- DISPLAY_NAME: a user-friendly name for your data product
- EMAIL_IDs: comma-separated email addresses of the data product owners
- APPROVER_EMAIL_IDs: comma-separated email addresses of the designated approvers responsible for signing off on access requests or modifications.
- PROJECT_ID: the ID of your Google Cloud project
- LOCATION: the region in which you want to create the data product
- DATA_PRODUCT_ID: a unique ID for your data product
Terraform
To create a data product, use the
google_dataplex_data_product
resource.
resource "google_dataplex_data_product" "example_product" {
  project         = "PROJECT_ID"
  location        = "LOCATION"
  data_product_id = "DATA_PRODUCT_ID"
  display_name    = "DISPLAY_NAME"
  description     = "DESCRIPTION"
  owner_emails    = ["EMAIL_IDs"]
  provider        = google-beta
}
Replace the following:
- PROJECT_ID: the ID of your Google Cloud project
- LOCATION: the region in which you want to create the data product
- DATA_PRODUCT_ID: a unique ID for your data product
- DISPLAY_NAME: a user-friendly name for your data product
- DESCRIPTION: a brief description of the data product
- EMAIL_IDs: comma-separated email addresses of the data product owners, for example, ["user1@example.com", "user2@example.com"]
Optional: Add assets
You can add various data assets, such as BigQuery tables, views, datasets, and models to your data product. For the list of supported assets, see Assets supported.
Console
In the Add assets pane, click +Add.
Search for and select the assets that you want to add to your data product. The assets you select must reside in the same region as the data product.
If you have the necessary permissions, you can view the metadata of an asset by clicking the asset.
To refine the search results, use Filters.
After you select the assets, click Add.
Click Continue.
REST
To add a data asset to your data product, use the
dataAssets.create
method.
For example, send the following POST request:
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
-d '{"resource": "RESOURCE_NAME"}' \
"https://dataplex.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/dataProducts/DATA_PRODUCT_ID/dataAssets?data_asset_id=DATA_ASSET_ID"
Replace the following:
- RESOURCE_NAME: the full resource name of the data asset (for example, //bigquery.googleapis.com/projects/PROJECT_ID/datasets/DATASET_ID/tables/TABLE_ID)
- PROJECT_ID: the ID of your Google Cloud project
- LOCATION: the region where the data product exists
- DATA_PRODUCT_ID: the ID of the data product
- DATA_ASSET_ID: a unique ID for this data asset within the data product
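The RESOURCE_NAME format shown above, and the limit of 50 assets per data product stated earlier in this document, can be checked locally before you send any requests. The following shell sketch is illustrative only: the function names and the MAX_ASSETS_PER_PRODUCT variable are hypothetical helpers and aren't part of the Google Cloud CLI or the Dataplex API.

```shell
# Hypothetical helpers for illustration; not part of any Google Cloud tooling.
MAX_ASSETS_PER_PRODUCT=50   # total limit per data product, as stated in this document

# Prints the full resource name of a BigQuery table.
# Arguments: project ID, dataset ID, table ID.
bq_table_resource_name() {
  printf '//bigquery.googleapis.com/projects/%s/datasets/%s/tables/%s\n' "$1" "$2" "$3"
}

# Succeeds only if adding $2 assets to a data product that already
# contains $1 assets stays within the limit.
can_add_assets() {
  [ $(( $1 + $2 )) -le "$MAX_ASSETS_PER_PRODUCT" ]
}
```

For example, the output of bq_table_resource_name my-project sales orders could be used as the RESOURCE_NAME value in the request body, where my-project, sales, and orders are placeholder names.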
Terraform
To add a data asset to your data product, use the
google_dataplex_data_product_data_asset
resource.
resource "google_dataplex_data_product_data_asset" "example_asset" {
  project         = "PROJECT_ID"
  location        = "LOCATION"
  data_product_id = "DATA_PRODUCT_ID"
  data_asset_id   = "DATA_ASSET_ID"
  resource        = "RESOURCE_NAME"
  provider        = google-beta
}
Replace the following:
- PROJECT_ID: the ID of your Google Cloud project
- LOCATION: the region where the data product exists
- DATA_PRODUCT_ID: the ID of the data product
- DATA_ASSET_ID: a unique ID for this data asset within the data product
- RESOURCE_NAME: the full resource name of the data asset (for example, //bigquery.googleapis.com/projects/PROJECT_ID/datasets/DATASET_ID/tables/TABLE_ID)
Optional: Configure access groups and asset permissions
In the Configure access groups and asset permissions pane, you can create access groups and assign permissions to assets.
Configure access groups
Console
Click Add access group.
In the Access group name field, enter a name for the access group. For example, Analyst.
In the Access group description field, enter a description for the access group.
In the Access group identifier field, enter the email address of a Google Group that you want to assign to this access group.
Data product consumers who request access for themselves are added as members to the mapped Google Group.
For more information about creating Google Groups, see Create and manage Google Groups in the Google Cloud console.
In the Access group service account field, enter the email address of a service account that you want to assign to this access group.
Data product consumers who request access for their service accounts are granted the Service Account Token Creator (roles/iam.serviceAccountTokenCreator) IAM role to impersonate the data producer service account mapped to the access group.
For more information about creating service accounts, see Create service accounts.
Click Done.
To add another access group, click Add access group and repeat the steps.
You can add a maximum of three access groups per data product.
Click Save.
REST
To configure an access group for the data product, use the
dataProducts.patch
method.
For example, send the following PATCH request:
curl -X PATCH \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
-d '{"access_groups": ACCESS_GROUPS_MAP}' \
"https://dataplex.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/dataProducts/DATA_PRODUCT_ID?update_mask=access_groups"
Replace the following:
ACCESS_GROUPS_MAP: a JSON object representing a map where each key is an access group ID and the value is an AccessGroup object. For example:
{ "analyst": { "id": "analyst", "display_name": "Analyst access group", "description": "Access group for analysts", "principal": { "google_group": "analyst-team@example.com", "service_account": "analyst-svc@gserviceaccount.com" } } }
PROJECT_ID: the ID of your Google Cloud project
LOCATION: the region where the data product exists
DATA_PRODUCT_ID: the ID of your data product
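Writing the ACCESS_GROUPS_MAP value by hand is error-prone because of the nested braces. The following shell sketch shows one way to assemble it. The function names are hypothetical, the field names are taken from the example above, and the limit of three access groups is the one stated for the console. The sketch doesn't escape JSON special characters, so it assumes that the values contain no double quotes or backslashes.

```shell
# Hypothetical helpers for illustration; not part of any Google Cloud tooling.

# Prints one map entry for an access group backed by a Google Group.
# Arguments: access group ID, display name, description, Google Group email.
access_group_entry() {
  printf '"%s": {"id": "%s", "display_name": "%s", "description": "%s", "principal": {"google_group": "%s"}}' \
    "$1" "$1" "$2" "$3" "$4"
}

# Joins up to three entries into a single JSON object.
# Fails if more than three entries are passed.
access_groups_map() {
  if [ "$#" -gt 3 ]; then
    echo "a data product supports at most three access groups" >&2
    return 1
  fi
  _out=""
  for _entry in "$@"; do
    _out="${_out:+$_out, }$_entry"
  done
  printf '{%s}\n' "$_out"
}
```

The output of access_groups_map can then be substituted for ACCESS_GROUPS_MAP in the request body.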
Terraform
To define access groups for your data product, use the access_groups nested
block within the
google_dataplex_data_product
resource.
For example, use the following configuration:
resource "google_dataplex_data_product" "example_data_product" {
  project         = "PROJECT_ID"
  location        = "LOCATION"
  data_product_id = "DATA_PRODUCT_ID"
  display_name    = "DISPLAY_NAME"
  owner_emails    = ["EMAIL_IDs"]
  access_groups {
    id           = "analyst" # Internal identifier for configuration
    group_id     = "analyst" # Unique identifier of the access group, should be same as the 'id'
    display_name = "Business Analyst"
    description  = "Access group for regional analysts"
    principal {
      google_group = "analyst-team@example.com"
    }
  }
  provider = google-beta
}
Replace the following:
- PROJECT_ID: the ID of your Google Cloud project
- LOCATION: the region where the data product exists
- DATA_PRODUCT_ID: a unique ID for the data product
- DISPLAY_NAME: a user-friendly name for your data product
- EMAIL_IDs: comma-separated email addresses of the data product owners, for example, ["user1@example.com", "user2@example.com"]
Configure asset permissions
After you configure access groups, you can configure permissions for the assets in the data product.
Console
In the Asset permissions section, select the asset for which you want to configure permissions. You can select and configure permissions for up to 10 assets at a time.
Click Configure permissions.
In the Select access group field, select an access group.
In the Assign IAM role field, select an IAM role that you want to assign to the access group.
For example, if your asset is a BigQuery table named Sales, and you select the Analyst access group and assign the BigQuery Metadata Viewer role to it, the data product consumers who are part of the Analyst access group have the BigQuery Metadata Viewer role on the Sales table.
You can add multiple roles to an asset.
Click Configure. The asset now shows its assigned permissions.
To configure permissions for other assets, repeat the steps.
Click Continue.
REST
To configure permissions for the assets in the data product, use the
dataAssets.patch
method.
For example, send the following PATCH request:
curl -X PATCH \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
-d '{"access_group_configs": ACCESS_GROUP_CONFIGS_MAP}' \
"https://dataplex.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/dataProducts/DATA_PRODUCT_ID/dataAssets/DATA_ASSET_ID?update_mask=access_group_configs"
Replace the following:
ACCESS_GROUP_CONFIGS_MAP: a JSON object representing a map where each key is an access group ID and the value is an AccessGroupConfig object. For example:
{ "analyst": { "iam_roles": ["roles/bigquery.dataViewer"] } }
PROJECT_ID: the ID of your Google Cloud project
LOCATION: the region where the data product exists
DATA_PRODUCT_ID: the ID of your data product
DATA_ASSET_ID: the ID of the asset for which you want to configure permissions
Terraform
Assign IAM roles to your access groups for specific assets
using the access_group_configs block in the
google_dataplex_data_product_data_asset
resource.
For example, use the following configuration:
resource "google_dataplex_data_product_data_asset" "example_data_asset" {
  project         = "PROJECT_ID"
  location        = "LOCATION"
  data_product_id = "DATA_PRODUCT_ID"
  data_asset_id   = "DATA_ASSET_ID"
  resource        = "RESOURCE_NAME"
  access_group_configs {
    access_group = "analyst" # Must match the 'id' defined in google_dataplex_data_product
    iam_roles    = ["roles/bigquery.dataViewer"]
  }
  provider = google-beta
}
Replace the following:
- PROJECT_ID: the ID of your Google Cloud project
- LOCATION: the region where the data product exists
- DATA_PRODUCT_ID: the ID of the data product
- DATA_ASSET_ID: a unique ID for this data asset within the data product
- RESOURCE_NAME: the full resource name of the data asset (for example, //bigquery.googleapis.com/projects/PROJECT_ID/datasets/DATASET_ID/tables/TABLE_ID)
Optional: Add contract and aspect details
You can add contracts and aspects for a data product.
Add a contract
To establish a foundation of trust between data producers and consumers, you can attach a contract to your data product. By specifying parameters such as refresh time and thresholds, you provide consumers with the necessary context to understand when the data is updated and whether it meets their specific business requirements.
Console
In the Add contract and aspect details pane, click Add contract.
In the Select contract field, select Refresh cadence.
In the Frequency field, select an agreed-upon schedule for how often data is updated or delivered, ensuring a predictable flow from data producer to data consumer. For example, Weekly.
In the Refresh time field, enter the latest acceptable time by which data that is updated at its source becomes available to the consumer. For example, 23:00 PST.
In the Threshold (in minutes) field, enter a measurable limit in minutes for the acceptable delay in data delivery. For example, enter 30 to set a threshold of 30 minutes.
Optional: In the Cron schedule field, enter a cron expression that defines the schedule for data generation and delivery in the format: MINUTE HOUR DAY_OF_MONTH MONTH DAY_OF_WEEK
The following are the accepted values:
- MINUTE: 0-59
- HOUR: 0-23
- DAY_OF_MONTH: 1-31
- MONTH: 1-12 or JAN-DEC
- DAY_OF_WEEK: 0-6 or SUN-SAT
For example, 0 8 * * 1-5 runs at 8:00 AM on weekdays (Monday-Friday).
Click Save.
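A cron expression can be checked against the field ranges before you enter it in the Cron schedule field. The following shell sketch is a minimal, illustrative validator and isn't the validation that Knowledge Catalog performs. It assumes the standard cron month range of 1-12 and accepts only *, a single number, or a numeric range such as 1-5 in each field. It doesn't handle names such as JAN or SUN, lists, or step values. All function names are hypothetical.

```shell
# Hypothetical helpers for illustration; not part of any Google Cloud tooling.

# Succeeds if $1 is a non-negative integer between $2 and $3 (inclusive).
in_range() {
  case "$1" in
    ''|*[!0-9]*) return 1 ;;
  esac
  [ "$1" -ge "$2" ] && [ "$1" -le "$3" ]
}

# Succeeds if field $1 is "*", a number, or a range N-M within [$2, $3].
check_field() {
  _field="$1"; _min="$2"; _max="$3"
  [ "$_field" = "*" ] && return 0
  case "$_field" in
    *-*)
      _lo="${_field%%-*}"; _hi="${_field#*-}"
      in_range "$_lo" "$_min" "$_max" && in_range "$_hi" "$_min" "$_max" && [ "$_lo" -le "$_hi" ]
      ;;
    *)
      in_range "$_field" "$_min" "$_max"
      ;;
  esac
}

# Succeeds if $1 has the five fields MINUTE HOUR DAY_OF_MONTH MONTH DAY_OF_WEEK
# and each field is within its accepted range.
validate_simple_cron() {
  set -f          # keep "*" literal while splitting
  set -- $1       # split the expression on whitespace into $1..$n
  set +f
  [ "$#" -eq 5 ] || return 1
  check_field "$1" 0 59 && check_field "$2" 0 23 && check_field "$3" 1 31 &&
    check_field "$4" 1 12 && check_field "$5" 0 6
}
```

For example, validate_simple_cron "0 8 * * 1-5" succeeds, while validate_simple_cron "0 8 * 13 *" fails because 13 is outside the month range.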
REST
Contracts are modeled as aspects on the data product.
To add a Refresh Cadence contract for a data product, use the
entries.patch
method.
For example, send the following PATCH request:
curl -X PATCH \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
-d \
'{
"aspects": {
"dataplex-types.global.refresh-cadence": {
"aspectType": "projects/dataplex-types/locations/global/aspectTypes/refresh-cadence",
"data": {
"frequency": "REFRESH_FREQUENCY"
}
}
}
}' \
"https://dataplex.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/entryGroups/@dataplex/entries/projects/DATA_PRODUCT_PROJECT_NUMBER/locations/DATA_PRODUCT_LOCATION/dataProducts/DATA_PRODUCT_ID?updateMask=aspects"
Replace the following:
- REFRESH_FREQUENCY: the agreed-upon schedule for how often data is updated or delivered, ensuring a predictable flow from data producer to data consumer. For example: Weekly
- PROJECT_ID: the ID of your Google Cloud project where the API call is being made
- LOCATION: the region of the Knowledge Catalog service endpoint you are calling (for example, us-central1)
- DATA_PRODUCT_PROJECT_NUMBER: the project number where the data product resource is located
- DATA_PRODUCT_LOCATION: the location of the data product resource
- DATA_PRODUCT_ID: the ID of your data product
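The request URL embeds the data product's entry name inside the entries path, which makes it easy to mix up the calling project and location with those of the data product. The following shell sketch assembles the URL from its parts. The function names are hypothetical, and the URL layout is copied from the request above.

```shell
# Hypothetical helpers for illustration; not part of any Google Cloud tooling.

# Prints the entry ID of a data product inside the @dataplex entry group.
# Arguments: data product project number, data product location, data product ID.
data_product_entry_name() {
  printf 'projects/%s/locations/%s/dataProducts/%s' "$1" "$2" "$3"
}

# Prints the entries.patch URL for updating aspects on a data product.
# Arguments: calling project ID, endpoint location, data product project number,
#            data product location, data product ID.
data_product_entry_url() {
  printf 'https://dataplex.googleapis.com/v1/projects/%s/locations/%s/entryGroups/@dataplex/entries/%s?updateMask=aspects\n' \
    "$1" "$2" "$(data_product_entry_name "$3" "$4" "$5")"
}
```

The printed URL can be passed, in double quotes, as the last argument of the curl command.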
Terraform
Contracts are modeled as aspects on the data product.
To manage a contract, you must manage the underlying Knowledge Catalog entry. Because Terraform doesn't automatically discover existing aspects, you must first import the google_dataplex_entry.
To import the entry, use the following command:
terraform import google_dataplex_entry.data_product_metadata "projects/DATA_PRODUCT_PROJECT_NUMBER/locations/LOCATION/entryGroups/@dataplex/entries/projects/DATA_PRODUCT_PROJECT_NUMBER/locations/LOCATION/dataProducts/DATA_PRODUCT_ID"
Terraform configuration:
resource "google_dataplex_entry" "data_product_metadata" {
  project        = "DATA_PRODUCT_PROJECT_NUMBER"
  location       = "LOCATION"
  entry_group_id = "@dataplex"
  entry_id       = "projects/DATA_PRODUCT_PROJECT_NUMBER/locations/LOCATION/dataProducts/DATA_PRODUCT_ID"
  entry_type     = "projects/655216118709/locations/global/entryTypes/data-product"
  aspects {
    aspect_key = "655216118709.global.refresh-cadence"
    aspect {
      data = jsonencode({
        frequency = "REFRESH_FREQUENCY"
      })
    }
  }
  provider = google-beta
}
Replace the following:
- DATA_PRODUCT_PROJECT_NUMBER: the project number where the data product resource is located
- LOCATION: the region of the Knowledge Catalog service endpoint you are calling (for example, us-central1)
- DATA_PRODUCT_ID: the ID of your data product
- REFRESH_FREQUENCY: the agreed-upon schedule for how often data is updated or delivered, ensuring a predictable flow from data producer to data consumer. For example: Weekly
For general information on the import process, refer to the Terraform import documentation.
Add aspects
Use aspects to enrich your data product with structured, reusable metadata. These templates provide a standardized way for data producers to communicate the quality and fitness of a data product, improving governance and helping consumers determine if the product meets their business needs.
To add aspects for the data product, follow these steps:
Console
In the Add contract and aspect details pane, click + Add aspect.
In the Select aspect type field, search for and select an aspect type from the list. For example, Geo context.
Click Save.
REST
To add aspects for a data product, use the
entries.patch
method.
For example, send the following PATCH request:
curl -X PATCH \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
-d \
'{
"aspects": {
"ASPECT_PROJECT_ID.ASPECT_LOCATION.ASPECT_NAME": {
"aspectType": "projects/ASPECT_PROJECT_ID/locations/ASPECT_LOCATION/aspectTypes/ASPECT_NAME",
"data": {}
}
}
}' \
"https://dataplex.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/entryGroups/@dataplex/entries/projects/DATA_PRODUCT_PROJECT_NUMBER/locations/DATA_PRODUCT_LOCATION/dataProducts/DATA_PRODUCT_ID?updateMask=aspects"
Replace the following:
- ASPECT_PROJECT_ID: the ID of your Google Cloud project where the aspect is created
- ASPECT_LOCATION: the region of the Knowledge Catalog service endpoint where the aspect is created (for example, us-central1)
- ASPECT_NAME: the name of the aspect you want to attach to the data product
- PROJECT_ID: the ID of your Google Cloud project where the API call is being made
- LOCATION: the region of the Knowledge Catalog service endpoint you are calling (for example, us-central1)
- DATA_PRODUCT_PROJECT_NUMBER: the project number where the data product resource is located
- DATA_PRODUCT_LOCATION: the location of the data product resource
- DATA_PRODUCT_ID: the ID of your data product
Terraform
To manage aspects, you must manage the underlying Knowledge Catalog entry. Because Terraform doesn't automatically discover existing aspects, you must first import the google_dataplex_entry.
To import the entry, use the following command:
terraform import google_dataplex_entry.data_product_metadata "projects/DATA_PRODUCT_PROJECT_NUMBER/locations/LOCATION/entryGroups/@dataplex/entries/projects/DATA_PRODUCT_PROJECT_NUMBER/locations/LOCATION/dataProducts/DATA_PRODUCT_ID"
Terraform configuration:
resource "google_dataplex_entry" "data_product_metadata" {
  project        = "DATA_PRODUCT_PROJECT_NUMBER"
  location       = "LOCATION"
  entry_group_id = "@dataplex"
  entry_id       = "projects/DATA_PRODUCT_PROJECT_NUMBER/locations/LOCATION/dataProducts/DATA_PRODUCT_ID"
  entry_type     = "projects/655216118709/locations/global/entryTypes/data-product"
  aspects {
    aspect_key = "ASPECT_PROJECT_NUMBER.ASPECT_LOCATION.ASPECT_NAME"
    aspect {
      # Encode the aspect data as JSON, consistent with the other examples in this document.
      data = jsonencode({})
    }
  }
  provider = google-beta
}
Replace the following:
- DATA_PRODUCT_PROJECT_NUMBER: the project number where the data product resource is located
- LOCATION: the region of the Knowledge Catalog service endpoint you are calling (for example, us-central1)
- DATA_PRODUCT_ID: the ID of your data product
- ASPECT_PROJECT_NUMBER: the Google Cloud project number where the aspect is created
- ASPECT_LOCATION: the region of the Knowledge Catalog service endpoint where the aspect is created (for example, us-central1)
- ASPECT_NAME: the name of the aspect you want to attach to the data product
For general information on the import process, refer to the Terraform import documentation.
Optional: Add additional details
You can add documentation and sample queries for your data product to provide essential context, business logic descriptions, and user guides. In Knowledge Catalog, documentation is managed through the overview system aspect.
You can manually create this documentation or use Knowledge Catalog data insights to automatically generate it.
Manually add documentation and sample queries
Console
To add documentation for your data product, follow these steps:
In the Add additional details pane, click Edit next to Documentation.
Type in the content in the rich-text editor.
Click Save.
To add sample queries for your data product, follow these steps:
In the Add additional details pane, click Add queries in the Query recommendation section.
Type the sample queries.
Click Save.
The newly created data product appears on the Knowledge Catalog Data products page.
REST
Documentation is modeled as aspects on the data product.
To add documentation, use the
entries.patch
method.
For example, send the following PATCH request:
curl -X PATCH \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
-d \
'{
"aspects": {
"dataplex-types.global.overview": {
"aspectType": "projects/dataplex-types/locations/global/aspectTypes/overview",
"data": {
"content": "DOCUMENTATION"
}
}
}
}' \
"https://dataplex.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/entryGroups/@dataplex/entries/projects/DATA_PRODUCT_PROJECT_NUMBER/locations/DATA_PRODUCT_LOCATION/dataProducts/DATA_PRODUCT_ID?updateMask=aspects"
Replace the following:
- PROJECT_ID: the ID of your Google Cloud project where the API call is being made
- LOCATION: the region of the Knowledge Catalog service endpoint you are calling (for example, us-central1)
- DATA_PRODUCT_PROJECT_NUMBER: the project number where the data product resource is located
- DATA_PRODUCT_LOCATION: the location of the data product resource
- DATA_PRODUCT_ID: the ID of your data product
- DOCUMENTATION: the content that you want to attach to the data product
Terraform
Documentation is modeled as aspects on the data product.
To manage documentation, you must manage the underlying Knowledge Catalog entry. Because Terraform doesn't automatically discover existing aspects, you must first import the google_dataplex_entry.
To import the entry, use the following command:
terraform import google_dataplex_entry.data_product_metadata "projects/DATA_PRODUCT_PROJECT_NUMBER/locations/LOCATION/entryGroups/@dataplex/entries/projects/DATA_PRODUCT_PROJECT_NUMBER/locations/LOCATION/dataProducts/DATA_PRODUCT_ID"
Terraform configuration:
resource "google_dataplex_entry" "data_product_metadata" {
  project        = "DATA_PRODUCT_PROJECT_NUMBER"
  location       = "LOCATION"
  entry_group_id = "@dataplex"
  entry_id       = "projects/DATA_PRODUCT_PROJECT_NUMBER/locations/LOCATION/dataProducts/DATA_PRODUCT_ID"
  entry_type     = "projects/655216118709/locations/global/entryTypes/data-product"
  aspects {
    aspect_key = "655216118709.global.overview"
    aspect {
      data = jsonencode({
        content = "DOCUMENTATION"
      })
    }
  }
  provider = google-beta
}
Replace the following:
- DATA_PRODUCT_PROJECT_NUMBER: the project number where the data product resource is located
- LOCATION: the region of the Knowledge Catalog service endpoint you are calling (for example, us-central1)
- DATA_PRODUCT_ID: the ID of your data product
- DOCUMENTATION: the content that you want to attach to the data product
For general information on the import process, refer to the Terraform import documentation.
Generate automated documentation and sample queries using data insights
Before you generate documentation and sample queries using Gemini, complete the following prerequisites:
Enable the Gemini for Google Cloud API in the project where you create the data product.
Grant insight-specific user roles: Ask your administrator to grant your identity the following roles and permissions on the data product project:
- Generate and manage data insights: Dataplex DataScan Editor (roles/dataplex.dataScanEditor) or Dataplex DataScan Administrator (roles/dataplex.dataScanAdmin) on the project where the data product resides
- View generated insights: Dataplex DataScan DataViewer (roles/dataplex.dataScanDataViewer) on the project where the data product resides
Configure cross-project service agent permissions. If your underlying data assets reside in a Google Cloud project different from your data product project, you must grant the Knowledge Catalog service agent (P4SA) access to those assets:
To generate or retrieve the service agent identifier for your data product project, run the following Google Cloud CLI command:
gcloud beta services identity create --service=dataplex.googleapis.com --project=DATA_PRODUCT_PROJECT_ID
Replace DATA_PRODUCT_PROJECT_ID with the Google Cloud project ID where your data product resides.
In each external project where your assets reside, grant the data product project's service agent the following roles:
- BigQuery Data Editor (roles/bigquery.dataEditor) on the underlying tables and datasets
- BigQuery Studio Admin (roles/bigquery.studioAdmin) on the asset project
To generate documentation and sample queries for your data product using data insights, complete the following steps:
Console
In the Add additional details pane, on the Generate insights with Gemini bar, click Generate.
Wait a few minutes for the insight generation process to complete.
To review the generated content, click View.
Evaluate the generated content:
If the content is accurate, click Save. This populates the rich-text editor with a predefined documentation template and adds sample queries to the Insights section.
If the content doesn't meet expectations, click Discard.
Click Save to finalize.
REST
To automatically generate, retrieve, and apply documentation and insights using the API, execute the following series of Knowledge Catalog DataScans API calls.
Generate automated documentation.
To trigger the automated documentation generation, create a DATA_DOCUMENTATION type data scan by sending a POST request to the dataScans endpoint:
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
-d '{
"data": { "resource": "DATA_PRODUCT_RESOURCE_NAME" },
"executionSpec": { "trigger": { "oneTime": { "ttl_after_scan_completion": "TTL" } } },
"type": "DATA_DOCUMENTATION",
"dataDocumentationSpec": {}
}' \
"https://dataplex.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/dataScans?data_scan_id=DATA_SCAN_ID"
Replace the following:
- DATA_PRODUCT_RESOURCE_NAME: the full resource name of the target data product to scan.
- TTL: the duration in seconds after which the scan resource should be automatically deleted (for example, 3600 for one hour). If not specified, the default value is 24 hours. The maximum allowed value is 365 days (31536000 seconds).
- PROJECT_ID: the ID of your Google Cloud project
- LOCATION: the region where the data scan runs
- DATA_SCAN_ID: a unique ID you provide for this scan
Retrieve the generated documentation.
After the data scan job completes, retrieve the generated documentation and query insights by sending a GET request with the view=full parameter:
curl -X GET \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
"https://dataplex.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/dataScans/DATA_SCAN_ID?view=full"
Save the generated queries to the data product.
Extract the generated SQL snippets from the data scan output in the previous step, and attach them to your data product entry by updating its queries aspect through a PATCH request:
curl -X PATCH \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
-d '{
"aspects": {
"dataplex-types.global.queries": {
"aspectType": "projects/dataplex-types/locations/global/aspectTypes/queries",
"data": {
"queries": [
{ "description": "QUERY_DESCRIPTION", "sql": "SQL_STATEMENT", "source": "USER" }
]
}
}
}
}' \
"https://dataplex.googleapis.com/v1/projects/CATALOG_PROJECT_ID/locations/CATALOG_LOCATION/entryGroups/@dataplex/entries/projects/DATA_PRODUCT_PROJECT_NUMBER/locations/DATA_PRODUCT_LOCATION/dataProducts/DATA_PRODUCT_ID?updateMask=aspects"
Replace the following:
- QUERY_DESCRIPTION: a description explaining what the recommended sample query accomplishes
- SQL_STATEMENT: the literal text of the generated SQL sample query
- CATALOG_PROJECT_ID: the ID of the Google Cloud project where you are making the API call
- CATALOG_LOCATION: the regional endpoint for the Knowledge Catalog service (for example, us-central1)
- DATA_PRODUCT_PROJECT_NUMBER: the project number where the data product resource is hosted
- DATA_PRODUCT_LOCATION: the location of your data product resource
- DATA_PRODUCT_ID: the ID of your data product
What's next
- Learn more about managing data products.
- Learn how to search for data products.
- As a data consumer, learn how to request access to data products.