Create data products

This document is intended for data product owners who want to create and configure data products in Knowledge Catalog (formerly Dataplex Universal Catalog).

For more information about the architecture and key concepts of data products, see About data products.

Before you begin

Before you create data products, complete the following prerequisites.

Enable Gemini

Configuring Gemini in your data asset is an optional but highly recommended step before you create your first data product.

By default, creating a data product requires you to manually enter business descriptions, technical definitions, and onboarding documentation for your assets. When you enable Gemini integration, Knowledge Catalog leverages AI assistance to automatically analyze your schemas and data scan results to generate the following:

  • Business documentation: Generates documentation templates and clear descriptions for your data product and its individual data assets.
  • Insights and sample queries: Constructs ready-to-use sample queries based on the asset's schema layout, enabling data consumers to immediately start querying the product upon approval.

If you choose not to enable Gemini, you can skip this section. However, you must manually provide all asset metadata and query templates during creation.

For more information about enabling Gemini in BigQuery, see Set up Gemini in BigQuery.

Enable APIs

Enable the Dataplex and BigQuery APIs.

Roles required to enable APIs

To enable APIs, you need the Service Usage Admin IAM role (roles/serviceusage.serviceUsageAdmin), which contains the serviceusage.services.enable permission. Learn how to grant roles.

Create data assets

Ensure that your data assets (for example, BigQuery datasets, tables, and views) are created and populated.

For more information about creating data assets, see the following documents:

Configure identities

Identify or create the Google Groups or service accounts that you want to configure in your data product.

Required roles

This section outlines the minimum IAM roles required for the following primary personas:

  • Data product owners: users who create, configure, and manage data products and their associated assets

  • Data product consumers: users who search for, view, and request access to published data products

Required roles for data product owners

To get the permissions that you need to create and manage data products, ask your administrator to grant you the following IAM roles on the project:

  • Full permissions to create, update, delete, manage permissions, and approve or reject access requests for data products: Dataplex Data Products Admin (roles/dataplex.dataProductsAdmin)
  • Update and manage permissions, and approve or reject access requests for data products: Dataplex Data Products Editor (roles/dataplex.dataProductsEditor)
  • Add metadata aspects (such as schema, overview, contacts, and queries): Dataplex Entry and EntryLink Owner (roles/dataplex.entryOwner)
  • Search for and add assets: Dataplex Catalog Viewer (roles/dataplex.catalogViewer)
  • Edit system aspect types (such as overview, contact, contract, and queries): Dataplex Catalog Editor (roles/dataplex.catalogEditor)
  • Create or retrieve insights data scans for automated documentation and insights generation: Dataplex DataScan Admin (roles/dataplex.dataScanAdmin)

For more information about granting roles, see Manage access to projects, folders, and organizations.

These predefined roles contain the permissions required to create and manage data products. To see the exact permissions that are required, expand the Required permissions section:

Required permissions

The following permissions are required to create and manage data products:

  • Create a data product: dataplex.dataProducts.create
  • List data products in a project: dataplex.dataProducts.list
  • Get or view data product: dataplex.dataProducts.get
  • Edit an existing data product: dataplex.dataProducts.update
  • Delete data product: dataplex.dataProducts.delete
  • Approve data product access request: dataplex.dataProducts.approve
  • Search for a data product using Knowledge Catalog:
    • dataplex.dataProducts.get
    • dataplex.projects.search
  • Create data product access request: dataplex.dataProducts.get
  • Create a data asset: dataplex.dataAssets.create
  • List data assets within a data product: dataplex.dataAssets.list
  • Get data asset: dataplex.dataAssets.get
  • Edit an existing data asset: dataplex.dataAssets.update
  • Delete data asset: dataplex.dataAssets.delete
  • Create a data scan: dataplex.datascans.create
  • List all data scans: dataplex.datascans.list
  • Get a data scan: dataplex.datascans.get
  • Run a data scan: dataplex.datascans.run
  • Edit the overview system aspect type: dataplex.entryGroups.useOverviewAspect
  • Edit the refresh cadence system aspect type: dataplex.entryGroups.useRefreshCadenceAspect
  • Edit the queries system aspect type: dataplex.entryGroups.useQueriesAspect

You might also be able to get these permissions with custom roles or other predefined roles.
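
When you audit a custom role against the list above, a simple set difference shows what is still missing. The following is a minimal sketch, not an official tool: the permission names are copied from the list in this section, and the `missing_permissions` helper is a hypothetical name introduced for illustration. It does not call any Google Cloud API; you supply the granted permissions yourself.

```python
# Permissions listed in the "Required permissions" section above.
OWNER_PERMISSIONS = frozenset({
    "dataplex.dataProducts.create",
    "dataplex.dataProducts.list",
    "dataplex.dataProducts.get",
    "dataplex.dataProducts.update",
    "dataplex.dataProducts.delete",
    "dataplex.dataProducts.approve",
    "dataplex.projects.search",
    "dataplex.dataAssets.create",
    "dataplex.dataAssets.list",
    "dataplex.dataAssets.get",
    "dataplex.dataAssets.update",
    "dataplex.dataAssets.delete",
    "dataplex.datascans.create",
    "dataplex.datascans.list",
    "dataplex.datascans.get",
    "dataplex.datascans.run",
    "dataplex.entryGroups.useOverviewAspect",
    "dataplex.entryGroups.useRefreshCadenceAspect",
    "dataplex.entryGroups.useQueriesAspect",
})


def missing_permissions(granted):
    """Return the listed permissions that are absent from `granted`, sorted."""
    return sorted(OWNER_PERMISSIONS - set(granted))


if __name__ == "__main__":
    # Hypothetical custom role that omits the approval permission.
    custom_role = OWNER_PERMISSIONS - {"dataplex.dataProducts.approve"}
    print(missing_permissions(custom_role))
```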

Required roles for data product consumers

For data product consumers to search for, view, and request access to data products, you, as the data product owner, must make the data product discoverable. To do this, grant the data product consumers the following IAM roles on the data product:

  • Search for data products and request access to them: Dataplex Data Product Consumer (roles/dataplex.dataProductsConsumer) and Dataplex Catalog Viewer (roles/dataplex.catalogViewer)
  • Read-only access to view data product definitions and metadata: Dataplex Data Product Viewer (roles/dataplex.dataProductsViewer)

Create and configure a data product

Creating a data product involves the following high-level tasks:

  1. Create a data product

    This mandatory initial step requires defining core details such as a unique data product name, description, region where the data product is created, and contact details.

  2. Optional: Add assets

    In this phase, you select assets to include in the data product. A key constraint is that assets must reside in the same region as the data product itself. You can add up to 10 assets at a time, with a total maximum of 50 assets allowed per data product.

    For the list of supported assets, see Assets supported.

  3. Optional: Configure access groups and asset permissions

    In this optional phase, you simplify access control by creating access groups. These access groups act as user-friendly aliases (for example, Analyst or Reader) for underlying Google Groups and service accounts. You then assign permissions by selecting a specific IAM role and mapping it to an access group for a specific asset.

  4. Optional: Add contract and aspect details

    In this phase, you enhance governance and data discoverability by attaching metadata frameworks. You can add a contract to formally communicate your data refresh cadence, specifying parameters such as refresh frequency, timing, and variance thresholds. You can also attach custom aspects to provide additional business or technical metadata for your data product.

  5. Optional: Add additional details

    In this final phase, you add rich text documentation, such as user onboarding guides, business definitions, and sample queries, to help consumers interact with the data product immediately upon approval.
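
The asset limits described in step 2 (same region as the data product, up to 10 assets per add operation, 50 assets in total) can be checked before you start. The following is a minimal sketch under stated assumptions: the `plan_asset_batches` helper and the `(name, region)` input shape are hypothetical, and the case-insensitive region comparison is an assumption, not documented behavior.

```python
MAX_ASSETS_PER_PRODUCT = 50  # total limit stated in step 2
MAX_ASSETS_PER_ADD = 10      # per-operation limit stated in step 2


def plan_asset_batches(product_region, assets, already_attached=0):
    """Split assets into batches that respect the documented limits.

    `assets` is a list of (asset_name, asset_region) pairs. This input
    shape is an assumption made for the sketch.
    """
    # Assumption: regions are compared case-insensitively.
    wrong_region = [
        name for name, region in assets
        if region.lower() != product_region.lower()
    ]
    if wrong_region:
        raise ValueError(f"assets outside {product_region}: {wrong_region}")
    if already_attached + len(assets) > MAX_ASSETS_PER_PRODUCT:
        raise ValueError(
            f"a data product can hold at most {MAX_ASSETS_PER_PRODUCT} assets"
        )
    names = [name for name, _ in assets]
    return [
        names[start:start + MAX_ASSETS_PER_ADD]
        for start in range(0, len(names), MAX_ASSETS_PER_ADD)
    ]


if __name__ == "__main__":
    tables = [(f"table_{i}", "us-central1") for i in range(23)]
    for batch in plan_asset_batches("us-central1", tables):
        print(len(batch), batch[0], "...")
```

Each returned batch corresponds to one add operation, so 23 assets need three operations.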

To create and configure a data product, complete the steps in the following sections:

Create a data product

Console

  1. In the Google Cloud console, go to the Knowledge Catalog Data products page.

    Go to Data products

  2. Click Create.

  3. In the Create data products pane, enter the following details:

    • Data product name: Enter a unique name for your data product.
    • Data product ID: This is an auto-generated unique identifier. You can edit this field.
    • Project ID: This is a unique identifier of the project where the data product is created. Browse and select the project.
    • Region: Select the region or multi-region where the data product is created.
    • Data product icon: Browse and select an icon to visually identify the data product. This is optional.
    • Description: Enter a brief description of the data product.
    • Contacts: Provide the point of contact information for governance and approval workflows:

      • Data product owner(s) email address: Enter the email address of the data product owners.
      • Data product approver(s) email address: Enter the email address of the designated approvers responsible for signing off on access requests or modifications.
    • Labels: Add key-value labels to organize your resources. This is optional.

  4. Click Create data product.

Terraform

To create a data product, use the google_dataplex_data_product resource.

resource "google_dataplex_data_product" "example_product" {
  project         = "PROJECT_ID"
  location        = "LOCATION"
  data_product_id = "DATA_PRODUCT_ID"
  display_name    = "DISPLAY_NAME"
  description     = "DESCRIPTION"
  owner_emails    = ["EMAIL_IDs"]

  provider = google-beta
}

Replace the following:

  • PROJECT_ID: the ID of your Google Cloud project
  • LOCATION: the region in which you want to create the data product
  • DATA_PRODUCT_ID: a unique ID for your data product
  • DISPLAY_NAME: a user-friendly name for your data product
  • DESCRIPTION: a brief description of the data product
  • EMAIL_IDs: comma-separated email addresses of the data product owners, for example, ["user1@example.com", "user2@example.com"]

C#

Before trying this sample, follow the C# setup instructions in the Knowledge Catalog quickstart using client libraries. For more information, see the Knowledge Catalog C# API reference documentation.

To authenticate to Knowledge Catalog, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

using Google.Api.Gax.ResourceNames;
using Google.Cloud.Dataplex.V1;
using Google.LongRunning;

public sealed partial class GeneratedDataProductServiceClientSnippets
{
    /// <summary>Snippet for CreateDataProduct</summary>
    /// <remarks>
    /// This snippet has been automatically generated and should be regarded as a code template only.
    /// It will require modifications to work:
    /// - It may require correct/in-range values for request initialization.
    /// - It may require specifying regional endpoints when creating the service client as shown in
    ///   https://cloud.google.com/dotnet/docs/reference/help/client-configuration#endpoint.
    /// </remarks>
    public void CreateDataProductRequestObject()
    {
        // Create client
        DataProductServiceClient dataProductServiceClient = DataProductServiceClient.Create();
        // Initialize request argument(s)
        CreateDataProductRequest request = new CreateDataProductRequest
        {
            ParentAsLocationName = LocationName.FromProjectLocation("[PROJECT]", "[LOCATION]"),
            DataProductId = "",
            DataProduct = new DataProduct(),
            ValidateOnly = false,
        };
        // Make the request
        Operation<DataProduct, OperationMetadata> response = dataProductServiceClient.CreateDataProduct(request);

        // Poll until the returned long-running operation is complete
        Operation<DataProduct, OperationMetadata> completedResponse = response.PollUntilCompleted();
        // Retrieve the operation result
        DataProduct result = completedResponse.Result;

        // Or get the name of the operation
        string operationName = response.Name;
        // This name can be stored, then the long-running operation retrieved later by name
        Operation<DataProduct, OperationMetadata> retrievedResponse = dataProductServiceClient.PollOnceCreateDataProduct(operationName);
        // Check if the retrieved long-running operation has completed
        if (retrievedResponse.IsCompleted)
        {
            // If it has completed, then access the result
            DataProduct retrievedResult = retrievedResponse.Result;
        }
    }
}

Go

Before trying this sample, follow the Go setup instructions in the Knowledge Catalog quickstart using client libraries. For more information, see the Knowledge Catalog Go API reference documentation.

To authenticate to Knowledge Catalog, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.


//go:build examples

package main

import (
	"context"

	dataplex "cloud.google.com/go/dataplex/apiv1"
	dataplexpb "cloud.google.com/go/dataplex/apiv1/dataplexpb"
)

func main() {
	ctx := context.Background()
	// This snippet has been automatically generated and should be regarded as a code template only.
	// It will require modifications to work:
	// - It may require correct/in-range values for request initialization.
	// - It may require specifying regional endpoints when creating the service client as shown in:
	//   https://pkg.go.dev/cloud.google.com/go#hdr-Client_Options
	c, err := dataplex.NewDataProductClient(ctx)
	if err != nil {
		// TODO: Handle error.
	}
	defer c.Close()

	req := &dataplexpb.CreateDataProductRequest{
		// TODO: Fill request struct fields.
		// See https://pkg.go.dev/cloud.google.com/go/dataplex/apiv1/dataplexpb#CreateDataProductRequest.
	}
	op, err := c.CreateDataProduct(ctx, req)
	if err != nil {
		// TODO: Handle error.
	}

	resp, err := op.Wait(ctx)
	if err != nil {
		// TODO: Handle error.
	}
	// TODO: Use resp.
	_ = resp
}

Java

Before trying this sample, follow the Java setup instructions in the Knowledge Catalog quickstart using client libraries. For more information, see the Knowledge Catalog Java API reference documentation.

To authenticate to Knowledge Catalog, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

import com.google.cloud.dataplex.v1.CreateDataProductRequest;
import com.google.cloud.dataplex.v1.DataProduct;
import com.google.cloud.dataplex.v1.DataProductServiceClient;
import com.google.cloud.dataplex.v1.LocationName;

public class SyncCreateDataProduct {

  public static void main(String[] args) throws Exception {
    syncCreateDataProduct();
  }

  public static void syncCreateDataProduct() throws Exception {
    // This snippet has been automatically generated and should be regarded as a code template only.
    // It will require modifications to work:
    // - It may require correct/in-range values for request initialization.
    // - It may require specifying regional endpoints when creating the service client as shown in
    // https://cloud.google.com/java/docs/setup#configure_endpoints_for_the_client_library
    try (DataProductServiceClient dataProductServiceClient = DataProductServiceClient.create()) {
      CreateDataProductRequest request =
          CreateDataProductRequest.newBuilder()
              .setParent(LocationName.of("[PROJECT]", "[LOCATION]").toString())
              .setDataProductId("dataProductId1437828576")
              .setDataProduct(DataProduct.newBuilder().build())
              .setValidateOnly(true)
              .build();
      DataProduct response = dataProductServiceClient.createDataProductAsync(request).get();
    }
  }
}

Node.js

Before trying this sample, follow the Node.js setup instructions in the Knowledge Catalog quickstart using client libraries. For more information, see the Knowledge Catalog Node.js API reference documentation.

To authenticate to Knowledge Catalog, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

/**
 * This snippet has been automatically generated and should be regarded as a code template only.
 * It will require modifications to work.
 * It may require correct/in-range values for request initialization.
 * TODO(developer): Uncomment these variables before running the sample.
 */
/**
 *  Required. The parent resource where this data product will be created.
 *  Format: projects/{project_id_or_number}/locations/{location_id}
 */
// const parent = 'abc123'
/**
 *  Optional. The ID of the data product to create.
 *  The ID must conform to RFC-1034 and contain only lower-case letters (a-z),
 *  numbers (0-9), or hyphens, with the first character a letter, the last a
 *  letter or a number, and a 63 character maximum. Characters outside of
 *  ASCII are not permitted.
 *  Valid format regex: `^[a-z]([a-z0-9-]{0,61}[a-z0-9])?$`
 *  If not provided, a system generated ID will be used.
 */
// const dataProductId = 'abc123'
/**
 *  Required. The data product to create.
 */
// const dataProduct = {}
/**
 *  Optional. Validates the request without actually creating the data product.
 *  Default: false.
 */
// const validateOnly = true

// Imports the Dataplex library
const {DataProductServiceClient} = require('@google-cloud/dataplex').v1;

// Instantiates a client
const dataplexClient = new DataProductServiceClient();

async function callCreateDataProduct() {
  // Construct request
  const request = {
    parent,
    dataProduct,
  };

  // Run request
  const [operation] = await dataplexClient.createDataProduct(request);
  const [response] = await operation.promise();
  console.log(response);
}

callCreateDataProduct();

Python

Before trying this sample, follow the Python setup instructions in the Knowledge Catalog quickstart using client libraries. For more information, see the Knowledge Catalog Python API reference documentation.

To authenticate to Knowledge Catalog, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

# This snippet has been automatically generated and should be regarded as a
# code template only.
# It will require modifications to work:
# - It may require correct/in-range values for request initialization.
# - It may require specifying regional endpoints when creating the service
#   client as shown in:
#   https://googleapis.dev/python/google-api-core/latest/client_options.html
from google.cloud import dataplex_v1


def sample_create_data_product():
    # Create a client
    client = dataplex_v1.DataProductServiceClient()

    # Initialize request argument(s)
    data_product = dataplex_v1.DataProduct()
    data_product.display_name = "display_name_value"
    data_product.owner_emails = ["owner_emails_value1", "owner_emails_value2"]

    request = dataplex_v1.CreateDataProductRequest(
        parent="parent_value",
        data_product=data_product,
    )

    # Make the request
    operation = client.create_data_product(request=request)

    print("Waiting for operation to complete...")

    response = operation.result()

    # Handle the response
    print(response)

REST

To create a data product, use the dataProducts.create method.

For example, send the following POST request:

curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
-d '{"display_name": "DISPLAY_NAME", "owner_emails": ["EMAIL_IDs"], "access_approval_config": { "approver_emails": ["APPROVER_EMAIL_IDs"]} }' \
"https://dataplex.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/dataProducts?data_product_id=DATA_PRODUCT_ID"

Replace the following:

  • DISPLAY_NAME: a user-friendly name for your data product
  • EMAIL_IDs: comma-separated email addresses of the data product owners
  • APPROVER_EMAIL_IDs: comma-separated email addresses of the designated approvers responsible for signing off on access requests or modifications
  • PROJECT_ID: the ID of your Google Cloud project
  • LOCATION: the region in which you want to create the data product
  • DATA_PRODUCT_ID: a unique ID for your data product
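
To show how the placeholders fit together, the following sketch assembles the same request URL and JSON body in Python without sending anything. It is illustrative only: `build_create_data_product_request` is a hypothetical helper, and the ID pattern is the one quoted in the client-library sample comments on this page, which the server may enforce differently.

```python
import json
import re
from urllib.parse import urlencode

# ID pattern quoted in the client-library sample comments on this page.
ID_PATTERN = re.compile(r"^[a-z]([a-z0-9-]{0,61}[a-z0-9])?$")


def build_create_data_product_request(
    project_id, location, data_product_id,
    display_name, owner_emails, approver_emails,
):
    """Return (url, json_body) for the create request shown above."""
    if not ID_PATTERN.fullmatch(data_product_id):
        raise ValueError(f"invalid data product ID: {data_product_id!r}")
    url = (
        f"https://dataplex.googleapis.com/v1/projects/{project_id}"
        f"/locations/{location}/dataProducts?"
        + urlencode({"data_product_id": data_product_id})
    )
    body = {
        "display_name": display_name,
        "owner_emails": list(owner_emails),
        "access_approval_config": {"approver_emails": list(approver_emails)},
    }
    return url, json.dumps(body)


if __name__ == "__main__":
    # All values below are placeholders for illustration.
    url, body = build_create_data_product_request(
        "my-project", "us-central1", "sales-orders",
        "Sales orders", ["owner@example.com"], ["approver@example.com"],
    )
    print(url)
    print(body)
```

Validating the ID locally catches formatting mistakes before the request is sent. The API remains the authority on what it accepts.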

Optional: Add assets

You can add various data assets, such as BigQuery tables, views, datasets, and models, to your data product. For the list of supported assets, see Assets supported.

Console

  1. In the Add assets pane, click +Add.

  2. Search for and select the assets that you want to add to your data product. The assets you select must reside in the same region as the data product.

    If you have the necessary permissions, you can view an asset's metadata by clicking the asset.

  3. To refine the search results, use Filters.

  4. After you select the assets, click Add.

  5. Click Continue.

Terraform

To add a data asset to your data product, use the google_dataplex_data_product_data_asset resource.

resource "google_dataplex_data_product_data_asset" "example_asset" {
  project         = "PROJECT_ID"
  location        = "LOCATION"
  data_product_id = "DATA_PRODUCT_ID"
  data_asset_id   = "DATA_ASSET_ID"
  resource        = "RESOURCE_NAME"

  provider = google-beta
}

Replace the following:

  • PROJECT_ID: the ID of your Google Cloud project
  • LOCATION: the region where the data product exists
  • DATA_PRODUCT_ID: the ID of the data product
  • DATA_ASSET_ID: a unique ID for this data asset within the data product
  • RESOURCE_NAME: the full resource name of the data asset (for example, //bigquery.googleapis.com/projects/PROJECT_ID/datasets/DATASET_ID/tables/TABLE_ID)

C#

Before trying this sample, follow the C# setup instructions in the Knowledge Catalog quickstart using client libraries. For more information, see the Knowledge Catalog C# API reference documentation.

To authenticate to Knowledge Catalog, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

using Google.Cloud.Dataplex.V1;
using Google.LongRunning;

public sealed partial class GeneratedDataProductServiceClientSnippets
{
    /// <summary>Snippet for CreateDataAsset</summary>
    /// <remarks>
    /// This snippet has been automatically generated and should be regarded as a code template only.
    /// It will require modifications to work:
    /// - It may require correct/in-range values for request initialization.
    /// - It may require specifying regional endpoints when creating the service client as shown in
    ///   https://cloud.google.com/dotnet/docs/reference/help/client-configuration#endpoint.
    /// </remarks>
    public void CreateDataAssetRequestObject()
    {
        // Create client
        DataProductServiceClient dataProductServiceClient = DataProductServiceClient.Create();
        // Initialize request argument(s)
        CreateDataAssetRequest request = new CreateDataAssetRequest
        {
            ParentAsDataProductName = DataProductName.FromProjectLocationDataProduct("[PROJECT]", "[LOCATION]", "[DATA_PRODUCT]"),
            DataAssetId = "",
            DataAsset = new DataAsset(),
            ValidateOnly = false,
        };
        // Make the request
        Operation<DataAsset, OperationMetadata> response = dataProductServiceClient.CreateDataAsset(request);

        // Poll until the returned long-running operation is complete
        Operation<DataAsset, OperationMetadata> completedResponse = response.PollUntilCompleted();
        // Retrieve the operation result
        DataAsset result = completedResponse.Result;

        // Or get the name of the operation
        string operationName = response.Name;
        // This name can be stored, then the long-running operation retrieved later by name
        Operation<DataAsset, OperationMetadata> retrievedResponse = dataProductServiceClient.PollOnceCreateDataAsset(operationName);
        // Check if the retrieved long-running operation has completed
        if (retrievedResponse.IsCompleted)
        {
            // If it has completed, then access the result
            DataAsset retrievedResult = retrievedResponse.Result;
        }
    }
}

Go

Before trying this sample, follow the Go setup instructions in the Knowledge Catalog quickstart using client libraries. For more information, see the Knowledge Catalog Go API reference documentation.

To authenticate to Knowledge Catalog, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.


//go:build examples

package main

import (
	"context"

	dataplex "cloud.google.com/go/dataplex/apiv1"
	dataplexpb "cloud.google.com/go/dataplex/apiv1/dataplexpb"
)

func main() {
	ctx := context.Background()
	// This snippet has been automatically generated and should be regarded as a code template only.
	// It will require modifications to work:
	// - It may require correct/in-range values for request initialization.
	// - It may require specifying regional endpoints when creating the service client as shown in:
	//   https://pkg.go.dev/cloud.google.com/go#hdr-Client_Options
	c, err := dataplex.NewDataProductClient(ctx)
	if err != nil {
		// TODO: Handle error.
	}
	defer c.Close()

	req := &dataplexpb.CreateDataAssetRequest{
		// TODO: Fill request struct fields.
		// See https://pkg.go.dev/cloud.google.com/go/dataplex/apiv1/dataplexpb#CreateDataAssetRequest.
	}
	op, err := c.CreateDataAsset(ctx, req)
	if err != nil {
		// TODO: Handle error.
	}

	resp, err := op.Wait(ctx)
	if err != nil {
		// TODO: Handle error.
	}
	// TODO: Use resp.
	_ = resp
}

Java

Before trying this sample, follow the Java setup instructions in the Knowledge Catalog quickstart using client libraries. For more information, see the Knowledge Catalog Java API reference documentation.

To authenticate to Knowledge Catalog, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

import com.google.cloud.dataplex.v1.CreateDataAssetRequest;
import com.google.cloud.dataplex.v1.DataAsset;
import com.google.cloud.dataplex.v1.DataProductName;
import com.google.cloud.dataplex.v1.DataProductServiceClient;

public class SyncCreateDataAsset {

  public static void main(String[] args) throws Exception {
    syncCreateDataAsset();
  }

  public static void syncCreateDataAsset() throws Exception {
    // This snippet has been automatically generated and should be regarded as a code template only.
    // It will require modifications to work:
    // - It may require correct/in-range values for request initialization.
    // - It may require specifying regional endpoints when creating the service client as shown in
    // https://cloud.google.com/java/docs/setup#configure_endpoints_for_the_client_library
    try (DataProductServiceClient dataProductServiceClient = DataProductServiceClient.create()) {
      CreateDataAssetRequest request =
          CreateDataAssetRequest.newBuilder()
              .setParent(DataProductName.of("[PROJECT]", "[LOCATION]", "[DATA_PRODUCT]").toString())
              .setDataAssetId("dataAssetId2108984609")
              .setDataAsset(DataAsset.newBuilder().build())
              .setValidateOnly(true)
              .build();
      DataAsset response = dataProductServiceClient.createDataAssetAsync(request).get();
    }
  }
}

Node.js

Before trying this sample, follow the Node.js setup instructions in the Knowledge Catalog quickstart using client libraries. For more information, see the Knowledge Catalog Node.js API reference documentation.

To authenticate to Knowledge Catalog, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

/**
 * This snippet has been automatically generated and should be regarded as a code template only.
 * It will require modifications to work.
 * It may require correct/in-range values for request initialization.
 * TODO(developer): Uncomment these variables before running the sample.
 */
/**
 *  Required. The parent resource where this data asset will be created.
 *  Format:
 *  projects/{project_id_or_number}/locations/{location_id}/dataProducts/{data_product_id}
 */
// const parent = 'abc123'
/**
 *  Optional. The ID of the data asset to create.
 *  The ID must conform to RFC-1034 and contain only lower-case letters (a-z),
 *  numbers (0-9), or hyphens, with the first character a letter, the last a
 *  letter or a number, and a 63 character maximum. Characters outside of
 *  ASCII are not permitted.
 *  Valid format regex: `^[a-z]([a-z0-9-]{0,61}[a-z0-9])?$`
 *  If not provided, a system generated ID will be used.
 */
// const dataAssetId = 'abc123'
/**
 *  Required. The data asset to create.
 */
// const dataAsset = {}
/**
 *  Optional. Validates the request without actually creating the data asset.
 *  Defaults to false.
 */
// const validateOnly = true

// Imports the Dataplex library
const {DataProductServiceClient} = require('@google-cloud/dataplex').v1;

// Instantiates a client
const dataplexClient = new DataProductServiceClient();

async function callCreateDataAsset() {
  // Construct request
  const request = {
    parent,
    dataAsset,
  };

  // Run request
  const [operation] = await dataplexClient.createDataAsset(request);
  const [response] = await operation.promise();
  console.log(response);
}

callCreateDataAsset();

Python

Before trying this sample, follow the Python setup instructions in the Knowledge Catalog quickstart using client libraries. For more information, see the Knowledge Catalog Python API reference documentation.

To authenticate to Knowledge Catalog, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

# This snippet has been automatically generated and should be regarded as a
# code template only.
# It will require modifications to work:
# - It may require correct/in-range values for request initialization.
# - It may require specifying regional endpoints when creating the service
#   client as shown in:
#   https://googleapis.dev/python/google-api-core/latest/client_options.html
from google.cloud import dataplex_v1


def sample_create_data_asset():
    # Create a client
    client = dataplex_v1.DataProductServiceClient()

    # Initialize request argument(s)
    data_asset = dataplex_v1.DataAsset()
    data_asset.resource = "resource_value"

    request = dataplex_v1.CreateDataAssetRequest(
        parent="parent_value",
        data_asset=data_asset,
    )

    # Make the request
    operation = client.create_data_asset(request=request)

    print("Waiting for operation to complete...")

    response = operation.result()

    # Handle the response
    print(response)

REST

To add a data asset to your data product, use the dataAssets.create method.

For example, send the following POST request:

curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
-d '{"resource": "RESOURCE_NAME"}' \
"https://dataplex.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/dataProducts/DATA_PRODUCT_ID/dataAssets?data_asset_id=DATA_ASSET_ID"

Replace the following:

  • RESOURCE_NAME: the full resource name of the data asset (for example, //bigquery.googleapis.com/projects/PROJECT_ID/datasets/DATASET_ID/tables/TABLE_ID)
  • PROJECT_ID: the ID of your Google Cloud project
  • LOCATION: the region where the data product exists
  • DATA_PRODUCT_ID: the ID of the data product
  • DATA_ASSET_ID: a unique ID for this data asset within the data product
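
The RESOURCE_NAME format is easy to mistype, so it can help to build it from its parts. The following sketch composes the BigQuery table resource name and the matching request URL and body from the example above. Both helper functions are hypothetical names introduced for illustration, and nothing is sent over the network.

```python
import json


def bigquery_table_resource(project_id, dataset_id, table_id):
    """Build the full resource name format shown for RESOURCE_NAME above."""
    return (
        f"//bigquery.googleapis.com/projects/{project_id}"
        f"/datasets/{dataset_id}/tables/{table_id}"
    )


def build_create_data_asset_request(
    project_id, location, data_product_id, data_asset_id, resource,
):
    """Return (url, json_body) for the create request shown above."""
    url = (
        f"https://dataplex.googleapis.com/v1/projects/{project_id}"
        f"/locations/{location}/dataProducts/{data_product_id}"
        f"/dataAssets?data_asset_id={data_asset_id}"
    )
    return url, json.dumps({"resource": resource})


if __name__ == "__main__":
    # All values below are placeholders for illustration.
    resource = bigquery_table_resource("my-project", "sales", "orders")
    url, body = build_create_data_asset_request(
        "my-project", "us-central1", "sales-orders", "orders-table", resource,
    )
    print(url)
    print(body)
```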

Optional: Configure access groups and asset permissions

In the Configure access groups and asset permissions pane, you can create access groups and assign permissions to assets.

Configure access groups

Console

  1. Click Add access group.

  2. In the Access group name field, enter a name for the access group. For example, Analyst.

  3. In the Access group description field, enter a description for the access group.

  4. In the Access group identifier field, enter the email address of a Google Group that you want to assign to this access group.

    Data product consumers who request access for themselves are added as members to the mapped Google Group.

    For more information about creating Google Groups, see Create and manage Google Groups in the Google Cloud console.

  5. In the Access group service account field, enter the email address of a service account that you want to assign to this access group.

    Data product consumers who request access for their service accounts are granted the Service Account Token Creator (roles/iam.serviceAccountTokenCreator) IAM role to impersonate the data producer service account mapped to the access group.

    For more information about creating service accounts, see Create service accounts.

  6. Click Done.

  7. To add another access group, click Add access group and repeat the steps.

    You can add a maximum of three access groups per data product.

  8. Click Save.

Terraform

To define access groups for your data product, use the access_groups nested block within the google_dataplex_data_product resource.

For example, use the following configuration:

resource "google_dataplex_data_product" "example_data_product" {
project         = "PROJECT_ID"
location        = "LOCATION"
data_product_id = "DATA_PRODUCT_ID"
display_name    = "DISPLAY_NAME"
owner_emails    = ["EMAIL_IDs"]

access_groups {
  id           = "analyst" # Internal identifier for configuration
  group_id     = "analyst" # Unique identifier of the access group, should be same as the 'id'
  display_name = "Business Analyst"
  description  = "Access group for regional analysts"
  principal {
    google_group = "analyst-team@example.com"
  }
}

provider = google-beta
}

Replace the following:

  • PROJECT_ID: the ID of your Google Cloud project
  • LOCATION: the region where the data product exists
  • DATA_PRODUCT_ID: a unique ID for the data product
  • DISPLAY_NAME: a user-friendly name for your data product
  • EMAIL_IDs: comma-separated email addresses of the data product owners, for example, ["user1@example.com", "user2@example.com"]

C#

Before trying this sample, follow the C# setup instructions in the Knowledge Catalog quickstart using client libraries. For more information, see the Knowledge Catalog C# API reference documentation.

To authenticate to Knowledge Catalog, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

using Google.Cloud.Dataplex.V1;
using Google.LongRunning;
using Google.Protobuf.WellKnownTypes;

public sealed partial class GeneratedDataProductServiceClientSnippets
{
    /// <summary>Snippet for UpdateDataProduct</summary>
    /// <remarks>
    /// This snippet has been automatically generated and should be regarded as a code template only.
    /// It will require modifications to work:
    /// - It may require correct/in-range values for request initialization.
    /// - It may require specifying regional endpoints when creating the service client as shown in
    ///   https://cloud.google.com/dotnet/docs/reference/help/client-configuration#endpoint.
    /// </remarks>
    public void UpdateDataProductRequestObject()
    {
        // Create client
        DataProductServiceClient dataProductServiceClient = DataProductServiceClient.Create();
        // Initialize request argument(s)
        UpdateDataProductRequest request = new UpdateDataProductRequest
        {
            DataProduct = new DataProduct(),
            UpdateMask = new FieldMask(),
            ValidateOnly = false,
        };
        // Make the request
        Operation<DataProduct, OperationMetadata> response = dataProductServiceClient.UpdateDataProduct(request);

        // Poll until the returned long-running operation is complete
        Operation<DataProduct, OperationMetadata> completedResponse = response.PollUntilCompleted();
        // Retrieve the operation result
        DataProduct result = completedResponse.Result;

        // Or get the name of the operation
        string operationName = response.Name;
        // This name can be stored, then the long-running operation retrieved later by name
        Operation<DataProduct, OperationMetadata> retrievedResponse = dataProductServiceClient.PollOnceUpdateDataProduct(operationName);
        // Check if the retrieved long-running operation has completed
        if (retrievedResponse.IsCompleted)
        {
            // If it has completed, then access the result
            DataProduct retrievedResult = retrievedResponse.Result;
        }
    }
}

Go

Before trying this sample, follow the Go setup instructions in the Knowledge Catalog quickstart using client libraries. For more information, see the Knowledge Catalog Go API reference documentation.

To authenticate to Knowledge Catalog, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.


//go:build examples

package main

import (
	"context"

	dataplex "cloud.google.com/go/dataplex/apiv1"
	dataplexpb "cloud.google.com/go/dataplex/apiv1/dataplexpb"
)

func main() {
	ctx := context.Background()
	// This snippet has been automatically generated and should be regarded as a code template only.
	// It will require modifications to work:
	// - It may require correct/in-range values for request initialization.
	// - It may require specifying regional endpoints when creating the service client as shown in:
	//   https://pkg.go.dev/cloud.google.com/go#hdr-Client_Options
	c, err := dataplex.NewDataProductClient(ctx)
	if err != nil {
		// TODO: Handle error.
	}
	defer c.Close()

	req := &dataplexpb.UpdateDataProductRequest{
		// TODO: Fill request struct fields.
		// See https://pkg.go.dev/cloud.google.com/go/dataplex/apiv1/dataplexpb#UpdateDataProductRequest.
	}
	op, err := c.UpdateDataProduct(ctx, req)
	if err != nil {
		// TODO: Handle error.
	}

	resp, err := op.Wait(ctx)
	if err != nil {
		// TODO: Handle error.
	}
	// TODO: Use resp.
	_ = resp
}

Java

Before trying this sample, follow the Java setup instructions in the Knowledge Catalog quickstart using client libraries. For more information, see the Knowledge Catalog Java API reference documentation.

To authenticate to Knowledge Catalog, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

import com.google.cloud.dataplex.v1.DataProduct;
import com.google.cloud.dataplex.v1.DataProductServiceClient;
import com.google.cloud.dataplex.v1.UpdateDataProductRequest;
import com.google.protobuf.FieldMask;

public class SyncUpdateDataProduct {

  public static void main(String[] args) throws Exception {
    syncUpdateDataProduct();
  }

  public static void syncUpdateDataProduct() throws Exception {
    // This snippet has been automatically generated and should be regarded as a code template only.
    // It will require modifications to work:
    // - It may require correct/in-range values for request initialization.
    // - It may require specifying regional endpoints when creating the service client as shown in
    // https://cloud.google.com/java/docs/setup#configure_endpoints_for_the_client_library
    try (DataProductServiceClient dataProductServiceClient = DataProductServiceClient.create()) {
      UpdateDataProductRequest request =
          UpdateDataProductRequest.newBuilder()
              .setDataProduct(DataProduct.newBuilder().build())
              .setUpdateMask(FieldMask.newBuilder().build())
              // Set to true to validate the request without applying the update.
              .setValidateOnly(false)
              .build();
      DataProduct response = dataProductServiceClient.updateDataProductAsync(request).get();
    }
  }
}

Node.js

Before trying this sample, follow the Node.js setup instructions in the Knowledge Catalog quickstart using client libraries. For more information, see the Knowledge Catalog Node.js API reference documentation.

To authenticate to Knowledge Catalog, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

/**
 * This snippet has been automatically generated and should be regarded as a code template only.
 * It will require modifications to work.
 * It may require correct/in-range values for request initialization.
 * TODO(developer): Uncomment these variables before running the sample.
 */
/**
 *  Required. The data product to update.
 *  The data product's `name` field is used to identify the data product to
 *  update.
 */
// const dataProduct = {}
/**
 *  Optional. The list of fields to update.
 *  If this is empty or not set, then all the fields will be updated.
 */
// const updateMask = {}
/**
 *  Optional. Validates the request without actually updating the data product.
 *  Default: false.
 */
// const validateOnly = true

// Imports the Dataplex library
const {DataProductServiceClient} = require('@google-cloud/dataplex').v1;

// Instantiates a client
const dataplexClient = new DataProductServiceClient();

async function callUpdateDataProduct() {
  // Construct request
  const request = {
    dataProduct,
  };

  // Run request
  const [operation] = await dataplexClient.updateDataProduct(request);
  const [response] = await operation.promise();
  console.log(response);
}

callUpdateDataProduct();

Python

Before trying this sample, follow the Python setup instructions in the Knowledge Catalog quickstart using client libraries. For more information, see the Knowledge Catalog Python API reference documentation.

To authenticate to Knowledge Catalog, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

# This snippet has been automatically generated and should be regarded as a
# code template only.
# It will require modifications to work:
# - It may require correct/in-range values for request initialization.
# - It may require specifying regional endpoints when creating the service
#   client as shown in:
#   https://googleapis.dev/python/google-api-core/latest/client_options.html
from google.cloud import dataplex_v1


def sample_update_data_product():
    # Create a client
    client = dataplex_v1.DataProductServiceClient()

    # Initialize request argument(s)
    data_product = dataplex_v1.DataProduct()
    data_product.display_name = "display_name_value"
    data_product.owner_emails = ["owner_emails_value1", "owner_emails_value2"]

    request = dataplex_v1.UpdateDataProductRequest(
        data_product=data_product,
    )

    # Make the request
    operation = client.update_data_product(request=request)

    print("Waiting for operation to complete...")

    response = operation.result()

    # Handle the response
    print(response)

REST

To configure an access group for the data product, use the dataProducts.patch method.

For example, send the following PATCH request:

curl -X PATCH \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
-d '{"access_groups": ACCESS_GROUPS_MAP}' \
"https://dataplex.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/dataProducts/DATA_PRODUCT_ID?update_mask=access_groups"

Replace the following:

  • ACCESS_GROUPS_MAP: a JSON object representing a map where each key is an access group ID and the value is an AccessGroup object. For example:

    {
      "analyst": {
        "id": "analyst",
        "display_name": "Analyst access group",
        "description": "Access group for analysts",
        "principal": {
          "google_group": "analyst-team@example.com",
          "service_account": "analyst-svc@gserviceaccount.com"
        }
      }
    }
    
  • PROJECT_ID: the ID of your Google Cloud project

  • LOCATION: the region where the data product exists

  • DATA_PRODUCT_ID: the ID of your data product
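
As a rough sketch, the ACCESS_GROUPS_MAP value could be assembled in Python before it is passed to the PATCH request. The helper name `build_access_groups_map` and the shape of its input are hypothetical. The field names are copied from the REST example above, and the limit of three access groups comes from the console steps. The code builds the JSON only and does not call the API.

```python
import json

MAX_ACCESS_GROUPS = 3  # Limit stated in the console steps for a data product.


def build_access_groups_map(groups):
    """Build the ACCESS_GROUPS_MAP object, keyed by access group ID.

    `groups` is a list of dicts with the keys id, display_name, description,
    and optionally google_group and service_account.
    """
    if len(groups) > MAX_ACCESS_GROUPS:
        raise ValueError(f"at most {MAX_ACCESS_GROUPS} access groups are allowed")
    result = {}
    for group in groups:
        group_id = group["id"]
        if group_id in result:
            raise ValueError(f"duplicate access group ID: {group_id}")
        # Include only the principal fields that were provided.
        principal = {
            key: group[key]
            for key in ("google_group", "service_account")
            if group.get(key)
        }
        result[group_id] = {
            "id": group_id,
            "display_name": group["display_name"],
            "description": group["description"],
            "principal": principal,
        }
    return result


if __name__ == "__main__":
    payload = {
        "access_groups": build_access_groups_map(
            [
                {
                    "id": "analyst",
                    "display_name": "Analyst access group",
                    "description": "Access group for analysts",
                    "google_group": "analyst-team@example.com",
                }
            ]
        )
    }
    # The printed JSON can be used as the request body of the PATCH call.
    print(json.dumps(payload, indent=2))
```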

Configure asset permissions

After you configure access groups, you can configure permissions for the assets in the data product.

Console

  1. In the Asset permissions section, select the asset for which you want to configure permissions. You can select and configure permissions for up to 10 assets at a time.

  2. Click Configure permissions.

  3. In the Select access group field, select an access group.

  4. In the Assign IAM role field, select an IAM role that you want to assign to the access group.

    For example, if your asset is a BigQuery table named Sales, and you select the Analyst access group and assign it the BigQuery Metadata Viewer role, then data product consumers in the Analyst access group have the BigQuery Metadata Viewer role on the Sales table.

    You can add multiple roles to an asset.

  5. Click Configure. The asset now shows its assigned permissions.

  6. To configure permissions for other assets, repeat the steps.

  7. Click Continue.

Terraform

Assign IAM roles to your access groups for specific assets using the access_group_configs block in the google_dataplex_data_product_data_asset resource.

For example, use the following configuration:

resource "google_dataplex_data_product_data_asset" "example_data_asset" {
project         = "PROJECT_ID"
location        = "LOCATION"
data_product_id = "DATA_PRODUCT_ID"
data_asset_id   = "DATA_ASSET_ID"
resource        = "RESOURCE_NAME"

access_group_configs {
  access_group = "analyst" # Must match the 'id' defined in google_dataplex_data_product
  iam_roles    = ["roles/bigquery.dataViewer"]
}

provider = google-beta
}

Replace the following:

  • PROJECT_ID: the ID of your Google Cloud project
  • LOCATION: the region where the data product exists
  • DATA_PRODUCT_ID: the ID of the data product
  • DATA_ASSET_ID: a unique ID for this data asset within the data product
  • RESOURCE_NAME: the full resource name of the data asset (for example, //bigquery.googleapis.com/projects/PROJECT_ID/datasets/DATASET_ID/tables/TABLE_ID)

C#

Before trying this sample, follow the C# setup instructions in the Knowledge Catalog quickstart using client libraries. For more information, see the Knowledge Catalog C# API reference documentation.

To authenticate to Knowledge Catalog, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

using Google.Cloud.Dataplex.V1;
using Google.LongRunning;
using Google.Protobuf.WellKnownTypes;

public sealed partial class GeneratedDataProductServiceClientSnippets
{
    /// <summary>Snippet for UpdateDataAsset</summary>
    /// <remarks>
    /// This snippet has been automatically generated and should be regarded as a code template only.
    /// It will require modifications to work:
    /// - It may require correct/in-range values for request initialization.
    /// - It may require specifying regional endpoints when creating the service client as shown in
    ///   https://cloud.google.com/dotnet/docs/reference/help/client-configuration#endpoint.
    /// </remarks>
    public void UpdateDataAssetRequestObject()
    {
        // Create client
        DataProductServiceClient dataProductServiceClient = DataProductServiceClient.Create();
        // Initialize request argument(s)
        UpdateDataAssetRequest request = new UpdateDataAssetRequest
        {
            DataAsset = new DataAsset(),
            UpdateMask = new FieldMask(),
            ValidateOnly = false,
        };
        // Make the request
        Operation<DataAsset, OperationMetadata> response = dataProductServiceClient.UpdateDataAsset(request);

        // Poll until the returned long-running operation is complete
        Operation<DataAsset, OperationMetadata> completedResponse = response.PollUntilCompleted();
        // Retrieve the operation result
        DataAsset result = completedResponse.Result;

        // Or get the name of the operation
        string operationName = response.Name;
        // This name can be stored, then the long-running operation retrieved later by name
        Operation<DataAsset, OperationMetadata> retrievedResponse = dataProductServiceClient.PollOnceUpdateDataAsset(operationName);
        // Check if the retrieved long-running operation has completed
        if (retrievedResponse.IsCompleted)
        {
            // If it has completed, then access the result
            DataAsset retrievedResult = retrievedResponse.Result;
        }
    }
}

Go

Before trying this sample, follow the Go setup instructions in the Knowledge Catalog quickstart using client libraries. For more information, see the Knowledge Catalog Go API reference documentation.

To authenticate to Knowledge Catalog, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.


//go:build examples

package main

import (
	"context"

	dataplex "cloud.google.com/go/dataplex/apiv1"
	dataplexpb "cloud.google.com/go/dataplex/apiv1/dataplexpb"
)

func main() {
	ctx := context.Background()
	// This snippet has been automatically generated and should be regarded as a code template only.
	// It will require modifications to work:
	// - It may require correct/in-range values for request initialization.
	// - It may require specifying regional endpoints when creating the service client as shown in:
	//   https://pkg.go.dev/cloud.google.com/go#hdr-Client_Options
	c, err := dataplex.NewDataProductClient(ctx)
	if err != nil {
		// TODO: Handle error.
	}
	defer c.Close()

	req := &dataplexpb.UpdateDataAssetRequest{
		// TODO: Fill request struct fields.
		// See https://pkg.go.dev/cloud.google.com/go/dataplex/apiv1/dataplexpb#UpdateDataAssetRequest.
	}
	op, err := c.UpdateDataAsset(ctx, req)
	if err != nil {
		// TODO: Handle error.
	}

	resp, err := op.Wait(ctx)
	if err != nil {
		// TODO: Handle error.
	}
	// TODO: Use resp.
	_ = resp
}

Java

Before trying this sample, follow the Java setup instructions in the Knowledge Catalog quickstart using client libraries. For more information, see the Knowledge Catalog Java API reference documentation.

To authenticate to Knowledge Catalog, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

import com.google.cloud.dataplex.v1.DataAsset;
import com.google.cloud.dataplex.v1.DataProductServiceClient;
import com.google.cloud.dataplex.v1.UpdateDataAssetRequest;
import com.google.protobuf.FieldMask;

public class SyncUpdateDataAsset {

  public static void main(String[] args) throws Exception {
    syncUpdateDataAsset();
  }

  public static void syncUpdateDataAsset() throws Exception {
    // This snippet has been automatically generated and should be regarded as a code template only.
    // It will require modifications to work:
    // - It may require correct/in-range values for request initialization.
    // - It may require specifying regional endpoints when creating the service client as shown in
    // https://cloud.google.com/java/docs/setup#configure_endpoints_for_the_client_library
    try (DataProductServiceClient dataProductServiceClient = DataProductServiceClient.create()) {
      UpdateDataAssetRequest request =
          UpdateDataAssetRequest.newBuilder()
              .setDataAsset(DataAsset.newBuilder().build())
              .setUpdateMask(FieldMask.newBuilder().build())
              // Set to true to validate the request without applying the update.
              .setValidateOnly(false)
              .build();
      DataAsset response = dataProductServiceClient.updateDataAssetAsync(request).get();
    }
  }
}

Node.js

Before trying this sample, follow the Node.js setup instructions in the Knowledge Catalog quickstart using client libraries. For more information, see the Knowledge Catalog Node.js API reference documentation.

To authenticate to Knowledge Catalog, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

/**
 * This snippet has been automatically generated and should be regarded as a code template only.
 * It will require modifications to work.
 * It may require correct/in-range values for request initialization.
 * TODO(developer): Uncomment these variables before running the sample.
 */
/**
 *  Required. The data asset to update.
 *  The data asset's `name` field is used to identify the data asset to update.
 */
// const dataAsset = {}
/**
 *  Optional. The list of fields to update.
 *  If this is empty or not set, then all the fields will be updated.
 */
// const updateMask = {}
/**
 *  Optional. Validates the request without actually updating the data asset.
 *  Defaults to false.
 */
// const validateOnly = true

// Imports the Dataplex library
const {DataProductServiceClient} = require('@google-cloud/dataplex').v1;

// Instantiates a client
const dataplexClient = new DataProductServiceClient();

async function callUpdateDataAsset() {
  // Construct request
  const request = {
    dataAsset,
  };

  // Run request
  const [operation] = await dataplexClient.updateDataAsset(request);
  const [response] = await operation.promise();
  console.log(response);
}

callUpdateDataAsset();

Python

Before trying this sample, follow the Python setup instructions in the Knowledge Catalog quickstart using client libraries. For more information, see the Knowledge Catalog Python API reference documentation.

To authenticate to Knowledge Catalog, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

# This snippet has been automatically generated and should be regarded as a
# code template only.
# It will require modifications to work:
# - It may require correct/in-range values for request initialization.
# - It may require specifying regional endpoints when creating the service
#   client as shown in:
#   https://googleapis.dev/python/google-api-core/latest/client_options.html
from google.cloud import dataplex_v1


def sample_update_data_asset():
    # Create a client
    client = dataplex_v1.DataProductServiceClient()

    # Initialize request argument(s)
    data_asset = dataplex_v1.DataAsset()
    data_asset.resource = "resource_value"

    request = dataplex_v1.UpdateDataAssetRequest(
        data_asset=data_asset,
    )

    # Make the request
    operation = client.update_data_asset(request=request)

    print("Waiting for operation to complete...")

    response = operation.result()

    # Handle the response
    print(response)

REST

To configure permissions for the assets in the data product, use the dataAssets.patch method.

For example, send the following PATCH request:

curl -X PATCH \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
-d '{"access_group_configs": ACCESS_GROUP_CONFIGS_MAP}' \
"https://dataplex.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/dataProducts/DATA_PRODUCT_ID/dataAssets/DATA_ASSET_ID?update_mask=access_group_configs"

Replace the following:

  • ACCESS_GROUP_CONFIGS_MAP: a JSON object representing a map where each key is an access group ID and the value is an AccessGroupConfig object. For example:

    {
      "analyst": {
        "iam_roles": ["roles/bigquery.dataViewer"]
      }
    }
    
  • PROJECT_ID: the ID of your Google Cloud project

  • LOCATION: the region where the data product exists

  • DATA_PRODUCT_ID: the ID of your data product

  • DATA_ASSET_ID: the ID of the asset for which you want to configure permissions
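
The ACCESS_GROUP_CONFIGS_MAP value can be sketched in Python in the same way. The helper `build_access_group_configs` is hypothetical. Its check that every key refers to an access group already defined on the data product is an assumption, based on the Terraform comment that the access group must match an existing `id`. The code produces the JSON only and does not call the API.

```python
import json


def build_access_group_configs(role_assignments, known_group_ids):
    """Return the map {access_group_id: {"iam_roles": [...]}} for one asset.

    `role_assignments` maps an access group ID to a list of IAM role names.
    `known_group_ids` lists the access group IDs defined on the data product.
    """
    unknown = sorted(set(role_assignments) - set(known_group_ids))
    if unknown:
        raise ValueError(f"undefined access groups: {', '.join(unknown)}")
    return {
        group_id: {"iam_roles": list(roles)}
        for group_id, roles in role_assignments.items()
    }


if __name__ == "__main__":
    configs = build_access_group_configs(
        {"analyst": ["roles/bigquery.dataViewer"]},
        known_group_ids=["analyst"],
    )
    # The printed JSON can be used as the request body of the PATCH call.
    print(json.dumps({"access_group_configs": configs}, indent=2))
```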

Optional: Add contract and aspect details

You can add contracts and aspects for a data product.

Add a contract

To establish a foundation of trust between data producers and consumers, you can attach a contract to your data product. By specifying parameters such as refresh time and thresholds, you provide consumers with the necessary context to understand when the data is updated and whether it meets their specific business requirements.

Console

  1. In the Add contract and aspect details pane, click Add contract.

  2. In the Select contract field, select Refresh cadence.

  3. In the Frequency field, select an agreed-upon schedule for how often data is updated or delivered, ensuring a predictable flow from data producer to data consumer. For example, Weekly.

  4. In the Refresh time field, enter the latest acceptable time by which data that is updated at its source becomes available to the consumer. For example, 23:00 PST.

  5. In the Threshold (in minutes) field, enter a measurable limit in minutes for the acceptable delay in data delivery. For example, enter 30 to set a threshold of 30 minutes.

  6. Optional: In the Cron schedule field, enter a cron expression that defines the schedule for data generation and delivery in the format: MINUTE HOUR DAY_OF_MONTH MONTH DAY_OF_WEEK

    The following are the accepted values:

    • MINUTE: 0-59
    • HOUR: 0-23
    • DAY_OF_MONTH: 1-31
    • MONTH: 1-12 or JAN-DEC
    • DAY_OF_WEEK: 0-6 or SUN-SAT

    For example, 0 8 * * 1-5 runs at 8:00 AM on weekdays (Monday-Friday).

  7. Click Save.
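
To check a cron expression before you enter it, a small validator along the following lines might help. It is a sketch under stated assumptions, not the validation that Knowledge Catalog performs: the function name `validate_cron` is hypothetical, the month range is taken to be 1-12 (matching JAN-DEC), and only `*`, single values, ascending ranges, and comma-separated lists are handled. Step values such as `*/5` are rejected.

```python
MONTH_NAMES = {
    name: number
    for number, name in enumerate(
        ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
         "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"],
        start=1,
    )
}
DAY_NAMES = {
    name: number
    for number, name in enumerate(["SUN", "MON", "TUE", "WED", "THU", "FRI", "SAT"])
}

# (label, lowest value, highest value, accepted names) for each of the five fields.
FIELDS = [
    ("MINUTE", 0, 59, {}),
    ("HOUR", 0, 23, {}),
    ("DAY_OF_MONTH", 1, 31, {}),
    ("MONTH", 1, 12, MONTH_NAMES),
    ("DAY_OF_WEEK", 0, 6, DAY_NAMES),
]


def _to_int(token, names, label):
    """Convert a numeric or named token to its integer value."""
    token = token.upper()
    if token in names:
        return names[token]
    if token.isdigit():
        return int(token)
    raise ValueError(f"{label}: unsupported value {token!r}")


def validate_cron(expression):
    """Return True if the expression is valid; raise ValueError otherwise."""
    parts = expression.split()
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(parts)}")
    for part, (label, low, high, names) in zip(parts, FIELDS):
        if part == "*":
            continue
        for item in part.split(","):
            bounds = item.split("-")
            if len(bounds) > 2:
                raise ValueError(f"{label}: malformed range {item!r}")
            values = [_to_int(bound, names, label) for bound in bounds]
            if any(value < low or value > high for value in values):
                raise ValueError(f"{label}: {item!r} is outside {low}-{high}")
            if values != sorted(values):
                raise ValueError(f"{label}: range {item!r} is descending")
    return True


if __name__ == "__main__":
    # The weekday example from the steps above.
    print(validate_cron("0 8 * * 1-5"))
```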

REST

Contracts are modeled as aspects on the data product. To add a Refresh Cadence contract for a data product, use the entries.patch method.

For example, send the following PATCH request:

curl -X PATCH \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
-d \
'{
  "aspects": {
    "dataplex-types.global.refresh-cadence": {
      "aspectType": "projects/dataplex-types/locations/global/aspectTypes/refresh-cadence",
      "data": {
        "frequency": "REFRESH_FREQUENCY"
      }
    }
  }
}' \
"https://dataplex.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/entryGroups/@dataplex/entries/projects/DATA_PRODUCT_PROJECT_NUMBER/locations/DATA_PRODUCT_LOCATION/dataProducts/DATA_PRODUCT_ID?updateMask=aspects"

Replace the following:

  • REFRESH_FREQUENCY: the agreed-upon schedule for how often data is updated or delivered, ensuring a predictable flow from data producer to data consumer. For example: Weekly
  • PROJECT_ID: the ID of your Google Cloud project where the API call is being made
  • LOCATION: the region of the Knowledge Catalog service endpoint you are calling (for example, us-central1)
  • DATA_PRODUCT_PROJECT_NUMBER: the project number where the data product resource is located
  • DATA_PRODUCT_LOCATION: the location of the data product resource
  • DATA_PRODUCT_ID: the ID of your data product
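
The request above combines two resource paths: the entry group in the calling project, and the data product path that serves as the entry ID. The following Python sketch assembles the URL and request body from the placeholders. The helper names `data_product_entry_name` and `refresh_cadence_patch` are hypothetical, and the code does not send the request.

```python
import json

API_ROOT = "https://dataplex.googleapis.com/v1"
REFRESH_CADENCE_KEY = "dataplex-types.global.refresh-cadence"
REFRESH_CADENCE_TYPE = (
    "projects/dataplex-types/locations/global/aspectTypes/refresh-cadence"
)


def data_product_entry_name(
    project_id, location, data_product_project_number, data_product_location, data_product_id
):
    """Return the entry name of a data product in the @dataplex entry group."""
    entry_id = (
        f"projects/{data_product_project_number}"
        f"/locations/{data_product_location}/dataProducts/{data_product_id}"
    )
    return (
        f"projects/{project_id}/locations/{location}"
        f"/entryGroups/@dataplex/entries/{entry_id}"
    )


def refresh_cadence_patch(entry_name, frequency):
    """Return the (url, body) pair for the entries.patch request."""
    url = f"{API_ROOT}/{entry_name}?updateMask=aspects"
    body = {
        "aspects": {
            REFRESH_CADENCE_KEY: {
                "aspectType": REFRESH_CADENCE_TYPE,
                "data": {"frequency": frequency},
            }
        }
    }
    return url, json.dumps(body)


if __name__ == "__main__":
    # Hypothetical example values.
    name = data_product_entry_name(
        "my-project", "us-central1", "123456789", "us-central1", "sales-product"
    )
    url, body = refresh_cadence_patch(name, "Weekly")
    print(url)
    print(body)
```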

Terraform

Contracts are modeled as aspects on the data product. To manage a contract, you must manage the underlying Knowledge Catalog entry. Because Terraform doesn't automatically discover existing aspects, you must first import the google_dataplex_entry resource.

To import the entry, use the following command:

terraform import google_dataplex_entry.data_product_metadata "projects/DATA_PRODUCT_PROJECT_NUMBER/locations/LOCATION/entryGroups/@dataplex/entries/projects/DATA_PRODUCT_PROJECT_NUMBER/locations/LOCATION/dataProducts/DATA_PRODUCT_ID"

Terraform configuration:

resource "google_dataplex_entry" "data_product_metadata" {
project        = "DATA_PRODUCT_PROJECT_NUMBER"
location       = "LOCATION"
entry_group_id = "@dataplex"
entry_id       = "projects/DATA_PRODUCT_PROJECT_NUMBER/locations/LOCATION/dataProducts/DATA_PRODUCT_ID"
entry_type     = "projects/655216118709/locations/global/entryTypes/data-product"

aspects {
  aspect_key = "655216118709.global.refresh-cadence"
  aspect {
    data = jsonencode({
      frequency = "REFRESH_FREQUENCY"
    })
  }
}

provider = google-beta
}

Replace the following:

  • DATA_PRODUCT_PROJECT_NUMBER: the project number where the data product resource is located
  • LOCATION: the region of the Knowledge Catalog service endpoint you are calling (for example, us-central1)
  • DATA_PRODUCT_ID: the ID of your data product
  • REFRESH_FREQUENCY: the agreed-upon schedule for how often data is updated or delivered, ensuring a predictable flow from data producer to data consumer. For example: Weekly

For general information on the import process, refer to the Terraform import documentation.

Add aspects

Use aspects to enrich your data product with structured, reusable metadata. These templates provide a standardized way for data producers to communicate the quality and fitness of a data product, improving governance and helping consumers determine if the product meets their business needs.

To add aspects for the data product, follow these steps:

Console

  1. In the Add contract and aspect details pane, click + Add aspect.

  2. In the Select aspect type field, search for and select an aspect type from the list. For example, Geo context.

  3. Click Save.

REST

To add aspects for a data product, use the entries.patch method.

For example, send the following PATCH request:

curl -X PATCH \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
-d \
'{
  "aspects": {
    "ASPECT_PROJECT_ID.ASPECT_LOCATION.ASPECT_NAME": {
      "aspectType": "projects/ASPECT_PROJECT_ID/locations/ASPECT_LOCATION/aspectTypes/ASPECT_NAME",
      "data": {}
    }
  }
}' \
"https://dataplex.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/entryGroups/@dataplex/entries/projects/DATA_PRODUCT_PROJECT_NUMBER/locations/DATA_PRODUCT_LOCATION/dataProducts/DATA_PRODUCT_ID?updateMask=aspects"

Replace the following:

  • ASPECT_PROJECT_ID: the ID of your Google Cloud project where the aspect is created
  • ASPECT_LOCATION: the region of the Knowledge Catalog service endpoint where the aspect is created (for example, us-central1)
  • ASPECT_NAME: the name of the aspect you want to attach to the data product
  • PROJECT_ID: the ID of your Google Cloud project where the API call is being made
  • LOCATION: the region of the Knowledge Catalog service endpoint you are calling (for example, us-central1)
  • DATA_PRODUCT_PROJECT_NUMBER: the project number where the data product resource is located
  • DATA_PRODUCT_LOCATION: the location of the data product resource
  • DATA_PRODUCT_ID: the ID of your data product
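
In the request body above, the key of each entry in the aspects map is built from the same three parts as the aspectType path, joined with periods. A short Python sketch of that mapping follows. The function name `aspect_key` is hypothetical, and the pattern is inferred from the examples in this document.

```python
import re

_ASPECT_TYPE_PATTERN = re.compile(
    r"^projects/([^/]+)/locations/([^/]+)/aspectTypes/([^/]+)$"
)


def aspect_key(aspect_type: str) -> str:
    """Derive the aspects map key from an aspectType resource path."""
    match = _ASPECT_TYPE_PATTERN.match(aspect_type)
    if match is None:
        raise ValueError(f"not an aspect type path: {aspect_type!r}")
    # The key is PROJECT.LOCATION.NAME, in the same order as the path.
    return ".".join(match.groups())


if __name__ == "__main__":
    print(aspect_key("projects/dataplex-types/locations/global/aspectTypes/overview"))
```

Deriving the key from the path avoids a mismatch between the two values in the same request.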

Terraform

To manage aspects, you must manage the underlying Knowledge Catalog entry. Because Terraform doesn't automatically discover existing aspects, you must first import the google_dataplex_entry resource.

To import the entry, use the following command:

terraform import google_dataplex_entry.data_product_metadata "projects/DATA_PRODUCT_PROJECT_NUMBER/locations/LOCATION/entryGroups/@dataplex/entries/projects/DATA_PRODUCT_PROJECT_NUMBER/locations/LOCATION/dataProducts/DATA_PRODUCT_ID"

Terraform configuration:

resource "google_dataplex_entry" "data_product_metadata" {
project        = "DATA_PRODUCT_PROJECT_NUMBER"
location       = "LOCATION"
entry_group_id = "@dataplex"
entry_id       = "projects/DATA_PRODUCT_PROJECT_NUMBER/locations/LOCATION/dataProducts/DATA_PRODUCT_ID"
entry_type     = "projects/655216118709/locations/global/entryTypes/data-product"

aspects {
  aspect_key = "ASPECT_PROJECT_NUMBER.ASPECT_LOCATION.ASPECT_NAME"
  aspect {
    data = jsonencode({})
  }
}

provider = google-beta
}

Replace the following:

  • DATA_PRODUCT_PROJECT_NUMBER: the project number where the data product resource is located
  • LOCATION: the region of the Knowledge Catalog service endpoint you are calling (for example, us-central1)
  • DATA_PRODUCT_ID: the ID of your data product
  • ASPECT_PROJECT_NUMBER: the Google Cloud project number where the aspect is created
  • ASPECT_LOCATION: the region of the Knowledge Catalog service endpoint where the aspect is created (for example, us-central1)
  • ASPECT_NAME: the name of the aspect you want to attach to the data product

For general information on the import process, refer to the Terraform import documentation.

Optional: Add additional details

You can add documentation and sample queries for your data product to provide essential context, business logic descriptions, and user guides. In Knowledge Catalog, documentation is managed through the overview system aspect.

You can manually create this documentation or use Knowledge Catalog data insights to automatically generate it.

Manually add documentation and sample queries

Console

To add documentation for your data product, follow these steps:

  1. In the Add additional details pane, click Edit next to Documentation.

  2. Enter the content in the rich-text editor.

  3. Click Save.

To add sample queries for your data product, follow these steps:

  1. In the Add additional details pane, click Add queries in the Query recommendation section.

  2. Enter the sample queries.

  3. Click Save.

The newly created data product appears on the Knowledge Catalog Data products page.

REST

Documentation is modeled as aspects on the data product. To add documentation, use the entries.patch method.

For example, send the following PATCH request:

curl -X PATCH \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
-d \
'{
  "aspects": {
    "dataplex-types.global.overview": {
      "aspectType": "projects/dataplex-types/locations/global/aspectTypes/overview",
      "data": {
        "content": "DOCUMENTATION"
      }
    }
  }
}' \
"https://dataplex.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/entryGroups/@dataplex/entries/projects/DATA_PRODUCT_PROJECT_NUMBER/locations/DATA_PRODUCT_LOCATION/dataProducts/DATA_PRODUCT_ID?updateMask=aspects"

Replace the following:

  • PROJECT_ID: the ID of your Google Cloud project where the API call is being made
  • LOCATION: the region of the Knowledge Catalog service endpoint you are calling (for example, us-central1)
  • DATA_PRODUCT_PROJECT_NUMBER: the project number where the data product resource is located
  • DATA_PRODUCT_LOCATION: the location of the data product resource
  • DATA_PRODUCT_ID: the ID of your data product
  • DOCUMENTATION: the content that you want to attach to the data product
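
Real documentation usually contains line breaks and double quotes, which must be escaped before they can replace DOCUMENTATION in the JSON body. The following bash sketch shows one way to build the request body from arbitrary text. The function names json_escape and overview_request_body are hypothetical, and the escaping covers only backslashes, double quotes, newlines, carriage returns, and tabs; other control characters are not handled.

```shell
#!/usr/bin/env bash
# Hypothetical helper: escapes a string for use inside a JSON string literal.
# Sketch only: handles backslash, double quote, newline, carriage return, and tab.
json_escape() {
  local s=$1
  s=${s//\\/\\\\}      # backslash first, so later escapes aren't doubled
  s=${s//\"/\\\"}      # double quote
  s=${s//$'\n'/\\n}    # newline
  s=${s//$'\r'/\\r}    # carriage return
  s=${s//$'\t'/\\t}    # tab
  printf '%s' "$s"
}

# Hypothetical helper: prints the entries.patch body for the overview aspect,
# using the same aspect key and aspect type as the request above.
overview_request_body() {
  printf '{"aspects":{"dataplex-types.global.overview":{"aspectType":"projects/dataplex-types/locations/global/aspectTypes/overview","data":{"content":"%s"}}}}\n' \
    "$(json_escape "$1")"
}

# Example with made-up documentation text; prints the JSON body only.
overview_request_body 'Daily sales orders.
Owner: the "orders" team'
```

To send the result, you could pipe the output to the curl command shown earlier and replace the inline body with -d @- so that curl reads the body from standard input.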

Terraform

Documentation is modeled as aspects on the data product. To manage documentation, you must manage the underlying Knowledge Catalog entry. Because Terraform doesn't automatically discover existing aspects, you must first import the entry as a google_dataplex_entry resource.

To import the entry, use the following command:

terraform import google_dataplex_entry.data_product_metadata "projects/DATA_PRODUCT_PROJECT_NUMBER/locations/LOCATION/entryGroups/@dataplex/entries/projects/DATA_PRODUCT_PROJECT_NUMBER/locations/LOCATION/dataProducts/DATA_PRODUCT_ID"

Terraform configuration:

resource "google_dataplex_entry" "data_product_metadata" {
  project        = "DATA_PRODUCT_PROJECT_NUMBER"
  location       = "LOCATION"
  entry_group_id = "@dataplex"
  entry_id       = "projects/DATA_PRODUCT_PROJECT_NUMBER/locations/LOCATION/dataProducts/DATA_PRODUCT_ID"
  entry_type     = "projects/655216118709/locations/global/entryTypes/data-product"

  aspects {
    aspect_key = "655216118709.global.overview"
    aspect {
      data = jsonencode({
        content = "DOCUMENTATION"
      })
    }
  }

  provider = google-beta
}

Replace the following:

  • DATA_PRODUCT_PROJECT_NUMBER: the project number where the data product resource is located
  • LOCATION: the region of the Knowledge Catalog service endpoint you are calling (for example, us-central1)
  • DATA_PRODUCT_ID: the ID of your data product
  • DOCUMENTATION: the content that you want to attach to the data product

For general information on the import process, refer to the Terraform import documentation.

Generate automated documentation and sample queries using data insights

Before you generate documentation and sample queries using Gemini, complete the following prerequisites:

  1. Enable the Gemini for Google Cloud API in the project where you create the data product.

  2. Grant insight-specific user roles: Ask your administrator to grant your identity the following roles and permissions on the data product project:

    • Generate and manage data insights: Dataplex DataScan Editor (roles/dataplex.dataScanEditor) or Dataplex DataScan Administrator (roles/dataplex.dataScanAdmin) on the project where the data product resides
    • View generated insights: Dataplex DataScan DataViewer (roles/dataplex.dataScanDataViewer) on the project where the data product resides
  3. Configure cross-project service agent permissions. If your underlying data assets reside in a Google Cloud project different from your data product project, you must grant the Knowledge Catalog service agent (P4SA) access to those assets:

    1. To generate or retrieve the service agent identifier for your data product project, run the following Google Cloud CLI command:

      gcloud beta services identity create --service=dataplex.googleapis.com --project=DATA_PRODUCT_PROJECT_ID
      

      Replace DATA_PRODUCT_PROJECT_ID with the Google Cloud project ID where your data product resides.

    2. In each external project where your assets reside, grant the data product project's service agent the following roles:

      • BigQuery Data Editor (roles/bigquery.dataEditor) on the underlying tables and datasets

      • BigQuery Studio Admin (roles/bigquery.studioAdmin) on the asset project
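
As a minimal sketch of the project-level grant in the last step, the following bash function prints (but doesn't run) a gcloud command that binds the BigQuery Studio Admin role to the service agent. The function name print_asset_project_grant and the sample values are hypothetical. The service agent address format service-PROJECT_NUMBER@gcp-sa-dataplex.iam.gserviceaccount.com is an assumption; if the identifier printed by the services identity create command differs, use that value instead. The BigQuery Data Editor grant on individual tables and datasets is not covered here.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints the command that grants BigQuery Studio Admin
# on an asset project to the data product project's service agent.
# Arguments: $1 = data product project number, $2 = asset project ID.
print_asset_project_grant() {
  local data_product_project_number=$1
  local asset_project_id=$2
  # Assumed service agent address format; verify it against the output of
  # the "gcloud beta services identity create" command.
  local member="serviceAccount:service-${data_product_project_number}@gcp-sa-dataplex.iam.gserviceaccount.com"
  printf 'gcloud projects add-iam-policy-binding %s --member="%s" --role="%s"\n' \
    "$asset_project_id" "$member" "roles/bigquery.studioAdmin"
}

# Example with made-up values; review the printed command before running it.
print_asset_project_grant 123456789012 my-asset-project
```

Printing the command instead of running it lets you review the member and role for each asset project before you change any IAM policy.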

To generate documentation and sample queries for your data product using data insights, follow these steps:

  1. In the Add additional details pane, on the Generate insights with Gemini bar, click Generate.

    Wait a few minutes for the insight generation process to complete.

  2. To review the generated content, click View.

  3. Evaluate the generated content:

    • If the content is accurate, click Save. This populates the rich-text editor with a predefined documentation template and adds sample queries to the Insights section.

    • If the content doesn't meet expectations, click Discard.

  4. Click Save to finalize.

What's next