Effective April 10, 2026, Dataplex Universal Catalog is renamed Knowledge Catalog. API, client library, CLI, and IAM names remain unchanged. For more information, see Introducing Google Cloud Knowledge Catalog.


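Reuse data quality rules

This document explains how to reuse Knowledge Catalog (formerly Dataplex Universal Catalog) data quality rules to define and manage standardized business rules.

Rule reuse lets you use rule templates to share complex or standardized business rule definitions across multiple data quality rules and scans. This document also describes how to set up, create, and manage reusable rule templates, and how to attach data quality rules to catalog entries as metadata aspects.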

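Use cases

Data quality rule reuse is useful in the following scenarios:

Standardize and share rule definitions: Use custom rule templates to store complex or standardized business rule definitions. Templated SQL expressions reduce the time and effort needed to distribute common definitions. For example, a central data governance team can define a standard "valid email" or "valid Social Security number (SSN)" template for reuse across the organization, which ensures consistency and reduces the overhead of managing duplicate rules.
Adopt governance-driven quality: Use Knowledge Catalog aspects on BigQuery tables and business glossary terms to declare data rules as metadata, which makes the rules searchable and reusable. For example, when you link a column to a glossary term, the column automatically inherits the validation rules defined for that term, so governance policies are applied automatically through semantic metadata inheritance.
Search and discover reusable rules: Use semantic search to find existing rules in your organization. Data analysts and engineers can discover validated, standardized rule sets (for example, "baseline financial constants") and bootstrap data quality for new projects without writing SQL from scratch.
Solve the cold-start problem: Use system rule templates for frequently used checks, such as null checks or range expectations. These built-in templates let you quickly set up data quality monitoring for common scenarios without writing custom SQL.
Separate concerns: A central governance team can author validated rule templates while engineering teams focus on applying those rules to data assets, without writing or maintaining complex SQL. A clear division of responsibilities improves organizational agility and ensures that data quality standards are applied consistently across the enterprise.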

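Before you begin

Enable the Dataplex API.
Roles required to enable APIs
To enable APIs, you need the serviceusage.services.enable permission. If you created the project, you might already have this permission through the Owner role (roles/owner). Otherwise, you can get it through the Service Usage Admin role (roles/serviceusage.serviceUsageAdmin). Learn how to grant roles.

Before you use data quality rule reuse, make sure that you complete the following prerequisites.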

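Set up the Dataplex API environment

To use the REST API examples in this document, set up a gcurl alias and the DATAPLEX_API environment variable.

Set up the gcurl alias. This creates a shortcut that includes the authentication token and sets the JSON content type for API requests:

alias gcurl='curl -H "Authorization: Bearer $(gcloud auth print-access-token)" -H "Content-Type: application/json"'

Set the DATAPLEX_API variable:
```
DATAPLEX_API="dataplex.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION"
```
Replace the following:
- PROJECT_ID: your project ID.
- LOCATION: the location of the scan or resource (for example, us-central1).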

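Set up a service account

Running data quality scans with reusable rules requires a service account. Create a service account and grant the following Identity and Access Management (IAM) roles and permissions:

You must have the iam.serviceAccounts.actAs permission on the project that contains the service account (typically through the roles/iam.serviceAccountUser role).
Grant the iam.serviceAccounts.getAccessToken permission on the service account to the Dataplex service agent of the scan project (service-PROJECT_ID@gcp-sa-dataplex.iam.gserviceaccount.com), for example by using the roles/iam.serviceAccountTokenCreator role.
The service account must have the following permissions:
- bigquery.tables.getData, for example through roles/bigquery.dataViewer.
- bigquery.jobs.insert in the scan project, for example through roles/bigquery.jobUser.
- roles/bigquery.dataEditor on the export dataset, if you export results.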

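Required roles and permissions

Make sure that you have the following IAM roles to perform specific tasks:

Data scan management: the data scan roles required to manage data scan resources.
Rule template management: To create or update rule templates, you need permissions to manage entries in the rule template's entry group or project. Specifically, roles/dataplex.catalogEditor or roles/dataplex.entryOwner grants these permissions.
Reference rule templates from rules: You need the dataplex.entries.get and dataplex.entries.getData permissions on the entry group or project of the rule template that the rule references.
Attach data quality rules to a BigQuery table: To attach data quality rules as Knowledge Catalog metadata, you need one of the following:
- bigquery.tables.update or roles/bigquery.dataEditor, and dataplex.entryGroups.useDataRulesAspect on the @bigquery entry group in the table's location.
- roles/dataplex.catalogEditor on the @bigquery entry group.
Attach data quality rules to a business glossary term: To attach data quality rules as Knowledge Catalog metadata, you need one of the following:
- dataplex.glossaryTerms.update, and dataplex.entryGroups.useDataRulesAspect on the @dataplex entry group.
- roles/dataplex.catalogEditor on the @dataplex entry group.
Create a data quality scan using entry rules: You need one of the following sets of permissions:
- bigquery.tables.get and bigquery.tables.getData on the table.
- dataplex.entries.get and dataplex.entries.getData on the @bigquery entry group in the table's location.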

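SQL query syntax for rule templates

When you write the SQL logic for a rule template, you must provide a statement that returns the invalid rows. If the query returns any rows, the rule fails. For more information, see SqlAssertion.

Follow these guidelines when you write rule template SQL:

Omit the trailing semicolon from the SQL statement.
Use ${param(name)} to reference input parameters, for example ${param(min_value)}.
Use $${...} to escape a literal ${...} and prevent it from being replaced as a parameter.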
Parameter variables are case-sensitive.
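
To make the substitution rules above concrete, the following Python sketch shows one way such a template could be rendered. It is an illustration only: render_template is a hypothetical helper, not part of any Dataplex client library, it handles only the ${param(...)} form, and the service's actual substitution logic may differ.

```python
import re

# Matches an optional escaping "$" followed by "${param(name)}".
_PARAM = re.compile(r"(\$?)\$\{param\((\w+)\)\}")


def render_template(sql: str, values: dict) -> str:
    """Replace ${param(name)} with values[name]; keep $${...} as a literal ${...}."""

    def substitute(match: re.Match) -> str:
        escaped, name = match.group(1), match.group(2)
        if escaped:
            # "$${param(x)}" is an escape: emit the literal "${param(x)}".
            return "${param(" + name + ")}"
        # Names are case-sensitive, so "Min_Value" does not match "min_value".
        if name not in values:
            raise KeyError(f"no value supplied for parameter {name!r}")
        return str(values[name])

    return _PARAM.sub(substitute, sql)


# Hypothetical query: one real parameter reference and one escaped literal.
query = "SELECT * FROM t WHERE x < ${param(min_value)} OR note = '$${param(min_value)}'"
rendered = render_template(query, {"min_value": 10})
print(rendered)
```

In this sketch, a reference whose case does not match a supplied name raises an error instead of being silently left in the query.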

System-supported parameters

You can use the following system-supported parameters in your rule template SQL:

${project()}: The project ID of the resource being scanned.
${dataset()}: The BigQuery dataset ID of the resource being scanned, formatted as PROJECT_ID.DATASET_ID.
${table()}: The BigQuery table ID of the resource being scanned, formatted as PROJECT_ID.DATASET_ID.TABLE_ID.
${column()}: The column the rule is evaluated on. An error occurs during rule evaluation if the rule is attached to the table level but references ${column()}.
${data()}: A reference to the data source table and all of its precondition filters like row filters, sampling percentages, and incremental filters defined in the scan specification. For more information, see Data reference parameter.

Example 1: Validate column values are between two values

The following example validates that all values in a column are between a minimum and maximum value:

SELECT *
FROM ${data()}
WHERE
  NOT ((${column()}>=${param(min_value)} AND ${column()}<=${param(max_value)}) IS TRUE)

Note the following:

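Using NOT ((condition) IS TRUE) returns every row for which the condition is not definitely true. This includes rows where the column is NULL, because the comparison then evaluates to NULL instead of TRUE.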
Using ${data()} limits the scope of rows evaluated to the source table and its filters, such as row filters, sampling percentages, and incremental filters.
Using ${column()} lets you reference the column that the rule using this template is evaluated on.
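
The NULL behavior noted above can be checked locally. The following sketch uses Python's built-in sqlite3 module as a stand-in for BigQuery, so it assumes that SQLite (version 3.23 or later) treats IS TRUE the same way for this predicate. The orders table and its values are made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, amount INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 5), (2, 15), (3, None)],  # in range, out of range, NULL
)

# Template-style predicate: anything that is not definitely in range is invalid.
with_is_true = conn.execute(
    "SELECT id FROM orders "
    "WHERE NOT ((amount >= 1 AND amount <= 10) IS TRUE) ORDER BY id"
).fetchall()

# Plain negation: for the NULL row the predicate is NULL, so the row is dropped.
plain_not = conn.execute(
    "SELECT id FROM orders "
    "WHERE NOT (amount >= 1 AND amount <= 10) ORDER BY id"
).fetchall()

print(with_is_true)  # ids 2 and 3
print(plain_not)     # id 2 only
```

Without IS TRUE, the row with a NULL amount would never be reported as invalid.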

Example 2: Foreign key validation

The following example verifies that each value in a column exists in a primary key column of another table:

SELECT t.*
FROM ${data()} AS t
LEFT JOIN `${param(reference_table)}` AS s
  ON t.${column()} = s.`${param(reference_column)}`
WHERE s.`${param(reference_column)}` IS NULL

Input parameters for this template are as follows:

reference_table: The name of the reference table containing the primary keys. Use the format PROJECT_ID.DATASET_ID.TABLE_ID.
reference_column: The name of the primary key column in the reference table.
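
The anti-join pattern in this template can be tried on a small in-memory dataset. The sketch below uses Python's sqlite3 module in place of BigQuery, with the template parameters already substituted by hand. The customers and orders tables are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY);
    CREATE TABLE orders (order_id INTEGER, customer_id INTEGER);
    INSERT INTO customers VALUES (1), (2);
    INSERT INTO orders VALUES (100, 1), (101, 2), (102, 3), (103, NULL);
    """
)

# Rows from the scanned table that have no match in the reference table.
orphans = conn.execute(
    """
    SELECT t.order_id, t.customer_id
    FROM orders AS t
    LEFT JOIN customers AS s ON t.customer_id = s.customer_id
    WHERE s.customer_id IS NULL
    ORDER BY t.order_id
    """
).fetchall()
print(orphans)
```

Order 102 is returned because customer 3 does not exist. Order 103 is also returned, because a NULL key never matches a row in the reference table.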

System rule templates

Knowledge Catalog provides system rule templates that can be used in any region. Knowledge Catalog manages these templates in the dataplex-templates project under the rule-library entry group. An example of a full resource name is projects/dataplex-templates/locations/global/entryGroups/rule-library/entries/non_null_expectation.
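
As a small illustration of the naming scheme, the following Python sketch builds template resource names. The helper functions are hypothetical conveniences; only the path layout comes from this document.

```python
# Parent path of the system rule library, as given in this document.
SYSTEM_TEMPLATE_PARENT = (
    "projects/dataplex-templates/locations/global/entryGroups/rule-library"
)


def system_template_name(template_id: str) -> str:
    """Full resource name of a system rule template."""
    return f"{SYSTEM_TEMPLATE_PARENT}/entries/{template_id}"


def custom_template_name(
    project: str, location: str, entry_group: str, template_id: str
) -> str:
    """Full resource name of a custom rule template in your own rule library."""
    return (
        f"projects/{project}/locations/{location}"
        f"/entryGroups/{entry_group}/entries/{template_id}"
    )


print(system_template_name("non_null_expectation"))
print(custom_template_name("my-project", "us-central1", "my-rules", "email-check"))
```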

To view the list of all the available system rule templates, see System rule templates list.

To find the available list of system rule templates, select one of the following options:

Console

In the Google Cloud console, go to the Data profiling & quality page.

Click Rule libraries > System.
To see the list of available system rule templates, click rule-library.

When creating a new rule, you can select the system rule templates in the Choose rule types menu.

REST

To find the available list of system rule templates, use the entries.list method:

gcurl "https://dataplex.googleapis.com/v1/projects/dataplex-templates/locations/global/entryGroups/rule-library/entries"

Known differences between system rule templates and built-in rules

The following table describes the differences between system rule templates and built-in rules:

Feature	System rule templates	Built-in rules
Source	Reusable templates in the catalog	Built-in in the API
Referencing	Can be referenced by catalog entries and scans	Can only be used in scans

The following list describes additional differences in how metrics are calculated for system rule templates:

Assertion Row Count metric: This metric is populated for all template reference rules, not just SQL assertion rules.
Statistic Range Expectation rule template: Rule metrics from evaluation of rules referencing this template wouldn't contain the nullCount metric. Because it is an aggregate rule, the ignore null capability isn't supported, and rule success is determined by the aggregate statistic being within the defined range.
Uniqueness Expectation rule template: This template calculates passedCount differently than the built-in UniquenessExpectation rule. The rule template returns all rows for which duplicate values or null rows exist, which can result in fewer passing rows if duplicates are present.

For example, if a column contains the values (a, a, b, b, c, d, e):
- Built-in uniqueness rule: Returns 5 passing rows: (a, b, c, d, e).
- Uniqueness rule template: Returns 4 failing rows: (a, a, b, b). The number of passing rows is 3 (7 total rows minus 4 failed rows): (c, d, e).
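
The two counting methods can be reproduced in a few lines of Python. This is only a model of the arithmetic described above, not the service's implementation.

```python
from collections import Counter

values = ["a", "a", "b", "b", "c", "d", "e"]
counts = Counter(values)

# Built-in rule, as described above: one passing row per distinct value.
builtin_passing = len(counts)

# Rule template, as described above: every row whose value is duplicated
# (or null) counts as failing.
template_failing = sum(1 for v in values if v is None or counts[v] > 1)
template_passing = len(values) - template_failing

print(builtin_passing, template_failing, template_passing)  # 5 4 3
```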

Metadata aspects

This section describes the fields and values for the data-rules and data-quality-rule-template aspect types.

`data-rules` aspect fields

To define data rules, use the dataplex-types.global.data-rules aspect. The following table describes the fields for this aspect.

Field	Type	Description
`rules`	Array	Required. A list of data quality rules.
`rules[].name`	String	Required. A name for the rule.
`rules[].dimension`	String	Optional. The data quality dimension for the rule.
`rules[].description`	String	Optional. The description of the rule.
`rules[].suspended`	Boolean	Optional. Whether the rule is active or suspended. Default is `false`.
`rules[].threshold`	Double	Optional. The passing threshold for the rule, from `0.0` to `1.0`. Default is `1.0`.
`rules[].type`	Enum	Required. The type of the rule. The only supported value is `TEMPLATE_REFERENCE`.
`rules[].ignore_null`	Boolean	Optional. If `true`, rows with null values in the column are ignored when determining the success criteria.
`rules[].attributes`	Map	Optional. Custom key-value pairs associated with the rule.
`rules[].templateReference`	Object	Required. A reference to the rule template.
`rules[].templateReference.name`	String	Required. The resource name of the rule template.
`rules[].templateReference.values`	Map	Optional. The parameter names and values for the rule template.
`rules[].templateReference.values[].parameterValue.value`	String	Required. The value for the parameter.

The following example shows a data-rules aspect in a payload.json file:

{
  "aspects": {
    "dataplex-types.global.data-rules": {
      "data": {
        "rules": [
          {
            "name": "valid-email",
            "dimension": "VALIDITY",
            "type": "TEMPLATE_REFERENCE",
            "templateReference": {
              "name": "projects/my-project/locations/us-central1/entryGroups/my-rules/entries/email-check",
              "values": {
                "column_name": {
                    "value": "email"
                }
              }
            }
          }
        ]
      }
    }
  }
}
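
Before you send a payload like this, you might check it locally against the field table above. The following Python sketch covers only the constraints listed in that table. validate_rule is a hypothetical helper, and the service may enforce additional rules.

```python
def validate_rule(rule: dict) -> list:
    """Return a list of problems found in one data-rules entry (empty if none)."""
    problems = []
    if not rule.get("name"):
        problems.append("name is required")
    if rule.get("type") != "TEMPLATE_REFERENCE":
        problems.append("type must be TEMPLATE_REFERENCE")
    ref = rule.get("templateReference")
    if not isinstance(ref, dict) or not ref.get("name"):
        problems.append("templateReference.name is required")
    # The threshold is optional and defaults to 1.0.
    threshold = rule.get("threshold", 1.0)
    if not isinstance(threshold, (int, float)) or not 0.0 <= threshold <= 1.0:
        problems.append("threshold must be between 0.0 and 1.0")
    return problems


# The rule from the payload.json example above.
example_rule = {
    "name": "valid-email",
    "dimension": "VALIDITY",
    "type": "TEMPLATE_REFERENCE",
    "templateReference": {
        "name": "projects/my-project/locations/us-central1/entryGroups/my-rules/entries/email-check",
        "values": {"column_name": {"value": "email"}},
    },
}
print(validate_rule(example_rule))               # no problems
print(validate_rule({"name": "x", "threshold": 2}))
```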

`data-quality-rule-template` aspect fields

Use the data-quality-rule-template aspect to define a custom data quality rule template. The following table describes the fields for the dataplex-types.global.data-quality-rule-template aspect.

Field	Type	Description
`dimension`	String	Required. The dimension for the rule template.
`sqlCollection`	Array	Required. A list of SQL queries for the rule template.
`sqlCollection[].sql.query`	String	Required. The SQL query that returns invalid rows.
`inputParameters`	Map	Optional. A map of input parameters for the rule template.
`inputParameters[].parameterDescription.description`	String	Optional. The description of the input parameter.
`inputParameters[].parameterDescription.defaultValue`	String	Optional. The default value for the parameter if no value is provided.
`capabilities`	Array	Optional. A list of template capabilities, such as `THRESHOLD` or `IGNORE_NULL`.

The following example displays the structure of a data-quality-rule-template aspect:

{
  "entryType": "projects/dataplex-types/locations/global/entryTypes/data-quality-rule-template",
  "aspects": {
    "dataplex-types.global.data-quality-rule-template": {
      "data": {
        "dimension": "COMPLETENESS",
        "sqlCollection": [
          {
              "query": "SELECT * FROM ${data()} WHERE ${column()} > ${param(p1)}"
          }
        ],
        "inputParameters": {
          "p1": {
              "description": "The parameter description"
          }
        },
        "capabilities": [
          "THRESHOLD",
          "IGNORE_NULL"
        ]
      }
    }
  }
}
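
A common mistake when authoring a template is to reference a parameter in the SQL without declaring it in inputParameters, or the reverse. The following Python sketch compares the two. check_parameters is a hypothetical helper, and it assumes each sqlCollection item carries its query under query or under sql.query.

```python
import re

# ${param(name)} references that are not escaped as $${param(name)}.
PARAM_REF = re.compile(r"(?<!\$)\$\{param\((\w+)\)\}")


def check_parameters(template_data: dict) -> dict:
    """Compare parameters used in the SQL with those declared in inputParameters."""
    used = set()
    for item in template_data.get("sqlCollection", []):
        # Accept both {"query": ...} and {"sql": {"query": ...}} shapes.
        query = item.get("query") or item.get("sql", {}).get("query", "")
        used.update(PARAM_REF.findall(query))
    declared = set(template_data.get("inputParameters", {}))
    return {
        "undeclared": sorted(used - declared),
        "unused": sorted(declared - used),
    }


template_data = {
    "dimension": "VALIDITY",
    "sqlCollection": [
        {
            "query": "SELECT * FROM ${data()} WHERE NOT ((${column()} >= "
            "${param(min_value)} AND ${column()} <= ${param(max_value)}) IS TRUE)"
        }
    ],
    "inputParameters": {"min_value": {"description": "Lower bound"}},
}
report = check_parameters(template_data)
print(report)
```

Here max_value is used in the query but not declared, so it appears under undeclared.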

Manage data quality rule templates

This section describes how to create, edit, and delete rule templates.

Create a rule library

To create a rule library, you must create a Knowledge Catalog entry group.

Console

In the Google Cloud console, go to the Data profiling & quality page.

Go to Rule libraries > Custom, and click Create.
In the Create rule library window, fill in the following fields:
1. Optional: Enter a display name.
2. In Rule library ID, enter an ID. For more information, see the resource naming conventions.
3. Optional: Enter a description.
4. In the Location menu, select a location. It can't be changed later.
5. Optional: Add labels. Labels are key-value pairs that let you group related objects together or with other Google Cloud resources.
6. Click Save.

REST

To create a rule library by using the API, you must create an entry group with the required label goog-dataplex-entry-group-type: rule_library:

gcurl -X POST "https://${DATAPLEX_API}/entryGroups?entry_group_id=RULE_LIBRARY_ID" \
--data @- << EOF
{
"labels": {
  "goog-dataplex-entry-group-type": "rule_library"
},
"description": "DESCRIPTION"
}
EOF

Replace the following:

RULE_LIBRARY_ID: a unique ID for your rule library.
DESCRIPTION: an optional description for the rule library.

Terraform

To create a rule library, use the google_dataplex_entry_group resource:

resource "google_dataplex_entry_group" "rule_library" {
  project        = "PROJECT_ID"
  location       = "LOCATION"
  entry_group_id = "RULE_LIBRARY_ID"
  description    = "DESCRIPTION"

  labels = {
    "goog-dataplex-entry-group-type" = "rule_library"
  }
}

Replace the following:

PROJECT_ID: your project ID.
LOCATION: the location for your rule library (for example, us-central1).
RULE_LIBRARY_ID: a unique ID for your rule library.
DESCRIPTION: an optional description for the rule library.

Create a rule template

To create a custom rule template, select one of the following:

Console

In the Google Cloud console, go to the Data profiling & quality page.

Go to Rule libraries > Custom.
Click the rule library where you want to add a template, and then click Create.
In the Create rule template window, fill in the following fields:
1. Optional: Enter a name for the template.
2. In Template ID, enter an ID. For more information, see the resource naming conventions.
3. Optional: Enter a description.
4. In the Dimension menu, select a dimension. For more information, see Dimensions.
5. In the SQL query field, enter the following example query that validates each column value is between two values:
```
SELECT * FROM ${data()} WHERE NOT(${column()}>=${param(min_value)} AND ${column()}<=${param(max_value)}) IS TRUE
```
6. Optional: To enable the rule referencing this template to specify a threshold for success criteria, select Support threshold.
7. Optional: To allow rules referencing this template to ignore null values in the column for determining success criteria, select Support ignore null.
8. In Input Parameters, click Add input parameter, and then for each parameter used in the SQL query, enter an input name, description, and default value. In the preceding example, the names would be min_value and max_value.
9. Click Save.

REST

To create a custom rule template, create an entry of type data-quality-rule-template:

gcurl -X POST "https://${DATAPLEX_API}/entryGroups/ENTRY_GROUP_ID/entries?entry_id=TEMPLATE_ID" \
--data @- << EOF
{
"entryType": "projects/dataplex-types/locations/global/entryTypes/data-quality-rule-template",
"entrySource": {
  "displayName": "DISPLAY_NAME",
  "description": "DESCRIPTION"
},
"aspects": {
  "dataplex-types.global.data-quality-rule-template": {
     "data": {
       "dimension": "VALIDITY",
       "sqlCollection": [
         {
           "query": "SELECT t.* FROM ${data()} AS t LEFT JOIN `${param(reference_table)}` AS s ON t.${column()} = s.`${param(reference_column)}` WHERE s.`${param(reference_column)}` IS NULL"
         }
       ],
       "inputParameters": {
         "PARAMETER_NAME": { "description": "PARAMETER_DESCRIPTION" }
       },
       "capabilities": ["THRESHOLD"]
     }
  }
}
}
EOF

Replace the following:

ENTRY_GROUP_ID: the ID of the entry group that stores your rule template.
TEMPLATE_ID: a unique ID for your rule template.
DISPLAY_NAME: a display name for the rule template.
DESCRIPTION: a description of the rule template.
PARAMETER_NAME: the name of an input parameter used in the SQL query.
PARAMETER_DESCRIPTION: a description of the input parameter.

Terraform

To create a custom rule template, use the google_dataplex_entry resource:

resource "google_dataplex_entry" "rule_template" {
  project        = "PROJECT_ID"
  location       = "LOCATION"
  entry_id       = "TEMPLATE_ID"
  entry_group_id = "ENTRY_GROUP_ID"

  entry_type = "projects/dataplex-types/locations/global/entryTypes/data-quality-rule-template"

  entry_source {
    display_name = "DISPLAY_NAME"
    description  = "DESCRIPTION"
  }

  aspects {
    aspect_key = "dataplex-types.global.data-quality-rule-template"
    aspect {
      data = jsonencode({
        dimension = "VALIDITY"
        sqlCollection = [
          {
            query = "SELECT t.* FROM $${data()} AS t LEFT JOIN `$${param(reference_table)}` AS s ON t.$${column()} = s.`$${param(reference_column)}` WHERE s.`$${param(reference_column)}` IS NULL"
          }
        ]
        inputParameters = {
          "PARAMETER_NAME" = { description = "PARAMETER_DESCRIPTION" }
        }
        capabilities = ["THRESHOLD"]
      })
    }
  }
}

Replace the following:


PROJECT_ID: your project ID.
LOCATION: the location for your rule template (for example, us-central1).
ENTRY_GROUP_ID: the ID of the entry group that stores your rule template.
TEMPLATE_ID: a unique ID for your rule template.
DISPLAY_NAME: a display name for the rule template.
DESCRIPTION: a description of the rule template.
PARAMETER_NAME: the name of an input parameter used in the SQL query.
PARAMETER_DESCRIPTION: a description of the input parameter.


Update a rule template

To update an existing rule template, select one of the following options:

Console

In the Google Cloud console, go to the Data profiling & quality page.



Go to Rule libraries > Custom.
Click the rule library that contains the template you want to update.
In the Rule templates list, click the template that you want to update.
On the rule template details page, click Edit.
Update the fields, and then click Save.

REST
To update a custom rule template, patch the entry or specific aspect:

gcurl -X PATCH "https://${DATAPLEX_API}/entryGroups/ENTRY_GROUP_ID/entries/TEMPLATE_ID?updateMask=aspects" \
--data @- << EOF
{
 "aspects": {
   "dataplex-types.global.data-quality-rule-template": {
     "data": {
       "dimension": "VALIDITY",
       "sqlCollection": [
         {
           "query": "SELECT * FROM ${data()} WHERE ${column()} IS NOT NULL"
         }
       ]
     }
   }
 }
}
EOF

Replace the following:


ENTRY_GROUP_ID: the ID of the entry group that stores your rule template.
TEMPLATE_ID: the ID of the rule template that you want to update.

Terraform
To update a custom rule template, use the google_dataplex_entry resource:

resource "google_dataplex_entry" "rule_template" {
  project        = "PROJECT_ID"
  location       = "LOCATION"
  entry_id       = "TEMPLATE_ID"
  entry_group_id = "ENTRY_GROUP_ID"

  entry_type = "projects/dataplex-types/locations/global/entryTypes/data-quality-rule-template"

  aspects {
    aspect_key = "dataplex-types.global.data-quality-rule-template"
    aspect {
      data = jsonencode({
        dimension = "VALIDITY"
        sqlCollection = [
          {
            query = "SELECT * FROM $${data()} WHERE $${column()} IS NOT NULL"
          }
        ]
      })
    }
  }
}

Replace the following:


PROJECT_ID: your project ID.
LOCATION: the location for your rule template (for example, us-central1).
ENTRY_GROUP_ID: the ID of the entry group that stores your rule template.
TEMPLATE_ID: the ID of the rule template that you want to update.


Delete a rule template

To delete an existing rule template, select one of the following options:

Console

In the Google Cloud console, go to the Data profiling & quality page.



Go to Rule libraries > Custom.
Click the rule library that contains the template you want to delete.
In the Rule templates list, click the template that you want to delete.
Click Delete, and then click Delete again to confirm.

REST
To delete a custom rule template, delete the entry:

gcurl -X DELETE \
"https://${DATAPLEX_API}/entryGroups/ENTRY_GROUP_ID/entries/TEMPLATE_ID"

Replace the following:


ENTRY_GROUP_ID: the ID of the entry group that stores your rule template.
TEMPLATE_ID: the ID of the rule template that you want to delete.


Create a data quality scan using template rules

Use your custom templates to define rules for a data quality scan.

Console

In the Google Cloud console, go to the Data profiling & quality page.



Follow the steps to create a data quality scan, but update the following:


In the Define scan window, in the Credential type menu, select Service account, and then enter a service account. A service account is mandatory for using rule templates.
In the Data quality rules window, define the rules to configure for this data quality scan:

Click Add rules > Template rules.
You can either select Attach rule to entire table, or in Choose columns, browse and select the columns to apply rules to.
In Choose rule templates, select the rule templates to use. Only rule templates in the same location as the scan or in the global location can be used. Alternatively, you can select system rule templates from the list.
Click Ok.
Click Edit rule, and then add rule-specific parameters.
Click Save.
Select the rules that you want to add, and then click Select. The rules are now added to your current rules list.
Optional: Repeat the previous steps to add additional rules to the data quality scan.
Click Continue.

Proceed with the remaining scan configuration.
Click Create to only create the scan, or click Run scan to create and immediately run the scan.


REST
To create a data quality scan that references a rule template, specify the templateReference. Custom rule templates use project-specific paths, while system rule templates use a global path: projects/dataplex-templates/locations/global/entryGroups/rule-library/entries/SYSTEM_TEMPLATE_ID.

The following example creates a scan that uses a custom rule template and
includes a filter to selectively run rules:

gcurl -X POST "https://${DATAPLEX_API}/dataScans?data_scan_id=DATASCAN_ID" \
--data @- << EOF
{
"data": {
  "resource": "//bigquery.googleapis.com/projects/BIGQUERY_PROJECT_ID/datasets/DATASET_ID/tables/TABLE_ID"
},
"executionIdentity": {
  "serviceAccount": { "email": "SERVICE_ACCOUNT_EMAIL" }
},
"executionSpec": { "trigger": { "onDemand": {} } },
"type": "DATA_QUALITY",
"dataQualitySpec": {
  "rules": [
    {
      "templateReference": {
        "name": "projects/PROJECT_ID/locations/LOCATION/entryGroups/ENTRY_GROUP_ID/entries/TEMPLATE_ID",
        "values": { "PARAMETER_NAME": { "value" : "PARAMETER_VALUE" } }
      },
      "column": "COLUMN_NAME",
      "name": "RULE_NAME"
    }
  ],
  "filter": "FILTER_CONDITION"
}
}
EOF

Replace the following:


PROJECT_ID: your project ID.
LOCATION: the location for your rule template and scan (for example, us-central1).
DATASCAN_ID: the ID of the data quality scan.
BIGQUERY_PROJECT_ID: the project ID of the BigQuery table.
DATASET_ID: the BigQuery dataset ID.
TABLE_ID: the BigQuery table ID.
SERVICE_ACCOUNT_EMAIL: the email ID of the service account to run the scan.
ENTRY_GROUP_ID: the ID of the entry group that stores your rule template.
TEMPLATE_ID: the ID of the custom rule template.
SYSTEM_TEMPLATE_ID: the ID of the system rule template (for example, non_null_expectation).
PARAMETER_NAME: the name of an input parameter for the rule template.
PARAMETER_VALUE: the value for the input parameter.
COLUMN_NAME: the column to apply the rule to.
RULE_NAME: a name for the rule instance.
FILTER_CONDITION: an optional AIP-160 filter string to selectively run rules (for example, name = \"RULE_NAME\").
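
If you assemble the request body in code instead of a heredoc, a JSON serializer takes care of escaping the quotes inside the filter string. The following Python sketch builds the same body shape as the example above. The helper names and sample values are hypothetical, and the sketch only builds the payload without sending it.

```python
import json
from typing import Optional


def template_rule(name: str, column: str, template: str, values: dict) -> dict:
    """One rule that references a rule template; parameter values are sent as strings."""
    return {
        "name": name,
        "column": column,
        "templateReference": {
            "name": template,
            "values": {key: {"value": str(val)} for key, val in values.items()},
        },
    }


def build_scan_body(
    table_resource: str,
    service_account: str,
    rules: list,
    rule_filter: Optional[str] = None,
) -> dict:
    """Request body for creating an on-demand data quality scan."""
    spec = {"rules": rules}
    if rule_filter:
        spec["filter"] = rule_filter
    return {
        "data": {"resource": table_resource},
        "executionIdentity": {"serviceAccount": {"email": service_account}},
        "executionSpec": {"trigger": {"onDemand": {}}},
        "type": "DATA_QUALITY",
        "dataQualitySpec": spec,
    }


rule = template_rule(
    "range-check",
    "amount",
    "projects/my-project/locations/us-central1/entryGroups/my-rules/entries/range",
    {"min_value": 0, "max_value": 100},
)
body = build_scan_body(
    "//bigquery.googleapis.com/projects/my-project/datasets/sales/tables/orders",
    "scanner@my-project.iam.gserviceaccount.com",
    [rule],
    rule_filter='name = "range-check"',
)
payload = json.dumps(body, indent=2)
print(payload)
```

The serialized payload contains the filter as name = \"range-check\", which matches the escaped form shown for FILTER_CONDITION.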

Terraform
To create a data quality scan that references a rule template, use the google_dataplex_datascan resource:

resource "google_dataplex_datascan" "scan" {
  data_scan_id = "DATASCAN_ID"
  location     = "LOCATION"
  project      = "PROJECT_ID"

  data {
    resource = "//bigquery.googleapis.com/projects/BIGQUERY_PROJECT_ID/datasets/DATASET_ID/tables/TABLE_ID"
  }

  execution_spec {
    service_account = "SERVICE_ACCOUNT_EMAIL"
    trigger {
      on_demand {}
    }
  }

  data_quality_spec {
    rules {
      column    = "COLUMN_NAME"
      name      = "RULE_NAME"
      dimension = "VALIDITY"

      template_reference {
        name = "projects/PROJECT_ID/locations/LOCATION/entryGroups/ENTRY_GROUP_ID/entries/TEMPLATE_ID"
        values = {
          "PARAMETER_NAME" = { value = "PARAMETER_VALUE" }
        }
      }
    }
    filter = "FILTER_CONDITION"
  }
}

Replace the following:


PROJECT_ID: your project ID.
LOCATION: the location for your rule template and scan (for example, us-central1).
DATASCAN_ID: the ID of the data quality scan.
BIGQUERY_PROJECT_ID: the project ID of the BigQuery table.
DATASET_ID: the BigQuery dataset ID.
TABLE_ID: the BigQuery table ID.
SERVICE_ACCOUNT_EMAIL: the email ID of the service account to run the scan.
ENTRY_GROUP_ID: the ID of the entry group that stores your rule template.
TEMPLATE_ID: the ID of the custom rule template.
PARAMETER_NAME: the name of an input parameter for the rule template.
PARAMETER_VALUE: the value for the input parameter.
COLUMN_NAME: the column to apply the rule to.
RULE_NAME: a name for the rule instance.
FILTER_CONDITION: an optional AIP-160 filter string to selectively run rules.


Run and monitor data quality scans

After you create a data quality scan, you must run it to validate your data.
For more information, see Run a data quality scan.

You can then monitor the scan jobs and view the results. For more information,
see View the data quality scan results.

Attach data quality rules to catalog entries

You can declare data quality rules as aspects in
Knowledge Catalog to make them searchable and reusable across
scans.

BigQuery table

To define rules directly on a BigQuery table entry, select
one of the following:

Console

In the Google Cloud console, go to the Knowledge Catalog Search page.



Search for and select the table that you want to attach rules to.
Click Data quality > Rules management > Create rules.
In the Create rules window, do the following:


In the Choose create option menu, select Create new rule.
In Choose columns, click Browse. Select the columns to apply rules to.
In the Choose rule types menu, select the rule templates to use. Only the rule templates in the same location as the scan can be used.
Click Edit rule, and then add rule-specific parameters.
Click Save.

The Rules management page displays all entry rules.


REST
To attach rules to a specific column using the API, patch the @bigquery
entry with a data-rules aspect targeted to that column:

gcurl -X PATCH "https://${DATAPLEX_API}/entryGroups/@bigquery/entries/projects/PROJECT_ID/locations/LOCATION/datasets/DATASET_ID/tables/TABLE_ID?updateMask=aspects&aspect_keys=projects/dataplex-types/locations/global/aspectTypes/data-rules@Schema.COLUMN_NAME" \
--data @- << EOF
{
"aspects": {
  "dataplex-types.global.data-rules@Schema.COLUMN_NAME": {
    "aspectType": "projects/dataplex-types/locations/global/aspectTypes/data-rules",
    "data": {
      "rules": [
        {
          "templateReference": "projects/PROJECT_ID/locations/LOCATION/entryGroups/ENTRY_GROUP_ID/entries/TEMPLATE_ID",
          "column": "COLUMN_NAME",
          "values": { "PARAMETER_NAME": { "value": "PARAMETER_VALUE" } }
        }
      ]
    }
  }
}
}
EOF

Replace the following:


PROJECT_ID: your project ID.
LOCATION: the location for your rule template and aspect.
DATASET_ID: the BigQuery dataset ID.
TABLE_ID: the BigQuery table ID.
COLUMN_NAME: the column to apply the rule to.
ENTRY_GROUP_ID: the ID of the entry group that stores your rule template.
TEMPLATE_ID: the ID of the rule template.
PARAMETER_NAME: the name of an input parameter for the rule template.
PARAMETER_VALUE: the value for the input parameter.

Terraform
To attach rules to a specific column, use the google_dataplex_entry resource:

resource "google_dataplex_entry" "bq_table_metadata" {
  project        = "PROJECT_ID"
  location       = "LOCATION"
  entry_id       = "bigquery.googleapis.com/projects/PROJECT_ID/datasets/DATASET_ID/tables/TABLE_ID"
  entry_group_id = "@bigquery"

  aspects {
    aspect_key = "dataplex-types.global.data-rules@Schema.COLUMN_NAME"
    aspect {
      data = jsonencode({
        rules = [
          {
            name              = "RULE_NAME"
            dimension         = "VALIDITY"
            templateReference = "projects/PROJECT_ID/locations/LOCATION/entryGroups/ENTRY_GROUP_ID/entries/TEMPLATE_ID"
            values = {
              "PARAMETER_NAME" = { value = "PARAMETER_VALUE" }
            }
          }
        ]
      })
    }
  }
}

Replace the following:


PROJECT_ID: your project ID.
LOCATION: the location for your rule template and aspect.
DATASET_ID: the BigQuery dataset ID.
TABLE_ID: the BigQuery table ID.
COLUMN_NAME: the column to apply the rule to.
ENTRY_GROUP_ID: the ID of the entry group that stores your rule template.
TEMPLATE_ID: the ID of the rule template.
PARAMETER_NAME: the name of an input parameter for the rule template.
PARAMETER_VALUE: the value for the input parameter.
RULE_NAME: a unique name for the rule.


Business glossary terms

You can attach rules to business glossary terms. Rules attached to terms are
automatically inherited by linked BigQuery tables.

Console

In the Google Cloud console, go to the Knowledge Catalog Glossaries page.



Search for and select the business glossary term.
In the Data quality rules section, click Add.
In the Create rules window, do the following:


In the Choose create option menu, select Create new rule.
In the Choose rule types menu, select the rule templates to use. Only the rule templates in the same location as the scan can be used.
Click Edit rule, and then add rule-specific parameters.
Click Save.

Attach the term to a BigQuery table or columns. For
more information, see Manage links between terms and data assets.

REST
To attach rules to a term using the API, patch the @dataplex entry for the
glossary term:

gcurl -X PATCH "https://${DATAPLEX_API}/entryGroups/@dataplex/entries/projects/PROJECT_ID/locations/LOCATION/glossaries/GLOSSARY_ID/terms/TERM_ID?updateMask=aspects&aspect_keys=projects/dataplex-types/locations/global/aspectTypes/data-rules" \
--data @- << EOF
{
"aspects": {
  "dataplex-types.global.data-rules": {
    "aspectType": "projects/dataplex-types/locations/global/aspectTypes/data-rules",
    "data": {
      "rules": [
        {
          "templateReference": "projects/PROJECT_ID/locations/LOCATION/entryGroups/ENTRY_GROUP_ID/entries/TEMPLATE_ID",
          "column": "COLUMN_NAME",
          "values": { "PARAMETER_NAME": { "value": "PARAMETER_VALUE" } }
        }
      ]
    }
  }
}
}
EOF

Replace the following:


PROJECT_ID: your project ID.
LOCATION: the location for your rule template and aspect.
GLOSSARY_ID: the ID of the business glossary.
TERM_ID: the ID of the glossary term.
ENTRY_GROUP_ID: the ID of the entry group that stores your rule template.
TEMPLATE_ID: the ID of the rule template.
COLUMN_NAME: the column to apply the rule to.
PARAMETER_NAME: the name of an input parameter for the rule template.
PARAMETER_VALUE: the value for the input parameter.

Terraform
To attach rules to a business glossary term, use the google_dataplex_entry resource:

resource "google_dataplex_entry" "glossary_term_rules" {
  project        = "PROJECT_ID"
  location       = "LOCATION"
  entry_id       = "projects/PROJECT_ID/locations/LOCATION/glossaries/GLOSSARY_ID/terms/TERM_ID"
  entry_group_id = "@dataplex"

  aspects {
    aspect_key = "dataplex-types.global.data-rules"
    aspect {
      data = jsonencode({
        rules = [
          {
            name              = "RULE_NAME"
            dimension         = "VALIDITY"
            templateReference = "projects/PROJECT_ID/locations/LOCATION/entryGroups/ENTRY_GROUP_ID/entries/TEMPLATE_ID"
            values = {
              "PARAMETER_NAME" = { value = "PARAMETER_VALUE" }
            }
          }
        ]
      })
    }
  }
}

Replace the following:


PROJECT_ID: your project ID.
LOCATION: the location for your rule template and aspect.
GLOSSARY_ID: the ID of the business glossary.
TERM_ID: the ID of the glossary term.
ENTRY_GROUP_ID: the ID of the entry group that stores your rule template.
TEMPLATE_ID: the ID of the rule template.
PARAMETER_NAME: the name of an input parameter for the rule template.
PARAMETER_VALUE: the value for the input parameter.
RULE_NAME: a unique name for the rule.


Import rules from another table

You can import data quality rules from an existing BigQuery table
entry to your current table.

Console

In the Google Cloud console, go to the Knowledge Catalog Search page.



Go to Search

Select the table you want to manage rules for.
Click Data quality > Rules management.
Click Create rules.
In the Create rules window, do the following:


In the Choose create option menu, select Import rules from another table.
In Table, click Browse. Search for and select the
source table containing the rules that you want to copy.
Select the rules. You can also edit the rules.
Click Save.

The Rules management tab displays the new rules.


REST
To import rules, you must fetch the data-rules aspect from the source entry
and apply it to the target entry.


Get the data-rules aspect from the source entry:

gcurl "https://dataplex.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/entryGroups/ENTRY_GROUP_ID/entries/SOURCE_ENTRY_ID?view=FULL"
Extract the rules list from the dataplex-types.global.data-rules aspect.
Attach the rules to a target entry.

Replace the following:


PROJECT_ID: your project ID.
LOCATION: the location of the source entry.
ENTRY_GROUP_ID: the ID of the entry group for the source entry.
SOURCE_ENTRY_ID: the ID of the source entry.
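
The extract and attach steps can be sketched in Python as follows. This is an illustration under stated assumptions, not a definitive implementation: it assumes the fetched entry exposes the aspect under the dataplex-types.global.data-rules key with a data.rules list, matching the request bodies used elsewhere in this document. The function names and the sample entry are hypothetical, and the sketch makes no API calls.

```python
import json

# Aspect key and aspect type as used in the request bodies in this document.
DATA_RULES_ASPECT_KEY = "dataplex-types.global.data-rules"
DATA_RULES_ASPECT_TYPE = (
    "projects/dataplex-types/locations/global/aspectTypes/data-rules"
)


def extract_rules(source_entry):
    """Return the rules list from a fetched entry, or [] if none is attached."""
    aspect = source_entry.get("aspects", {}).get(DATA_RULES_ASPECT_KEY, {})
    return list(aspect.get("data", {}).get("rules", []))


def build_target_patch(rules):
    """Return a PATCH body that carries the given rules to a target entry."""
    return {
        "aspects": {
            DATA_RULES_ASPECT_KEY: {
                "aspectType": DATA_RULES_ASPECT_TYPE,
                "data": {"rules": rules},
            }
        }
    }


# Hypothetical sample that stands in for the JSON returned by the GET request.
source_entry = {
    "aspects": {
        DATA_RULES_ASPECT_KEY: {
            "aspectType": DATA_RULES_ASPECT_TYPE,
            "data": {
                "rules": [
                    {
                        "templateReference": "projects/PROJECT_ID/locations/LOCATION/entryGroups/ENTRY_GROUP_ID/entries/TEMPLATE_ID",
                        "column": "COLUMN_NAME",
                    }
                ]
            },
        }
    }
}

patch = build_target_patch(extract_rules(source_entry))
print(json.dumps(patch, indent=2))
```

The printed body has the same shape as the PATCH request used to attach rules to an entry. If the target entry already carries rules, you might need to merge the two lists before sending the request, because the body contains the complete rules list for the aspect.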



View data quality rules for a BigQuery table

You can view all rules applicable to a table, including rules
attached directly and rules inherited from linked glossary terms.

Console

In the Google Cloud console, go to the Knowledge Catalog Search page.



Go to Search

Search for and select the table.
Click Data quality > Rules management to view all rules.


Create a data quality scan using rules from catalog

You can selectively run rules declared on catalog entries in a scan.

Console

In the Google Cloud console, go to the Data profiling & quality page.



Go to Data profiling & quality

Follow the steps to create a data quality scan, but update the following:


In the Define scan window, do the following:

In the Credential type menu, select Service account, and then enter a service account. A service account is mandatory for using rule templates.
For Rule type, select Create with entry based rule.

In the Data quality rules section, rules
applicable to the table entry are displayed, including rules inherited
from linked glossary terms. To filter the rules, do the following:

In the Filter items field, enter a filter to select the rules that you want to run.
Click Apply. Filtered rules are displayed.

Proceed with the remaining scan configuration.
Click Create to only create the scan, or click Run scan to
create and immediately run the scan.


Each subsequent run evaluates the rules that are attached to the entry or inherited from glossary terms at the time the run executes.

REST
To run rules from catalog entries, set enableCatalogBasedRules to true.
You can also specify a filter.

To create the scan, use the following code:

gcurl -X POST "https://${DATAPLEX_API}/dataScans?data_scan_id=DATASCAN_ID" \
--data @- << EOF
{
  "type": "DATA_QUALITY",
  "data": {
    "resource": "//bigquery.googleapis.com/projects/PROJECT_ID/datasets/DATASET_ID/tables/TABLE_ID"
  },
  "executionIdentity": {
    "serviceAccount": { "email": "SERVICE_ACCOUNT_EMAIL" }
  },
  "executionSpec": { "trigger": { "onDemand": {} } },
  "dataQualitySpec": {
    "enableCatalogBasedRules": true,
    "filter": "FILTER_CONDITION"
  }
}
EOF

Replace the following:


PROJECT_ID: your project ID.
LOCATION: the location for your data scan.
DATASCAN_ID: the ID of the data quality scan.
DATASET_ID: the BigQuery dataset ID.
TABLE_ID: the BigQuery table ID.
SERVICE_ACCOUNT_EMAIL: the email address of the service account that runs the scan.
FILTER_CONDITION: an AIP-160 filter string to selectively run rules (for example, attributes.environment = \"prod\").
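
A filter that contains double quotes must be escaped when it is embedded in JSON, which is easy to get wrong by hand. The following minimal Python sketch builds the same request body and lets json.dumps handle the escaping. The function name build_catalog_scan_body is hypothetical, the field names are copied from the request above, and the sketch makes no API calls.

```python
import json


def build_catalog_scan_body(project, dataset, table, service_account,
                            filter_condition=None):
    """Return a create-scan body that runs rules declared on catalog entries.

    Hypothetical helper: it mirrors the JSON in the REST example. The
    filter is optional, so it is added only when one is supplied.
    """
    spec = {"enableCatalogBasedRules": True}
    if filter_condition:
        spec["filter"] = filter_condition
    return {
        "type": "DATA_QUALITY",
        "data": {
            "resource": (
                f"//bigquery.googleapis.com/projects/{project}"
                f"/datasets/{dataset}/tables/{table}"
            )
        },
        "executionIdentity": {"serviceAccount": {"email": service_account}},
        "executionSpec": {"trigger": {"onDemand": {}}},
        "dataQualitySpec": spec,
    }


body = build_catalog_scan_body(
    "PROJECT_ID",
    "DATASET_ID",
    "TABLE_ID",
    "SERVICE_ACCOUNT_EMAIL",
    'attributes.environment = "prod"',
)
# json.dumps escapes the inner double quotes of the filter string.
payload = json.dumps(body, indent=2)
print(payload)
```

Writing the filter as an ordinary Python string and serializing it at the end keeps the filter readable and produces the escaped form that the JSON body requires.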

Terraform
To run rules from catalog entries, use the
google_dataplex_datascan
resource:

resource "google_dataplex_datascan" "scan" {
  data_scan_id = "DATASCAN_ID"
  location     = "LOCATION"
  project      = "PROJECT_ID"

  data {
    resource = "//bigquery.googleapis.com/projects/PROJECT_ID/datasets/DATASET_ID/tables/TABLE_ID"
  }

  execution_spec {
    service_account = "SERVICE_ACCOUNT_EMAIL"
    trigger {
      on_demand {}
    }
  }

  data_quality_spec {
    enable_catalog_based_rules = true
    filter                     = "FILTER_CONDITION"
  }
}

Replace the following:


PROJECT_ID: your project ID.
LOCATION: the location for your data scan.
DATASCAN_ID: the ID of the data quality scan.
DATASET_ID: the BigQuery dataset ID.
TABLE_ID: the BigQuery table ID.
SERVICE_ACCOUNT_EMAIL: the email address of the service account that runs the scan.
FILTER_CONDITION: an AIP-160 filter string to selectively run rules.


Pricing

Using Knowledge Catalog rule reusability involves the following pricing
elements:


BigQuery charges: BigQuery charges for the job that runs in the scan
project. For more information, see BigQuery pricing.
Knowledge Catalog data quality scan: There's no separate processing
charge, because BigQuery bills for the job.
Metadata storage: data-rules aspect and data-quality-rule-template
aspect storage is charged as metadata storage. For more information, see
Knowledge Catalog pricing.


What's next


Read the auto data quality overview.
Learn how to use auto data quality scans.
View a complete list of system rule templates.
Learn about metadata management.
