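This document describes how to reuse Knowledge Catalog (formerly Dataplex Universal Catalog) data quality rules to define and manage standardized business rules.
Rule reusability lets you use rule templates to share complex or standardized business rule definitions across multiple data quality rules and scans. This document also describes how to set up, create, and manage reusable rule templates, and how to attach data quality rules to catalog entries as metadata aspects.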
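Use cases
You can use data quality rule reusability in the following scenarios:
- Standardize and share rule definitions: Store complex or standardized business rule definitions in custom rule templates. This reduces the time and effort needed to deploy common definitions as templated SQL expressions. For example, a central data governance team can define standard valid email or valid Social Security number (SSN) templates that are reused across the organization, which ensures consistency and reduces the operational overhead of managing duplicate rules.
- Implement governance-driven quality: Declare data rules as metadata by using Knowledge Catalog aspects on BigQuery table and business glossary term entries. This makes rules discoverable and reusable. For example, when you link a column to a glossary term, the column automatically inherits the validation rules defined on that term, so semantic metadata inheritance drives automated governance policies.
- Discover and explore reusable rules: Find existing rules in your organization through semantic search. Data analysts and engineers can explore verified, standardized rule sets (for example, "baseline financial constants") and bootstrap data quality for new projects without writing SQL from scratch.
- Solve the cold-start problem: Use system rule templates for frequently used assessments such as null checks or range expectations. These built-in templates let you quickly set up data quality monitoring for common scenarios without writing custom SQL.
- Enable separation of concerns: A central governance team can author verified rule templates, while engineering teams focus on applying these rules to data assets without writing or maintaining complex SQL. This clear division of responsibilities improves organizational agility and helps ensure that data quality standards are applied consistently across the enterprise.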
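Before you begin
- Enable the Dataplex API.
Roles required to enable APIs
To enable APIs, you need the Service Usage Admin IAM role (roles/serviceusage.serviceUsageAdmin), which contains the serviceusage.services.enable permission. Learn how to grant roles.
Before you use data quality rule reusability, make sure that you have completed the following requirements.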
Set up the Dataplex API environment
To use the REST API examples in this document, set up an alias for gcurl and configure the ${DATAPLEX_API} environment variable.
Set up an alias for gcurl. This creates a shortcut that includes an authentication token and sets the JSON content type for API requests:
alias gcurl='curl -H "Authorization: Bearer $(gcloud auth print-access-token)" -H "Content-Type: application/json"'
Set the DATAPLEX_API variable:
DATAPLEX_API="dataplex.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION"
Replace the following:
- PROJECT_ID: your project ID.
- LOCATION: the location of your scans or resources (for example, us-central1).
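If you script these calls in Python rather than through the gcurl alias, the following sketch builds an equivalent request with only the standard library. It assumes that the gcloud CLI is installed and authenticated. The helper names (dataplex_base_url, access_token, build_request, send) are hypothetical and are not part of any Google client library.

```python
import json
import subprocess
import urllib.request


def dataplex_base_url(project_id, location):
    """Equivalent of the DATAPLEX_API variable, with the https:// scheme added."""
    return f"https://dataplex.googleapis.com/v1/projects/{project_id}/locations/{location}"


def access_token():
    """Same token source as the gcurl alias: `gcloud auth print-access-token`."""
    result = subprocess.run(
        ["gcloud", "auth", "print-access-token"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def build_request(method, url, token, body=None):
    """Builds (but does not send) a request with the same headers as gcurl."""
    data = None if body is None else json.dumps(body).encode("utf-8")
    return urllib.request.Request(
        url,
        data=data,
        method=method,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


def send(request):
    """Sends the request and decodes the JSON response (an empty body gives {})."""
    with urllib.request.urlopen(request) as response:
        payload = response.read()
    return json.loads(payload) if payload else {}
```

For example, send(build_request("GET", dataplex_base_url("my-project", "us-central1") + "/entryGroups", access_token())) lists entry groups, much like gcurl "https://${DATAPLEX_API}/entryGroups". Keeping build_request free of network calls makes it easy to inspect a request before sending it.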
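Set up a service account
A service account is required to run data quality scans with reusable rules. Create a service account and set up the following Identity and Access Management (IAM) roles and permissions:
- You must have the iam.serviceAccounts.actAs permission on the project that hosts the service account (typically through the roles/iam.serviceAccountUser role).
- Grant the Dataplex service agent of the scan project (service-PROJECT_NUMBER@gcp-sa-dataplex.iam.gserviceaccount.com, where PROJECT_NUMBER is the scan project's number) the iam.serviceAccounts.getAccessToken permission on the service account (for example, by using the roles/iam.serviceAccountTokenCreator role).
- The service account must have the following permissions:
  - bigquery.tables.getData on the tables to scan (for example, by using roles/bigquery.dataViewer).
  - bigquery.jobs.insert on the scan project (for example, by using roles/bigquery.jobUser).
  - roles/bigquery.dataEditor on the export dataset, if you use exports.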
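Required roles and permissions
Make sure that you have the following IAM roles for specific tasks:
- Manage data scans: You need the required data scan roles to manage data scan resources.
- Manage rule templates: To create or update rule templates, you must have the permissions required to manage entries in the entry group or project of the rule template. Specifically, roles/dataplex.catalogEditor or roles/dataplex.entryOwner grants these permissions.
- Reference rule templates in rules: You must have the dataplex.entries.get and dataplex.entries.getData permissions on the entry group or project of the rule templates that a rule references.
- Attach data quality rules to BigQuery tables: To attach data quality rules as Knowledge Catalog metadata, you must have one of the following:
  - bigquery.tables.update or roles/bigquery.dataEditor on the table, and dataplex.entryGroups.useDataRulesAspect on the @bigquery entry group in the table's location.
  - roles/dataplex.catalogEditor on the @bigquery entry group.
- Attach data quality rules to business glossary terms: To attach data quality rules as Knowledge Catalog metadata, you must have one of the following:
  - dataplex.glossaryTerms.update on the term, and dataplex.entryGroups.useDataRulesAspect on the @dataplex entry group.
  - roles/dataplex.catalogEditor on the @dataplex entry group.
- Create data quality scans with entry-based rules: You must have one of the following:
  - bigquery.tables.get and bigquery.tables.getData on the table.
  - dataplex.entries.get and dataplex.entries.getData on the @bigquery entry group in the table's location.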
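SQL query syntax for rule templates
When you write the SQL logic for a rule template, you must provide a statement that returns invalid rows. If the query returns any rows, the rule fails. For more information, see SqlAssertion.
Follow these guidelines when you write rule template SQL:
- Omit the trailing semicolon from the SQL statement.
- Use ${param(name)} to reference input parameters (for example, ${param(min_value)}).
- Use $${...} to escape a literal ${...} and prevent it from being replaced as a parameter.
- Parameter variables are case-sensitive.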
System-supported parameters
You can use the following system-supported parameters in your rule template SQL:
- ${project()}: The project ID of the resource being scanned.
- ${dataset()}: The BigQuery dataset ID of the resource being scanned, formatted as PROJECT_ID.DATASET_ID.
- ${table()}: The BigQuery table ID of the resource being scanned, formatted as PROJECT_ID.DATASET_ID.TABLE_ID.
- ${column()}: The column the rule is evaluated on. An error occurs during rule evaluation if the rule is attached at the table level but references ${column()}.
- ${data()}: A reference to the data source table and all of its precondition filters, such as row filters, sampling percentages, and incremental filters defined in the scan specification. For more information, see Data reference parameter.
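To make the substitution rules above concrete, the following Python sketch renders template SQL locally. It is a hypothetical illustration of the documented behavior (case-sensitive ${param(name)} lookups, $${...} escaping, and an error for an unresolvable system parameter such as ${column()} on a table-level rule), not Knowledge Catalog's implementation. In particular, the real ${data()} also applies the scan's row filters, sampling, and incremental filters, which this sketch replaces with a plain table reference that you pass in.

```python
import re

# "$${...}" (escaped literal) is listed first so that it wins over "${...}".
_PLACEHOLDER = re.compile(r"\$\$\{([^}]*)\}|\$\{([^}]*)\}")
_PARAM = re.compile(r"param\((\w+)\)")
_SYSTEM = re.compile(r"(\w+)\(\)")


def render_template(sql, system, params):
    """Hypothetical local renderer for rule template SQL.

    system: values for system parameters, such as {"data": ..., "column": ...}.
    params: values for ${param(name)}; lookups are case-sensitive.
    """

    def substitute(match):
        escaped, expr = match.group(1), match.group(2)
        if escaped is not None:
            # $${...} becomes a literal ${...} and is never substituted.
            return "${" + escaped + "}"
        param = _PARAM.fullmatch(expr)
        if param:
            name = param.group(1)
            if name not in params:
                raise KeyError(f"no value for parameter {name!r}")
            return str(params[name])
        system_call = _SYSTEM.fullmatch(expr)
        if system_call and system_call.group(1) in system:
            return system[system_call.group(1)]
        # For example, ${column()} in a rule that is attached at table level.
        raise ValueError("cannot resolve ${" + expr + "}")

    return _PLACEHOLDER.sub(substitute, sql)
```

Rendering a template this way before you save it can catch misspelled or wrongly cased parameter names early.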
Example 1: Validate column values are between two values
The following example validates that all values in a column are between a minimum and maximum value:
SELECT *
FROM ${data()}
WHERE
NOT ((${column()}>=${param(min_value)} AND ${column()}<=${param(max_value)}) IS TRUE)
Note the following:
- Using NOT ((condition) IS TRUE) returns invalid rows, including rows with NULL values in the column.
- Using ${data()} limits the scope of evaluated rows to the source table and its filters, such as row filters, sampling percentages, and incremental filters.
- Using ${column()} lets you reference the column that the rule using this template is evaluated on.
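The reason NOT ((condition) IS TRUE) returns NULL rows is SQL's three-valued logic: a comparison with NULL yields UNKNOWN, and WHERE keeps only rows whose predicate is TRUE. The following Python sketch models NULL as None to compare that pattern with a plain NOT (condition). The function names are illustrative only.

```python
# None stands for SQL NULL / UNKNOWN throughout this sketch.


def sql_ge(a, b):
    return None if a is None or b is None else a >= b


def sql_le(a, b):
    return None if a is None or b is None else a <= b


def sql_and(x, y):
    # SQL AND: FALSE dominates, then UNKNOWN.
    if x is False or y is False:
        return False
    if x is None or y is None:
        return None
    return True


def sql_not(x):
    return None if x is None else not x


def is_true(x):
    # "x IS TRUE" never returns NULL: only TRUE yields TRUE.
    return x is True


def where(rows, predicate):
    # WHERE keeps a row only when the predicate is TRUE (not FALSE, not UNKNOWN).
    return [r for r in rows if predicate(r) is True]


def between(v, lo, hi):
    return sql_and(sql_ge(v, lo), sql_le(v, hi))


def invalid_rows(values, lo, hi):
    """WHERE NOT ((v >= lo AND v <= hi) IS TRUE): NULL rows are returned."""
    return where(values, lambda v: sql_not(is_true(between(v, lo, hi))))


def invalid_rows_naive(values, lo, hi):
    """WHERE NOT (v >= lo AND v <= hi): NULL rows are silently dropped."""
    return where(values, lambda v: sql_not(between(v, lo, hi)))
```

For the values [5, 50, 150, None] and the range 10 to 100, invalid_rows reports 5, 150, and the NULL row, while invalid_rows_naive misses the NULL row.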
Example 2: Foreign key validation
The following example verifies that each value in a column exists in a primary key column of another table:
SELECT t.*
FROM ${data()} AS t
LEFT JOIN `${param(reference_table)}` AS s
ON t.${column()} = s.`${param(reference_column)}`
WHERE s.`${param(reference_column)}` IS NULL
Input parameters for this template are as follows:
- reference_table: The name of the reference table that contains the primary keys. Use the format PROJECT_ID.DATASET_ID.TABLE_ID.
- reference_column: The name of the primary key column in the reference table.
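The LEFT JOIN ... WHERE ... IS NULL pattern in Example 2 is an anti-join. The following Python sketch, with a hypothetical function name and in-memory rows, reproduces which rows it returns. It also shows one consequence worth knowing: because NULL never satisfies =, a row with a NULL in the checked column is always reported as invalid unless the rule ignores nulls.

```python
def orphan_rows(rows, column, reference_keys):
    """Rows returned by: SELECT t.* FROM data AS t LEFT JOIN ref AS s
    ON t.column = s.key WHERE s.key IS NULL.

    rows: list of dicts for the scanned table.
    reference_keys: values of the reference (primary key) column.
    """
    # NULL reference keys can never satisfy "=", so they match nothing.
    keys = {k for k in reference_keys if k is not None}
    # Duplicate reference keys only multiply matched rows, which the WHERE
    # clause then filters out, so they don't change which rows are returned.
    return [row for row in rows if row[column] is None or row[column] not in keys]
```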
System rule templates
Knowledge Catalog provides system rule templates that can be used in any
region. Knowledge Catalog manages these templates in the
dataplex-templates project under the rule-library entry group. An example of
a full resource name is
projects/dataplex-templates/locations/global/entryGroups/rule-library/entries/non_null_expectation.
To view the list of all the available system rule templates, see System rule templates list.
To list the available system rule templates, select one of the following options:
Console
In the Google Cloud console, go to the Data profiling & quality page.
Click Rule libraries > System.
To see the list of available system rule templates, click rule-library.
When creating a new rule, you can select the system rule templates in the Choose rule types menu.
REST
To list the available system rule templates, use the
entries.list method:
gcurl "https://dataplex.googleapis.com/v1/projects/dataplex-templates/locations/global/entryGroups/rule-library/entries"
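If the rule library holds more entries than fit in one response, the list is paginated. The following Python sketch follows page tokens until the list is exhausted. It assumes the standard Google API list conventions (an entries array, a nextPageToken field, and a pageToken query parameter) and an environment where gcloud is authenticated. The fetch function is injectable, so you can exercise the paging logic without network access. All function names are hypothetical.

```python
import json
import subprocess
import urllib.parse
import urllib.request

LIST_URL = (
    "https://dataplex.googleapis.com/v1/projects/dataplex-templates/"
    "locations/global/entryGroups/rule-library/entries"
)


def fetch_json(url):
    """GETs a URL with a gcloud access token, like the gcurl alias."""
    token = subprocess.run(
        ["gcloud", "auth", "print-access-token"],
        capture_output=True, text=True, check=True,
    ).stdout.strip()
    request = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def list_template_ids(fetch=fetch_json, url=LIST_URL):
    """Follows nextPageToken until exhausted and returns each entry's short ID."""
    ids, page_token = [], None
    while True:
        page_url = url
        if page_token:
            page_url += "?" + urllib.parse.urlencode({"pageToken": page_token})
        page = fetch(page_url)
        for entry in page.get("entries", []):
            # Entry names end in .../entries/TEMPLATE_ID.
            ids.append(entry["name"].rsplit("/", 1)[-1])
        page_token = page.get("nextPageToken")
        if not page_token:
            return ids
```

Calling list_template_ids() with the default fetch should return short IDs such as non_null_expectation, which you can use as SYSTEM_TEMPLATE_ID values.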
Known differences between system rule templates and built-in rules
The following table describes the differences between system rule templates and built-in rules:
| Feature | System rule templates | Built-in rules |
|---|---|---|
| Source | Reusable templates in the catalog | Built into the API |
| Referencing | Can be referenced by catalog entries and scans | Can only be used in scans |
The following list describes additional differences in how metrics are calculated for system rule templates:
- Assertion Row Count metric: This metric is populated for all template reference rules, not just SQL assertion rules.
- Statistic Range Expectation rule template: Metrics from evaluating rules that reference this template don't include the nullCount metric. Because this is an aggregate rule, the ignore null capability isn't supported, and rule success is determined by whether the aggregate statistic falls within the defined range.
- Uniqueness Expectation rule template: This template calculates passedCount differently than the built-in UniquenessExpectation rule. The rule template returns every row whose value is duplicated or null, which can result in fewer passing rows when duplicates are present.
  For example, if a column contains the values (a, a, b, b, c, d, e):
  - Built-in uniqueness rule: Returns 5 passing rows: (a, b, c, d, e).
  - Uniqueness rule template: Returns 4 failing rows: (a, a, b, b). The number of passing rows is 3 (7 total rows minus 4 failed rows): (c, d, e).
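The following Python sketch reproduces the counts in the uniqueness example above. template_uniqueness follows the stated rule template behavior (every duplicated or null row fails). builtin_passing_values only reproduces the example's count of five passing rows and is not the built-in rule's actual algorithm. Both names are hypothetical.

```python
from collections import Counter


def template_uniqueness(values):
    """Uniqueness rule template as described above: every row whose value is
    NULL (None) or appears more than once fails.

    Returns (passed_count, failing_rows).
    """
    counts = Counter(v for v in values if v is not None)
    failing = [v for v in values if v is None or counts[v] > 1]
    return len(values) - len(failing), failing


def builtin_passing_values(values):
    """Reproduces only the example's built-in count: one passing row per
    distinct value. NULL handling isn't covered by the example; dropping
    NULLs here is an assumption.
    """
    return sorted({v for v in values if v is not None})
```

For (a, a, b, b, c, d, e), template_uniqueness returns 3 passing rows with a, a, b, b failing, while builtin_passing_values yields the five distinct values.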
Metadata aspects
This section describes the fields and values for the data-rules and
data-quality-rule-template aspect types.
data-rules aspect fields
To define data rules, use the dataplex-types.global.data-rules aspect. The following table describes the fields for this aspect.
| Field | Type | Description |
|---|---|---|
| rules | Array | Required. A list of data quality rules. |
| rules[].name | String | Required. A name for the rule. |
| rules[].dimension | String | Optional. The data quality dimension for the rule. |
| rules[].description | String | Optional. The description of the rule. |
| rules[].suspended | Boolean | Optional. Whether the rule is active or suspended. Default is false. |
| rules[].threshold | Double | Optional. The passing threshold for the rule, from 0.0 to 1.0. Default is 1.0. |
| rules[].type | Enum | Required. The type of the rule. The only supported value is TEMPLATE_REFERENCE. |
| rules[].ignore_null | Boolean | Optional. If true, rows with null values in the column are ignored when determining the success criteria. |
| rules[].attributes | Map | Optional. Custom key-value pairs associated with the rule. |
| rules[].templateReference | Object | Required. A reference to the rule template. |
| rules[].templateReference.name | String | Required. The resource name of the rule template. |
| rules[].templateReference.values | Map | Optional. The parameter names and values for the rule template. |
| rules[].templateReference.values[].parameterValue.value | String | Required. The value for the parameter. |
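The following Python sketch shows how threshold and ignore_null could combine to decide whether a rule passes. It assumes the threshold semantics from the DataQualityRule API reference (a rule passes when passing rows divided by total rows is at least the threshold) and interprets ignore_null as treating NULL rows as passing. Check the exact null handling against the service before you rely on it. The function name and arguments are hypothetical.

```python
def rule_passes(total_rows, failing_rows, null_failing_rows=0,
                threshold=None, ignore_null=False):
    """Hypothetical pass/fail check for one rule.

    failing_rows: rows returned by the template SQL (invalid rows).
    null_failing_rows: how many of those failing rows have NULL in the column.
    threshold: minimum passing ratio in [0.0, 1.0]; None means the 1.0 default.
    ignore_null: if True, failing NULL rows count as passing (assumption).
    """
    if total_rows <= 0:
        raise ValueError("total_rows must be positive")
    if threshold is None:
        threshold = 1.0  # documented default
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0.0 and 1.0")
    if not 0 <= null_failing_rows <= failing_rows <= total_rows:
        raise ValueError("inconsistent row counts")
    failing = failing_rows - null_failing_rows if ignore_null else failing_rows
    return (total_rows - failing) / total_rows >= threshold
```

With the default threshold of 1.0, a single failing row fails the rule; with a threshold of 0.9, up to 10 failing rows out of 100 still pass.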
The following example shows a data-rules aspect in a payload.json file:
{
"aspects": {
"dataplex-types.global.data-rules": {
"data": {
"rules": [
{
"name": "valid-email",
"dimension": "VALIDITY",
"type": "TEMPLATE_REFERENCE",
"templateReference": {
"name": "projects/my-project/locations/us-central1/entryGroups/my-rules/entries/email-check",
"values": {
"column_name": {
"value": "email"
}
}
}
}
]
}
}
}
}
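To generate payload.json programmatically, the following Python sketch builds rules[] items from the fields in the data-rules table and checks the required ones before writing the file. The helper names are hypothetical. The checks cover only what this page states (a required name, type TEMPLATE_REFERENCE, templateReference.name, a threshold between 0.0 and 1.0, and unique rule names), and the service may enforce more.

```python
import json

ASPECT_KEY = "dataplex-types.global.data-rules"


def make_rule(name, template_name, values=None, dimension=None, threshold=None):
    """Builds one rules[] item in the shape of the payload.json example."""
    rule = {
        "name": name,
        "type": "TEMPLATE_REFERENCE",
        "templateReference": {
            "name": template_name,
            # Each parameter value is wrapped as {"value": ...}.
            "values": {k: {"value": str(v)} for k, v in (values or {}).items()},
        },
    }
    if dimension is not None:
        rule["dimension"] = dimension
    if threshold is not None:
        if not 0.0 <= threshold <= 1.0:
            raise ValueError("threshold must be between 0.0 and 1.0")
        rule["threshold"] = threshold
    return rule


def validate_rules(rules):
    """Returns a list of problems with required fields; empty means OK."""
    problems, names = [], set()
    for i, rule in enumerate(rules):
        if not rule.get("name"):
            problems.append(f"rules[{i}]: missing name")
        elif rule["name"] in names:
            problems.append(f"rules[{i}]: duplicate name {rule['name']!r}")
        names.add(rule.get("name"))
        if rule.get("type") != "TEMPLATE_REFERENCE":
            problems.append(f"rules[{i}]: type must be TEMPLATE_REFERENCE")
        if not rule.get("templateReference", {}).get("name"):
            problems.append(f"rules[{i}]: missing templateReference.name")
    return problems


def aspect_payload(rules):
    problems = validate_rules(rules)
    if problems:
        raise ValueError("; ".join(problems))
    return {"aspects": {ASPECT_KEY: {"data": {"rules": rules}}}}


def write_payload(path, rules):
    with open(path, "w", encoding="utf-8") as f:
        json.dump(aspect_payload(rules), f, indent=2)
```

For example, write_payload("payload.json", [make_rule("valid-email", "projects/my-project/locations/us-central1/entryGroups/my-rules/entries/email-check", {"column_name": "email"}, dimension="VALIDITY")]) writes the same structure as the example above.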
data-quality-rule-template aspect fields
Use the data-quality-rule-template aspect to define a custom data quality
rule template. The following table describes the fields for the
dataplex-types.global.data-quality-rule-template aspect.
| Field | Type | Description |
|---|---|---|
| dimension | String | Required. The dimension for the rule template. |
| sqlCollection | Array | Required. A list of SQL queries for the rule template. |
| sqlCollection[].sql.query | String | Required. The SQL query that returns invalid rows. |
| inputParameters | Map | Optional. A map of input parameters for the rule template. |
| inputParameters[].parameterDescription.description | String | Optional. The description of the input parameter. |
| inputParameters[].parameterDescription.defaultValue | String | Optional. The default value for the parameter if no value is provided. |
| capabilities | Array | Optional. A list of template capabilities, such as THRESHOLD or IGNORE_NULL. |
The following example displays the structure of a data-quality-rule-template
aspect:
{
"entryType": "projects/dataplex-types/locations/global/entryTypes/data-quality-rule-template",
"aspects": {
"dataplex-types.global.data-quality-rule-template": {
"data": {
"dimension": "COMPLETENESS",
"sqlCollection": [
{
"query": "SELECT * FROM ${data()} WHERE ${column()} > ${param(p1)}"
}
],
"inputParameters": {
"p1": {
"description": "The parameter description"
}
},
"capabilities": [
"THRESHOLD",
"IGNORE_NULL"
]
}
}
}
}
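Because parameter names are case-sensitive, a mismatch between ${param(...)} references and inputParameters is an easy mistake to make. The following Python sketch is a hypothetical pre-submit check that compares the two. It follows the shape of the JSON examples on this page (sqlCollection items with a query field) and skips escaped $${param(...)} references.

```python
import re

# The lookbehind skips escaped references such as $${param(x)}.
PARAM_REF = re.compile(r"(?<!\$)\$\{param\((\w+)\)\}")


def check_template(aspect_data):
    """Compares ${param(...)} references in sqlCollection with inputParameters.

    Returns (undeclared, unused): parameter names used in SQL but not declared,
    and declared names that no query uses. Names are compared case-sensitively.
    """
    used = set()
    for item in aspect_data.get("sqlCollection", []):
        used.update(PARAM_REF.findall(item.get("query", "")))
    declared = set(aspect_data.get("inputParameters", {}))
    return sorted(used - declared), sorted(declared - used)
```

Running check_template on the data object of the example above returns two empty lists, because p1 is both used and declared.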
Manage data quality rule templates
This section describes how to create, edit, and delete rule templates.
Create a rule library
To create a rule library, you must create a Knowledge Catalog entry group.
Console
In the Google Cloud console, go to the Data profiling & quality page.
Go to Rule libraries > Custom, and click Create.
In the Create rule library window, fill in the following fields:
- Optional: Enter a display name.
- In Rule library ID, enter an ID. For more information, see the resource naming conventions.
- Optional: Enter a description.
- In the Location menu, select a location. It can't be changed later.
- Optional: Add labels. Labels are key-value pairs that let you group related objects together or with other Google Cloud resources.
- Click Save.
REST
To create a rule library by using the API, you must create an entry group with
the required label goog-dataplex-entry-group-type: rule_library:
gcurl -X POST "https://${DATAPLEX_API}/entryGroups?entry_group_id=RULE_LIBRARY_ID" \
  --data @- << EOF
{
  "labels": {
    "goog-dataplex-entry-group-type": "rule_library"
  },
  "description": "DESCRIPTION"
}
EOF
Replace the following:
- RULE_LIBRARY_ID: a unique ID for your rule library.
- DESCRIPTION: an optional description for the rule library.
Terraform
To create a rule library, use the
google_dataplex_entry_group
resource:
resource "google_dataplex_entry_group" "rule_library" {
  project        = "PROJECT_ID"
  location       = "LOCATION"
  entry_group_id = "RULE_LIBRARY_ID"
  description    = "DESCRIPTION"
  labels = {
    "goog-dataplex-entry-group-type" = "rule_library"
  }
}
Replace the following:
- PROJECT_ID: your project ID.
- LOCATION: the location for your rule library (for example, us-central1).
- RULE_LIBRARY_ID: a unique ID for your rule library.
- DESCRIPTION: an optional description for the rule library.
Create a rule template
To create a custom rule template, select one of the following:
Console
In the Google Cloud console, go to the Data profiling & quality page.
Go to Rule libraries > Custom.
Click the rule library where you want to add a template, and then click Create.
In the Create rule template window, fill in the following fields:
- Optional: Enter a name for the template.
- In Template ID, enter an ID. For more information, see the resource naming conventions.
- Optional: Enter a description.
- In the Dimension menu, select a dimension. For more information, see Dimensions.
In the SQL query field, enter the following example query, which validates that each column value is between two values:
SELECT * FROM ${data()} WHERE NOT(${column()}>=${param(min_value)} AND ${column()}<=${param(max_value)}) IS TRUE
Optional: To let rules that reference this template specify a threshold for success criteria, select Support threshold.
Optional: To let rules that reference this template ignore null values in the column when determining success criteria, select Support ignore null.
In Input Parameters, click Add input parameter, and then for each parameter used in the SQL query, enter an input name, description, and default value. In the preceding example, the names are min_value and max_value.
Click Save.
REST
To create a custom rule template, create an entry of type
data-quality-rule-template:
gcurl -X POST "https://${DATAPLEX_API}/entryGroups/ENTRY_GROUP_ID/entries?entry_id=TEMPLATE_ID" \
  --data @- << 'EOF'
{
  "entryType": "projects/dataplex-types/locations/global/entryTypes/data-quality-rule-template",
  "entrySource": {
    "displayName": "DISPLAY_NAME",
    "description": "DESCRIPTION"
  },
  "aspects": {
    "dataplex-types.global.data-quality-rule-template": {
      "data": {
        "dimension": "VALIDITY",
        "sqlCollection": [
          {
            "query": "SELECT t.* FROM ${data()} AS t LEFT JOIN `${param(reference_table)}` AS s ON t.${column()} = s.`${param(reference_column)}` WHERE s.`${param(reference_column)}` IS NULL"
          }
        ],
        "inputParameters": {
          "PARAMETER_NAME": {
            "description": "PARAMETER_DESCRIPTION"
          }
        },
        "capabilities": ["THRESHOLD"]
      }
    }
  }
}
EOF
The quoted 'EOF' delimiter prevents the shell from expanding the ${...} placeholders and the backticks in the SQL query.
Replace the following:
- ENTRY_GROUP_ID: the ID of the entry group that stores your rule template.
- TEMPLATE_ID: a unique ID for your rule template.
- DISPLAY_NAME: a display name for the rule template.
- DESCRIPTION: a description of the rule template.
- PARAMETER_NAME: the name of an input parameter used in the SQL query.
- PARAMETER_DESCRIPTION: a description of the input parameter.
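Building the request body in code avoids shell quoting problems with ${...} placeholders and backticks in the SQL. The following Python sketch produces the same body as the REST example above. The function name and arguments are hypothetical.

```python
import json


def rule_template_entry(display_name, description, dimension, query,
                        parameters, capabilities=("THRESHOLD",)):
    """Builds the body for creating a data-quality-rule-template entry.

    parameters: mapping of parameter name -> description.
    The query is passed through unchanged; json.dumps needs no special
    escaping for ${...} placeholders or backticks.
    """
    return {
        "entryType": "projects/dataplex-types/locations/global/entryTypes/data-quality-rule-template",
        "entrySource": {"displayName": display_name, "description": description},
        "aspects": {
            "dataplex-types.global.data-quality-rule-template": {
                "data": {
                    "dimension": dimension,
                    "sqlCollection": [{"query": query}],
                    "inputParameters": {
                        name: {"description": desc} for name, desc in parameters.items()
                    },
                    "capabilities": list(capabilities),
                }
            }
        },
    }


FOREIGN_KEY_SQL = (
    "SELECT t.* FROM ${data()} AS t "
    "LEFT JOIN `${param(reference_table)}` AS s "
    "ON t.${column()} = s.`${param(reference_column)}` "
    "WHERE s.`${param(reference_column)}` IS NULL"
)
```

You could send json.dumps(rule_template_entry(...)) as the body of a POST to the same entries?entry_id=TEMPLATE_ID URL, for example with urllib.request.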
Terraform
To create a custom rule template, use the
google_dataplex_entry
resource:
resource "google_dataplex_entry" "rule_template" {
  project        = "PROJECT_ID"
  location       = "LOCATION"
  entry_id       = "TEMPLATE_ID"
  entry_group_id = "ENTRY_GROUP_ID"
  entry_type     = "projects/dataplex-types/locations/global/entryTypes/data-quality-rule-template"

  entry_source {
    display_name = "DISPLAY_NAME"
    description  = "DESCRIPTION"
  }

  aspects {
    aspect_key = "dataplex-types.global.data-quality-rule-template"
    aspect {
      data = jsonencode({
        dimension = "VALIDITY"
        sqlCollection = [
          {
            query = "SELECT t.* FROM $${data()} AS t LEFT JOIN `$${param(reference_table)}` AS s ON t.$${column()} = s.`$${param(reference_column)}` WHERE s.`$${param(reference_column)}` IS NULL"
          }
        ]
        inputParameters = {
          "PARAMETER_NAME" = {
            description = "PARAMETER_DESCRIPTION"
          }
        }
        capabilities = ["THRESHOLD"]
      })
    }
  }
}
Replace the following:
- PROJECT_ID: your project ID.
- LOCATION: the location for your rule template (for example, us-central1).
- ENTRY_GROUP_ID: the ID of the entry group that stores your rule template.
- TEMPLATE_ID: a unique ID for your rule template.
- DISPLAY_NAME: a display name for the rule template.
- DESCRIPTION: a description of the rule template.
- PARAMETER_NAME: the name of an input parameter used in the SQL query.
- PARAMETER_DESCRIPTION: a description of the input parameter.
Update a rule template
To update an existing rule template, select one of the following options:
Console
In the Google Cloud console, go to the Data profiling & quality page.
Go to Rule libraries > Custom.
Click the rule library that contains the template you want to update.
In the Rule templates list, click the template that you want to update.
On the rule template details page, click Edit.
Update the fields, and then click Save.
REST
To update a custom rule template, patch the entry or specific aspect:
gcurl -X PATCH "https://${DATAPLEX_API}/entryGroups/ENTRY_GROUP_ID/entries/TEMPLATE_ID?updateMask=aspects" \
  --data @- << 'EOF'
{
  "aspects": {
    "dataplex-types.global.data-quality-rule-template": {
      "data": {
        "dimension": "VALIDITY",
        "sqlCollection": [
          {
            "query": "SELECT * FROM ${data()} WHERE ${column()} IS NOT NULL"
          }
        ]
      }
    }
  }
}
EOF
Replace the following:
- ENTRY_GROUP_ID: the ID of the entry group that stores your rule template.
- TEMPLATE_ID: the ID of the rule template that you want to update.
Terraform
To update a custom rule template, use the
google_dataplex_entry resource:
resource "google_dataplex_entry" "rule_template" {
  project        = "PROJECT_ID"
  location       = "LOCATION"
  entry_id       = "TEMPLATE_ID"
  entry_group_id = "ENTRY_GROUP_ID"
  entry_type     = "projects/dataplex-types/locations/global/entryTypes/data-quality-rule-template"

  aspects {
    aspect_key = "dataplex-types.global.data-quality-rule-template"
    aspect {
      data = jsonencode({
        dimension = "VALIDITY"
        sqlCollection = [
          {
            query = "SELECT * FROM $${data()} WHERE $${column()} IS NOT NULL"
          }
        ]
      })
    }
  }
}
Replace the following:
- PROJECT_ID: your project ID.
- LOCATION: the location for your rule template (for example, us-central1).
- ENTRY_GROUP_ID: the ID of the entry group that stores your rule template.
- TEMPLATE_ID: the ID of the rule template that you want to update.
Delete a rule template
To delete an existing rule template, select one of the following options:
Console
In the Google Cloud console, go to the Data profiling & quality page.
Go to Rule libraries > Custom.
Click the rule library that contains the template you want to delete.
In the Rule templates list, click the template that you want to delete.
Click Delete, and then click Delete again to confirm.
REST
To delete a custom rule template, delete the entry:
gcurl -X DELETE \
  "https://${DATAPLEX_API}/entryGroups/ENTRY_GROUP_ID/entries/TEMPLATE_ID"
Replace the following:
- ENTRY_GROUP_ID: the ID of the entry group that stores your rule template.
- TEMPLATE_ID: the ID of the rule template that you want to delete.
Create a data quality scan using template rules
Use your custom templates to define rules for a data quality scan.
Console
In the Google Cloud console, go to the Data profiling & quality page.
Follow the steps to create a data quality scan, but update the following:
- In the Define scan window, in the Credential type menu, select Service account, and then enter a service account. A service account is mandatory for using rule templates.
- In the Data quality rules window, define the rules to configure for this data quality scan:
- Click Add rules > Template rules.
- You can either select Attach rule to entire table, or in Choose columns, browse and select the columns to apply rules for.
- In Choose rule templates, select the rule templates to use. You can use only rule templates that are in the same location as the scan or in the global location. You can also select system rule templates from the list.
- Click Ok.
- Click Edit rule, and then add rule specific parameters.
- Click Save.
- Select the rules that you want to add, and then click Select. The rules are now added to your current rules list.
- Optional: Repeat the previous steps to add additional rules to the data quality scan.
- Click Continue.
- Proceed with the remaining scan configuration.
- Click Create to only create the scan, or click Run scan to create and immediately run the scan.
REST
To create a data quality scan that references a rule template, specify the
templateReference. Custom rule templates use project-specific paths, while system rule templates use a global path: projects/dataplex-templates/locations/global/entryGroups/rule-library/entries/SYSTEM_TEMPLATE_ID.
The following example creates a scan that uses a custom rule template and includes a filter to selectively run rules:
gcurl -X POST "https://${DATAPLEX_API}/dataScans?data_scan_id=DATASCAN_ID" \
  --data @- << EOF
{
  "data": {
    "resource": "//bigquery.googleapis.com/projects/BIGQUERY_PROJECT_ID/datasets/DATASET_ID/tables/TABLE_ID"
  },
  "executionIdentity": {
    "serviceAccount": {
      "email": "SERVICE_ACCOUNT_EMAIL"
    }
  },
  "executionSpec": {
    "trigger": {
      "onDemand": {}
    }
  },
  "type": "DATA_QUALITY",
  "dataQualitySpec": {
    "rules": [
      {
        "templateReference": {
          "name": "projects/PROJECT_ID/locations/LOCATION/entryGroups/ENTRY_GROUP_ID/entries/TEMPLATE_ID",
          "values": {
            "PARAMETER_NAME": {
              "value": "PARAMETER_VALUE"
            }
          }
        },
        "column": "COLUMN_NAME",
        "name": "RULE_NAME"
      }
    ],
    "filter": "FILTER_CONDITION"
  }
}
EOF
Replace the following:
- PROJECT_ID: your project ID.
- LOCATION: the location for your rule template and scan (for example, us-central1).
- DATASCAN_ID: the ID of the data quality scan.
- BIGQUERY_PROJECT_ID: the project ID of the BigQuery table.
- DATASET_ID: the BigQuery dataset ID.
- TABLE_ID: the BigQuery table ID.
- SERVICE_ACCOUNT_EMAIL: the email address of the service account that runs the scan.
- ENTRY_GROUP_ID: the ID of the entry group that stores your rule template.
- TEMPLATE_ID: the ID of the custom rule template.
- SYSTEM_TEMPLATE_ID: the ID of the system rule template (for example, non_null_expectation).
- PARAMETER_NAME: the name of an input parameter for the rule template.
- PARAMETER_VALUE: the value for the input parameter.
- COLUMN_NAME: the column to apply the rule to.
- RULE_NAME: a name for the rule instance.
- FILTER_CONDITION: an optional AIP-160 filter string to selectively run rules (for example, name = \"RULE_NAME\").
Terraform
To create a data quality scan that references a rule template, use the
google_dataplex_datascan resource:
resource "google_dataplex_datascan" "scan" {
  data_scan_id = "DATASCAN_ID"
  location     = "LOCATION"
  project      = "PROJECT_ID"

  data {
    resource = "//bigquery.googleapis.com/projects/BIGQUERY_PROJECT_ID/datasets/DATASET_ID/tables/TABLE_ID"
  }

  execution_spec {
    service_account = "SERVICE_ACCOUNT_EMAIL"
    trigger {
      on_demand {}
    }
  }

  data_quality_spec {
    rules {
      column    = "COLUMN_NAME"
      name      = "RULE_NAME"
      dimension = "VALIDITY"
      template_reference {
        name = "projects/PROJECT_ID/locations/LOCATION/entryGroups/ENTRY_GROUP_ID/entries/TEMPLATE_ID"
        values = {
          "PARAMETER_NAME" = {
            value = "PARAMETER_VALUE"
          }
        }
      }
    }
    filter = "FILTER_CONDITION"
  }
}
Replace the following:
- PROJECT_ID: your project ID.
- LOCATION: the location for your rule template and scan (for example, us-central1).
- DATASCAN_ID: the ID of the data quality scan.
- BIGQUERY_PROJECT_ID: the project ID of the BigQuery table.
- DATASET_ID: the BigQuery dataset ID.
- TABLE_ID: the BigQuery table ID.
- SERVICE_ACCOUNT_EMAIL: the email address of the service account that runs the scan.
- ENTRY_GROUP_ID: the ID of the entry group that stores your rule template.
- TEMPLATE_ID: the ID of the custom rule template.
- PARAMETER_NAME: the name of an input parameter for the rule template.
- PARAMETER_VALUE: the value for the input parameter.
- COLUMN_NAME: the column to apply the rule to.
- RULE_NAME: a name for the rule instance.
- FILTER_CONDITION: an optional AIP-160 filter string to selectively run rules.
Run and monitor data quality scans
After you create a data quality scan, you must run it to validate your data. For more information, see Run a data quality scan.
You can then monitor the scan jobs and view the results. For more information, see View the data quality scan results.
Attach data quality rules to catalog entries
You can declare data quality rules as aspects in Knowledge Catalog to make them searchable and reusable across scans.
BigQuery table
To define rules directly on a BigQuery table entry, select one of the following:
Console
In the Google Cloud console, go to the Knowledge Catalog Search page.
Search for and select the table that you want to attach rules to.
Click Data quality > Rules management > Create rules.
In the Create rules window, do the following:
- In the Choose create option menu, select Create new rule.
- In Choose columns, click Browse. Select the columns to apply rules for.
- In the Choose rule types menu, select the rule templates to use. Only the rule templates in the same location as the scan can be used.
- Click Edit rule, and then add rule specific parameters.
Click Save.
The Rules management page displays all entry rules.
REST
To attach rules to a specific column using the API, patch the
@bigquery entry with a data-rules aspect targeted to that column:
gcurl -X PATCH "https://${DATAPLEX_API}/entryGroups/@bigquery/entries/bigquery.googleapis.com/projects/PROJECT_ID/datasets/DATASET_ID/tables/TABLE_ID?updateMask=aspects&aspect_keys=projects/dataplex-types/locations/global/aspectTypes/data-rules@Schema.COLUMN_NAME" \
  --data @- << EOF
{
  "aspects": {
    "dataplex-types.global.data-rules@Schema.COLUMN_NAME": {
      "aspectType": "projects/dataplex-types/locations/global/aspectTypes/data-rules",
      "data": {
        "rules": [
          {
            "templateReference": "projects/PROJECT_ID/locations/LOCATION/entryGroups/ENTRY_GROUP_ID/entries/TEMPLATE_ID",
            "column": "COLUMN_NAME",
            "values": {
              "PARAMETER_NAME": {
                "value": "PARAMETER_VALUE"
              }
            }
          }
        ]
      }
    }
  }
}
EOF
Replace the following:
- PROJECT_ID: your project ID.
- LOCATION: the location for your rule template and aspect.
- DATASET_ID: the BigQuery dataset ID.
- TABLE_ID: the BigQuery table ID.
- COLUMN_NAME: the column to apply the rule to.
- ENTRY_GROUP_ID: the ID of the entry group that stores your rule template.
- TEMPLATE_ID: the ID of the rule template.
- PARAMETER_NAME: the name of an input parameter for the rule template.
- PARAMETER_VALUE: the value for the input parameter.
Terraform
To attach rules to a specific column, use the
google_dataplex_entry resource:
resource "google_dataplex_entry" "bq_table_metadata" {
  project        = "PROJECT_ID"
  location       = "LOCATION"
  entry_id       = "bigquery.googleapis.com/projects/PROJECT_ID/datasets/DATASET_ID/tables/TABLE_ID"
  entry_group_id = "@bigquery"

  aspects {
    aspect_key = "dataplex-types.global.data-rules@Schema.COLUMN_NAME"
    aspect {
      data = jsonencode({
        rules = [
          {
            name              = "RULE_NAME"
            dimension         = "VALIDITY"
            templateReference = "projects/PROJECT_ID/locations/LOCATION/entryGroups/ENTRY_GROUP_ID/entries/TEMPLATE_ID"
            values = {
              "PARAMETER_NAME" = {
                value = "PARAMETER_VALUE"
              }
            }
          }
        ]
      })
    }
  }
}
Replace the following:
- PROJECT_ID: your project ID.
- LOCATION: the location for your rule template and aspect.
- DATASET_ID: the BigQuery dataset ID.
- TABLE_ID: the BigQuery table ID.
- COLUMN_NAME: the column to apply the rule to.
- ENTRY_GROUP_ID: the ID of the entry group that stores your rule template.
- TEMPLATE_ID: the ID of the rule template.
- PARAMETER_NAME: the name of an input parameter for the rule template.
- PARAMETER_VALUE: the value for the input parameter.
- RULE_NAME: a unique name for the rule.
Business glossary terms
You can attach rules to business glossary terms. Rules attached to terms are automatically inherited by linked BigQuery tables.
Console
In the Google Cloud console, go to the Knowledge Catalog Glossaries page.
Search for and select the business glossary term.
In the Data quality rules section, click Add.
In the Create rules window, do the following:
- In the Choose create option menu, select Create new rule.
- In the Choose rule types menu, select the rule templates to use. Only the rule templates in the same location as the scan can be used.
- Click Edit rule, and then add rule specific parameters.
- Click Save.
Attach the term to a BigQuery table or columns. For more information, see Manage links between terms and data assets.
REST
To attach rules to a term using the API, patch the
@dataplex entry for the glossary term:
gcurl -X PATCH "https://${DATAPLEX_API}/entryGroups/@dataplex/entries/projects/PROJECT_ID/locations/LOCATION/glossaries/GLOSSARY_ID/terms/TERM_ID?updateMask=aspects&aspect_keys=projects/dataplex-types/locations/global/aspectTypes/data-rules" \
  --data @- << EOF
{
  "aspects": {
    "dataplex-types.global.data-rules": {
      "aspectType": "projects/dataplex-types/locations/global/aspectTypes/data-rules",
      "data": {
        "rules": [
          {
            "templateReference": "projects/PROJECT_ID/locations/LOCATION/entryGroups/ENTRY_GROUP_ID/entries/TEMPLATE_ID",
            "column": "COLUMN_NAME",
            "values": {
              "PARAMETER_NAME": {
                "value": "PARAMETER_VALUE"
              }
            }
          }
        ]
      }
    }
  }
}
EOF
Replace the following:
- PROJECT_ID: your project ID.
- LOCATION: the location for your rule template and aspect.
- GLOSSARY_ID: the ID of the business glossary.
- TERM_ID: the ID of the glossary term.
- ENTRY_GROUP_ID: the ID of the entry group that stores your rule template.
- TEMPLATE_ID: the ID of the rule template.
- COLUMN_NAME: the column to apply the rule to.
- PARAMETER_NAME: the name of an input parameter for the rule template.
- PARAMETER_VALUE: the value for the input parameter.
Terraform
To attach rules to a business glossary term, use the
google_dataplex_entry resource:
resource "google_dataplex_entry" "glossary_term_rules" {
  project        = "PROJECT_ID"
  location       = "LOCATION"
  entry_id       = "projects/PROJECT_ID/locations/LOCATION/glossaries/GLOSSARY_ID/terms/TERM_ID"
  entry_group_id = "@dataplex"

  aspects {
    aspect_key = "dataplex-types.global.data-rules"
    aspect {
      data = jsonencode({
        rules = [
          {
            name              = "RULE_NAME"
            dimension         = "VALIDITY"
            templateReference = "projects/PROJECT_ID/locations/LOCATION/entryGroups/ENTRY_GROUP_ID/entries/TEMPLATE_ID"
            values = {
              "PARAMETER_NAME" = {
                value = "PARAMETER_VALUE"
              }
            }
          }
        ]
      })
    }
  }
}
Replace the following:
- PROJECT_ID: your project ID.
- LOCATION: the location for your rule template and aspect.
- GLOSSARY_ID: the ID of the business glossary.
- TERM_ID: the ID of the glossary term.
- ENTRY_GROUP_ID: the ID of the entry group that stores your rule template.
- TEMPLATE_ID: the ID of the rule template.
- PARAMETER_NAME: the name of an input parameter for the rule template.
- PARAMETER_VALUE: the value for the input parameter.
- RULE_NAME: a unique name for the rule.
Import rules from another table
You can import data quality rules from an existing BigQuery table entry to your current table.
Console
In the Google Cloud console, go to the Knowledge Catalog Search page.
Select the table you want to manage rules for.
Click Data quality > Rules management.
Click Create rules.
In the Create rules window, do the following:
- In the Choose create option menu, select Import rules from another table.
- In Table, click Browse. Search for and select the source table containing the rules that you want to copy.
- Select the rules to import. You can also edit them before you save.
Click Save.
The Rules management tab displays the new rules.
REST
To import rules, fetch the data-rules aspect from the source entry and apply it to the target entry.
Get the data-rules aspect from the source entry:
```shell
gcurl "https://dataplex.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/entryGroups/ENTRY_GROUP_ID/entries/SOURCE_ENTRY_ID?view=FULL"
```
Extract the rules list from the dataplex-types.global.data-rules aspect in the response.
Attach the rules to the target entry by updating its data-rules aspect with a PATCH request, in the same way that you attach rules to a glossary term.
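The extract and attach steps can be scripted. The following is a minimal sketch, assuming jq is installed and that the source entry's JSON from the GET request above is piped in. The helper names extract_rules and build_rules_patch_body are hypothetical, as are the TARGET_ENTRY_GROUP_ID and TARGET_ENTRY_ID placeholders in the commented example. Because the PATCH sets the whole data-rules aspect, it may replace rules already attached to the target entry; merge the two rules lists first if you need to keep both.

```shell
#!/usr/bin/env bash
# Sketch only: the helper names are hypothetical. Requires jq.
DATA_RULES_KEY="dataplex-types.global.data-rules"
DATA_RULES_TYPE="projects/dataplex-types/locations/global/aspectTypes/data-rules"

# Extract step: read a source entry's JSON on stdin and print the rules
# list from its data-rules aspect, or [] if the aspect is absent.
extract_rules() {
  jq --arg key "$DATA_RULES_KEY" '.aspects[$key].data.rules // []'
}

# Attach step: read a rules list on stdin and wrap it in a request body
# that sets the data-rules aspect on the target entry.
build_rules_patch_body() {
  jq --arg key "$DATA_RULES_KEY" --arg type "$DATA_RULES_TYPE" \
    '{aspects: {($key): {aspectType: $type, data: {rules: .}}}}'
}

# Example (not run here; needs network access and the gcurl alias):
# gcurl "https://dataplex.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/entryGroups/ENTRY_GROUP_ID/entries/SOURCE_ENTRY_ID?view=FULL" \
#   | extract_rules | build_rules_patch_body \
#   | gcurl -X PATCH "https://${DATAPLEX_API}/entryGroups/TARGET_ENTRY_GROUP_ID/entries/TARGET_ENTRY_ID?updateMask=aspects&aspect_keys=${DATA_RULES_TYPE}" --data @-
```

Keeping the extraction and the body construction as separate filters lets you inspect or edit the rules list between the two steps, which mirrors the option to edit rules in the console flow.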
Replace the following:
- PROJECT_ID: your project ID.
- LOCATION: the location of the source entry.
- ENTRY_GROUP_ID: the ID of the entry group for the source entry.
- SOURCE_ENTRY_ID: the ID of the source entry.
View data quality rules for a BigQuery table
You can view all rules applicable to a table, including rules attached directly and rules inherited from linked glossary terms.
Console
In the Google Cloud console, go to the Knowledge Catalog Search page.
Search for and select the table.
Click Data quality > Rules management to view all rules.
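To check rules from the command line instead, you can filter the entry JSON returned by the GET request shown in the REST steps for importing rules. The following is a minimal sketch, assuming jq is installed; list_entry_rules is a hypothetical helper and ENTRY_ID is a hypothetical placeholder for the table's entry ID. It reads only the data-rules aspect stored on the entry itself, so rules inherited from linked glossary terms, which the console also shows, are presumably not included.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of gcloud or the Dataplex API): prints one
# "column<TAB>templateReference" line per rule in the entry's own
# data-rules aspect. Reads the entry JSON on stdin; requires jq.
# Missing fields are printed as "-".
list_entry_rules() {
  jq -r '
    (.aspects["dataplex-types.global.data-rules"].data.rules // [])[]
    | [(.column // "-"), (.templateReference // "-")]
    | @tsv'
}

# Example (not run here; needs network access and the gcurl alias):
# gcurl "https://dataplex.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/entryGroups/ENTRY_GROUP_ID/entries/ENTRY_ID?view=FULL" | list_entry_rules
```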
Create a data quality scan using rules from the catalog
You can selectively run rules declared on catalog entries in a scan.
Console
In the Google Cloud console, go to the Data profiling & quality page.
Follow the steps to create a data quality scan, but update the following:
- In the Define scan window, do the following:
- In the Credential type menu, select Service account, and then enter a service account. A service account is mandatory for using rule templates.
- For Rule type, select Create with entry based rule.
- In the Data quality rules section, rules applicable to the table entry are displayed, including rules inherited from linked glossary terms. To filter the rules, do the following:
- In the Filter items field, enter a filter that selects the rules to run.
- Click Apply. Filtered rules are displayed.
- Proceed with the remaining scan configuration.
- Click Create to only create the scan, or click Run scan to create and immediately run the scan.
Each subsequent run evaluates the rules that are attached to the entry or inherited from glossary terms at the time the run executes.
REST
To run rules from catalog entries, set enableCatalogBasedRules to true. You can also specify a filter to run only a subset of the rules. To create the scan, send the following request:
```shell
gcurl -X POST "https://${DATAPLEX_API}/dataScans?data_scan_id=DATASCAN_ID" \
  --data @- << EOF
{
  "type": "DATA_QUALITY",
  "data": {
    "resource": "//bigquery.googleapis.com/projects/PROJECT_ID/datasets/DATASET_ID/tables/TABLE_ID"
  },
  "executionIdentity": {
    "serviceAccount": {
      "email": "SERVICE_ACCOUNT_EMAIL"
    }
  },
  "executionSpec": {
    "trigger": {
      "onDemand": {}
    }
  },
  "dataQualitySpec": {
    "enableCatalogBasedRules": true,
    "filter": "FILTER_CONDITION"
  }
}
EOF
```
Replace the following:
- PROJECT_ID: your project ID.
- LOCATION: the location for your data scan.
- DATASCAN_ID: the ID of the data quality scan.
- DATASET_ID: the BigQuery dataset ID.
- TABLE_ID: the BigQuery table ID.
- SERVICE_ACCOUNT_EMAIL: the email address of the service account that runs the scan.
- FILTER_CONDITION: an AIP-160 filter string to selectively run rules (for example, attributes.environment = \"prod\"). Escape double quotes because the filter is embedded in a JSON string.
Terraform
To run rules from catalog entries, use the google_dataplex_datascan resource:
```hcl
resource "google_dataplex_datascan" "scan" {
  data_scan_id = "DATASCAN_ID"
  location     = "LOCATION"
  project      = "PROJECT_ID"

  data {
    resource = "//bigquery.googleapis.com/projects/PROJECT_ID/datasets/DATASET_ID/tables/TABLE_ID"
  }

  execution_spec {
    service_account = "SERVICE_ACCOUNT_EMAIL"
    trigger {
      on_demand {}
    }
  }

  data_quality_spec {
    enable_catalog_based_rules = true
    filter                     = "FILTER_CONDITION"
  }
}
```
Replace the following:
- PROJECT_ID: your project ID.
- LOCATION: the location for your data scan.
- DATASCAN_ID: the ID of the data quality scan.
- DATASET_ID: the BigQuery dataset ID.
- TABLE_ID: the BigQuery table ID.
- SERVICE_ACCOUNT_EMAIL: the email address of the service account that runs the scan.
- FILTER_CONDITION: an AIP-160 filter string to selectively run rules.
Pricing
Using Knowledge Catalog rule reusability involves the following pricing elements:
- BigQuery charges: BigQuery charges for the job that runs in the scan project. For more information, see BigQuery pricing.
- Knowledge Catalog data quality scan: Knowledge Catalog doesn't charge for processing, because BigQuery bills for the job.
- Metadata storage: Storage for the data-rules and data-quality-rule-template aspects is charged as metadata storage. For more information, see Knowledge Catalog pricing.
What's next
- Read the auto data quality overview.
- Learn how to use auto data quality scans.
- View a complete list of system rule templates.
- Learn about metadata management.
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2026-04-17 UTC.