本页面介绍了如何使用 Dataplex API 将术语库、类别、术语和条目链接批量导入到 Knowledge Catalog(以前称为 Dataplex Universal Catalog)。您可以使用此流程从其他编目工具迁移元数据,也可以通过将 JSON 文件上传到 Cloud Storage 来对现有术语库执行批量更新。
准备工作
在开始导入流程之前,请完成以下前提条件:
创建 Cloud Storage 存储桶
创建 Cloud Storage 存储桶以充当导入文件的暂存区。
所需的角色
如需获得使用 JSON 文件导入词库和条目链接所需的权限,请让您的管理员为您授予以下 IAM 角色:
- 针对项目的 Dataplex Metadata Editor (
roles/dataplex.metadataEditor) - 针对包含导入文件的存储桶的 Storage Object Viewer (
roles/storage.objectViewer)
如需详细了解如何授予角色,请参阅管理对项目、文件夹和组织的访问权限。
启用 API
如需导入术语库和条目链接,请在项目中启用 Dataplex API。
启用 API 所需的角色
如需启用 API,您需要拥有 Service Usage Admin IAM 角色 (roles/serviceusage.serviceUsageAdmin),该角色包含 serviceusage.services.enable 权限。了解如何授予角色。
使用 JSON 文件导入术语库
如需导入术语库、类别和术语,请完成以下任务。
创建目标术语库
创建目标术语库以导入元数据。
alias gcurl='curl -H "Authorization: Bearer $(gcloud auth print-access-token)" -H "Content-Type: application/json"' gcurl -X POST https://dataplex.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/glossaries?glossary_id=GLOSSARY_ID -d "$(cat<<EOF { "displayName": "DISPLAY_NAME", "description": "DESCRIPTION" } EOF )"
替换以下内容:
PROJECT_ID:您要在其中创建术语库的项目的 IDLOCATION_ID:您要在其中创建术语库的位置GLOSSARY_ID:术语库 IDDISPLAY_NAME:术语库的显示名称DESCRIPTION:术语库的说明
准备 JSON 文件
创建一个以换行符分隔的 JSON 文件,其中包含要上传到 Cloud Storage 存储桶的词汇表、类别和术语。
使用以下 JSON 架构来构建导入文件:
{"entry":{"name":"projects/PROJECT_NUMBER/locations/LOCATION_ID/entryGroups/@dataplex/entries/projects/PROJECT_NUMBER/locations/LOCATION_ID/glossaries/GLOSSARY_ID/categories/CATEGORY_ID","entryType":"projects/dataplex-types/locations/global/entryTypes/glossary-category","aspects":{"dataplex-types.global.glossary-category-aspect":{"data":{}},"dataplex-types.global.overview":{"data":{"content":"CONTENT"}},"dataplex-types.global.contacts":{"data":{"identities":[{role: "steward", name: "CONTACT_DISPLAY_NAME", id: "CONTACT_EMAIL"}]}}},"parentEntry":"projects/PROJECT_NUMBER/locations/LOCATION_ID/entryGroups/@dataplex/entries/projects/PROJECT_NUMBER/locations/LOCATION_ID/glossaries/GLOSSARY_ID","entrySource":{"resource":"projects/PROJECT_NUMBER/locations/LOCATION_ID/glossaries/GLOSSARY_ID/categories/CATEGORY_ID","displayName":"CATEGORY_NAME","description":"CATEGORY_DESCRIPTION","ancestors":[{"name":"projects/PROJECT_NUMBER/locations/LOCATION_ID/entryGroups/@dataplex/entries/projects/PROJECT_NUMBER/locations/LOCATION_ID/glossaries/GLOSSARY_ID","type":"projects/dataplex-types/locations/global/entryTypes/glossary"}]}}} {"entry":{"name":"projects/PROJECT_NUMBER/locations/LOCATION_ID/entryGroups/@dataplex/entries/projects/PROJECT_NUMBER/locations/LOCATION_ID/glossaries/GLOSSARY_ID/terms/TERM1_ID","entryType":"projects/dataplex-types/locations/global/entryTypes/glossary-term","aspects":{"dataplex-types.global.glossary-term-aspect":{"data":{}},"dataplex-types.global.overview":{"data":{"content":"TERM1_CONTENT"}},"dataplex-types.global.contacts":{"data":{"identities":[{role: "steward", name: "CONTACT_DISPLAY_NAME", id: "CONTACT_EMAIL"}]}}},"parentEntry":"projects/PROJECT_NUMBER/locations/LOCATION_ID/entryGroups/@dataplex/entries/projects/PROJECT_NUMBER/locations/LOCATION_ID/glossaries/GLOSSARY_ID","entrySource":{"resource":"projects/PROJECT_NUMBER/locations/LOCATION_ID/glossaries/GLOSSARY_ID/terms/TERM1_ID","displayName":"TERM1_DISPLAY_NAME","description":"TERM1_DESCRIPTION","ancestors":[{"name":"projects/PROJECT_NUMBER/locations/LOCATION_ID/entryGroups/@dataplex/entries/projects/PROJECT_NUMBER/locations/LOCATION_ID/glossaries/GLOSSARY_ID","type":"projects/dataplex-types/locations/global/entryTypes/glossary"},{"name":"projects/PROJECT_NUMBER/locations/LOCATION_ID/entryGroups/@dataplex/entries/projects/PROJECT_NUMBER/locations/LOCATION_ID/glossaries/GLOSSARY_ID/categories/CATEGORY_ID","type":"projects/dataplex-types/locations/global/entryTypes/glossary-category"}]}}} {"entry":{"name":"projects/PROJECT_NUMBER/locations/LOCATION_ID/entryGroups/@dataplex/entries/projects/PROJECT_NUMBER/locations/LOCATION_ID/glossaries/GLOSSARY_ID/terms/TERM2_ID","entryType":"projects/dataplex-types/locations/global/entryTypes/glossary-term","aspects":{"dataplex-types.global.glossary-term-aspect":{"data":{}},"dataplex-types.global.overview":{"data":{"content":"TERM1_CONTENT"}},"dataplex-types.global.contacts":{"data":{"identities":[{role: "steward", name: "CONTACT_DISPLAY_NAME", id: "CONTACT_EMAIL"}]}}},"parentEntry":"projects/PROJECT_NUMBER/locations/LOCATION_ID/entryGroups/@dataplex/entries/projects/PROJECT_NUMBER/locations/LOCATION_ID/glossaries/GLOSSARY_ID","entrySource":{"resource":"projects/PROJECT_NUMBER/locations/LOCATION_ID/glossaries/GLOSSARY_ID/terms/TERM2_ID","displayName":"TERM2_DISPLAY_NAME","description":"TERM2_DESCRIPTION","ancestors":[{"name":"projects/PROJECT_NUMBER/locations/LOCATION_ID/entryGroups/@dataplex/entries/projects/PROJECT_NUMBER/locations/LOCATION_ID/glossaries/GLOSSARY_ID","type":"projects/dataplex-types/locations/global/entryTypes/glossary"},{"name":"projects/PROJECT_NUMBER/locations/LOCATION_ID/entryGroups/@dataplex/entries/projects/PROJECT_NUMBER/locations/LOCATION_ID/glossaries/GLOSSARY_ID/categories/CATEGORY_ID","type":"projects/dataplex-types/locations/global/entryTypes/glossary-category"}]}}}
导入术语库、类别和术语
将术语库、类别和术语导入 Cloud Storage 存储桶:
# Set GCURL alias alias gcurl='curl -H "Authorization: Bearer $(gcloud auth print-access-token)" -H "Content-Type: application/json"' gcurl https://DATAPLEX_API/metadataJobs?metadata_job_id=JOB_ID -d "$(cat<<EOF { "type":"IMPORT", "import_spec":{ "log_level":"DEBUG", "source_storage_uri":"gs://STORAGE_BUCKET/", "entry_sync_mode":"FULL", "aspect_sync_mode":"INCREMENTAL", "scope":{ "glossaries": "GLOSSARY_NAME" } } } EOF )"
将 DATAPLEX_API 替换为格式为 dataplex.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID 的 Dataplex API 端点。
使用 JSON 文件导入条目链接
如需批量导入条目链接,请完成以下任务。
准备 JSON 文件
创建一个以换行符分隔的 JSON 文件,其中包含要上传到 Cloud Storage 存储桶的条目链接。
使用以下 JSON 架构来构建导入文件:
术语之间的关联的示例格式
{"entryLink":{"name":"projects/PROJECT_NUMBER/locations/global/entryGroups/@dataplex/entryLinks/el-import-0606e3f2-8206-4f3a-aba9-32c6196f6048","entryLinkType":"projects/dataplex-types/locations/global/entryLinkTypes/synonym","entryReferences":[{"name":"projects/PROJECT_NUMBER/locations/global/entryGroups/@dataplex/entries/projects/PROJECT_NUMBER/locations/global/glossaries/import-glossary/terms/term-1"},{"name":"projects/PROJECT_NUMBER/locations/global/entryGroups/@dataplex/entries/projects/PROJECT_NUMBER/locations/global/glossaries/import-glossary/terms/term-2"}]}} {"entryLink":{"name":"projects/PROJECT_NUMBER/locations/global/entryGroups/@dataplex/entryLinks/el-import-2f7408e3-af3d-405d-81bb-861cf9ec5146","entryLinkType":"projects/dataplex-types/locations/global/entryLinkTypes/related","entryReferences":[{"name":"projects/PROJECT_NUMBER/locations/global/entryGroups/@dataplex/entries/projects/PROJECT_NUMBER/locations/global/glossaries/import-glossary/terms/term-1"},{"name":"projects/PROJECT_NUMBER/locations/global/entryGroups/@dataplex/entries/projects/PROJECT_NUMBER/locations/global/glossaries/import-glossary/terms/term-2"}]}}
术语与数据资产之间的关联的示例格式
{"entryLink":{"name":"projects/PROJECT_NUMBER/locations/global/entryGroups/@dataplex/entryLinks/el-import-0606e3f2-8206-4f3a-aba9-32c6196f6048","entryLinkType":"projects/dataplex-types/locations/global/entryLinkTypes/definition","entryReferences":[{"name":"projects/PROJECT_NUMBER/locations/global/entryGroups/@dataplex/entries/projects/PROJECT_NUMBER/locations/global/glossaries/import-glossary/terms/term-1"},{"name":"projects/PROJECT_NUMBER/locations/us-central1/entryGroups/entry-group-1/entries/entry-1"}]}}
将术语之间的关联作为同义词或相关字词导入
如需启动元数据导入作业,以在目录术语之间建立 synonym 或 related 连接,请使用以下命令:
gcurl https://DATAPLEX_API/metadataJobs?metadata_job_id=JOB_ID -d "$(cat<<EOF { "type":"IMPORT", "import_spec":{ "log_level":"DEBUG", "source_storage_uri":"gs://STORAGE_BUCKET/", "entry_sync_mode":"FULL", "aspect_sync_mode":"INCREMENTAL", "scope":{ "entry_groups":[ "projects/PROJECT_ID/locations/LOCATION_ID/entryGroups/@dataplex" ], "entry_link_types":[ "projects/dataplex-types/locations/global/entryLinkTypes/synonym", "projects/dataplex-types/locations/global/entryLinkTypes/related" ], "referenced_entry_scopes":[ "PROJECT_IDS" ] } } } EOF )"
将 DATAPLEX_API 替换为 dataplex.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID。
导入术语与特定数据资产或列之间的关联
如需在术语表术语与数据资产之间创建关联,请针对数据资产条目所属的每个条目组运行导入作业。所有定义条目关联均在此条目组中创建。
gcurl https://DATAPLEX_API/metadataJobs?metadata_job_id=JOB_ID -d "$(cat<<EOF { "type":"IMPORT", "import_spec":{ "log_level":"DEBUG", "source_storage_uri":"gs://STORAGE_BUCKET/", "entry_sync_mode":"FULL", "aspect_sync_mode":"INCREMENTAL", "scope":{ "entry_groups":[ "projects/PROJECT_ID/locations/ENTRY_GROUP_LOCATION_ID/entryGroups/@dataplex" ], "entry_link_types":[ "projects/dataplex-types/locations/global/entryLinkTypes/definition" ], "referenced_entry_scopes":[ "PROJECT_IDS" ] } } } EOF )"
将 DATAPLEX_API 替换为 dataplex.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID。
监控导入作业状态
如需获取
import操作的状态,请运行以下命令:gcurl https://dataplex.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/operations/operation-OPERATION_ID将
OPERATION_ID替换为操作的 ID。如需获取元数据作业的状态,请运行以下命令:
gcurl -X GET https://dataplex.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/metadataJobs/JOB_ID
后续步骤
- 了解如何在 Knowledge Catalog 中管理业务术语库。
- 了解如何从 Google 表格导入词汇表。
- 详细了解元数据管理。