本文說明如何使用 Dataplex API,將詞彙表、類別、字詞和項目連結大量匯入 Knowledge Catalog (舊稱 Dataplex Universal Catalog)。您可以透過這個程序,從其他編目工具遷移中繼資料,或將 JSON 檔案上傳至 Cloud Storage,對現有術語表執行大量更新。
事前準備
開始匯入程序前,請先完成下列必要條件:
建立 Cloud Storage bucket
建立 Cloud Storage bucket,做為匯入檔案的暫存區。
必要的角色
如要取得使用 JSON 檔案匯入字彙表和詞條連結所需的權限,請要求管理員授予您下列 IAM 角色:
- 專案的 Dataplex 中繼資料編輯者 (
roles/dataplex.metadataEditor) - Storage 物件檢視者 (
roles/storage.objectViewer) 在包含匯入檔案的 bucket 中
如要進一步瞭解如何授予角色,請參閱「管理專案、資料夾和組織的存取權」。
啟用 API
如要匯入詞彙表和項目連結,請在專案中啟用 Dataplex API。
啟用 API 時所需的角色
如要啟用 API,您需要服務使用情形管理員 IAM 角色 (roles/serviceusage.serviceUsageAdmin),其中包含 serviceusage.services.enable 權限。瞭解如何授予角色。
使用 JSON 檔案匯入字彙表
如要匯入字彙表、類別和字詞,請完成下列工作。
建立目標詞彙表
建立目標詞彙表,以匯入中繼資料。
alias gcurl='curl -H "Authorization: Bearer $(gcloud auth print-access-token)" -H "Content-Type: application/json"' gcurl -X POST https://dataplex.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/glossaries?glossary_id=GLOSSARY_ID -d "$(cat<<EOF { "displayName": "DISPLAY_NAME", "description": "DESCRIPTION" } EOF )"
更改下列內容:
PROJECT_ID:您要建立詞彙表的專案 IDLOCATION_ID:要建立詞彙表的位置GLOSSARY_ID:詞彙表 IDDISPLAY_NAME:詞彙表的顯示名稱DESCRIPTION:詞彙表說明
準備 JSON 檔案
建立以換行符號分隔的 JSON 檔案,內含要上傳至 Cloud Storage 值區的字彙表、類別和字詞。
請使用下列 JSON 結構定義來建構匯入檔案:
{"entry":{"name":"projects/PROJECT_NUMBER/locations/LOCATION_ID/entryGroups/@dataplex/entries/projects/PROJECT_NUMBER/locations/LOCATION_ID/glossaries/GLOSSARY_ID/categories/CATEGORY_ID","entryType":"projects/dataplex-types/locations/global/entryTypes/glossary-category","aspects":{"dataplex-types.global.glossary-category-aspect":{"data":{}},"dataplex-types.global.overview":{"data":{"content":"CONTENT"}},"dataplex-types.global.contacts":{"data":{"identities":[{role: "steward", name: "CONTACT_DISPLAY_NAME", id: "CONTACT_EMAIL"}]}}},"parentEntry":"projects/PROJECT_NUMBER/locations/LOCATION_ID/entryGroups/@dataplex/entries/projects/PROJECT_NUMBER/locations/LOCATION_ID/glossaries/GLOSSARY_ID","entrySource":{"resource":"projects/PROJECT_NUMBER/locations/LOCATION_ID/glossaries/GLOSSARY_ID/categories/CATEGORY_ID","displayName":"CATEGORY_NAME","description":"CATEGORY_DESCRIPTION","ancestors":[{"name":"projects/PROJECT_NUMBER/locations/LOCATION_ID/entryGroups/@dataplex/entries/projects/PROJECT_NUMBER/locations/LOCATION_ID/glossaries/GLOSSARY_ID","type":"projects/dataplex-types/locations/global/entryTypes/glossary"}]}}} {"entry":{"name":"projects/PROJECT_NUMBER/locations/LOCATION_ID/entryGroups/@dataplex/entries/projects/PROJECT_NUMBER/locations/LOCATION_ID/glossaries/GLOSSARY_ID/terms/TERM1_ID","entryType":"projects/dataplex-types/locations/global/entryTypes/glossary-term","aspects":{"dataplex-types.global.glossary-term-aspect":{"data":{}},"dataplex-types.global.overview":{"data":{"content":"TERM1_CONTENT"}},"dataplex-types.global.contacts":{"data":{"identities":[{role: "steward", name: "CONTACT_DISPLAY_NAME", id: "CONTACT_EMAIL"}]}}},"parentEntry":"projects/PROJECT_NUMBER/locations/LOCATION_ID/entryGroups/@dataplex/entries/projects/PROJECT_NUMBER/locations/LOCATION_ID/glossaries/GLOSSARY_ID","entrySource":{"resource":"projects/PROJECT_NUMBER/locations/LOCATION_ID/glossaries/GLOSSARY_ID/terms/TERM1_ID","displayName":"TERM1_DISPLAY_NAME","description":"TERM1_DESCRIPTION","ancestors":[{"name":"projects/PROJECT_NUMBER/locations/LOCATION_ID/entryGroups/@dataplex/entries/projects/PROJECT_NUMBER/locations/LOCATION_ID/glossaries/GLOSSARY_ID","type":"projects/dataplex-types/locations/global/entryTypes/glossary"},{"name":"projects/PROJECT_NUMBER/locations/LOCATION_ID/entryGroups/@dataplex/entries/projects/PROJECT_NUMBER/locations/LOCATION_ID/glossaries/GLOSSARY_ID/categories/CATEGORY_ID","type":"projects/dataplex-types/locations/global/entryTypes/glossary-category"}]}}} {"entry":{"name":"projects/PROJECT_NUMBER/locations/LOCATION_ID/entryGroups/@dataplex/entries/projects/PROJECT_NUMBER/locations/LOCATION_ID/glossaries/GLOSSARY_ID/terms/TERM2_ID","entryType":"projects/dataplex-types/locations/global/entryTypes/glossary-term","aspects":{"dataplex-types.global.glossary-term-aspect":{"data":{}},"dataplex-types.global.overview":{"data":{"content":"TERM1_CONTENT"}},"dataplex-types.global.contacts":{"data":{"identities":[{role: "steward", name: "CONTACT_DISPLAY_NAME", id: "CONTACT_EMAIL"}]}}},"parentEntry":"projects/PROJECT_NUMBER/locations/LOCATION_ID/entryGroups/@dataplex/entries/projects/PROJECT_NUMBER/locations/LOCATION_ID/glossaries/GLOSSARY_ID","entrySource":{"resource":"projects/PROJECT_NUMBER/locations/LOCATION_ID/glossaries/GLOSSARY_ID/terms/TERM2_ID","displayName":"TERM2_DISPLAY_NAME","description":"TERM2_DESCRIPTION","ancestors":[{"name":"projects/PROJECT_NUMBER/locations/LOCATION_ID/entryGroups/@dataplex/entries/projects/PROJECT_NUMBER/locations/LOCATION_ID/glossaries/GLOSSARY_ID","type":"projects/dataplex-types/locations/global/entryTypes/glossary"},{"name":"projects/PROJECT_NUMBER/locations/LOCATION_ID/entryGroups/@dataplex/entries/projects/PROJECT_NUMBER/locations/LOCATION_ID/glossaries/GLOSSARY_ID/categories/CATEGORY_ID","type":"projects/dataplex-types/locations/global/entryTypes/glossary-category"}]}}}
匯入字彙表、類別和字詞
將詞彙表、類別和字詞匯入 Cloud Storage bucket:
# Set GCURL alias alias gcurl='curl -H "Authorization: Bearer $(gcloud auth print-access-token)" -H "Content-Type: application/json"' gcurl https://DATAPLEX_API/metadataJobs?metadata_job_id=JOB_ID -d "$(cat<<EOF { "type":"IMPORT", "import_spec":{ "log_level":"DEBUG", "source_storage_uri":"gs://STORAGE_BUCKET/", "entry_sync_mode":"FULL", "aspect_sync_mode":"INCREMENTAL", "scope":{ "glossaries": "GLOSSARY_NAME" } } } EOF )"
將 DATAPLEX_API 替換為 Dataplex API 端點,格式為 dataplex.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID。
使用 JSON 檔案匯入項目連結
如要大量匯入項目連結,請完成下列任務。
準備 JSON 檔案
建立以換行符號分隔的 JSON 檔案,內含要上傳至 Cloud Storage bucket 的項目連結。
請使用下列 JSON 結構定義來建構匯入檔案:
術語間連結的格式範例
{"entryLink":{"name":"projects/PROJECT_NUMBER/locations/global/entryGroups/@dataplex/entryLinks/el-import-0606e3f2-8206-4f3a-aba9-32c6196f6048","entryLinkType":"projects/dataplex-types/locations/global/entryLinkTypes/synonym","entryReferences":[{"name":"projects/PROJECT_NUMBER/locations/global/entryGroups/@dataplex/entries/projects/PROJECT_NUMBER/locations/global/glossaries/import-glossary/terms/term-1"},{"name":"projects/PROJECT_NUMBER/locations/global/entryGroups/@dataplex/entries/projects/PROJECT_NUMBER/locations/global/glossaries/import-glossary/terms/term-2"}]}} {"entryLink":{"name":"projects/PROJECT_NUMBER/locations/global/entryGroups/@dataplex/entryLinks/el-import-2f7408e3-af3d-405d-81bb-861cf9ec5146","entryLinkType":"projects/dataplex-types/locations/global/entryLinkTypes/related","entryReferences":[{"name":"projects/PROJECT_NUMBER/locations/global/entryGroups/@dataplex/entries/projects/PROJECT_NUMBER/locations/global/glossaries/import-glossary/terms/term-1"},{"name":"projects/PROJECT_NUMBER/locations/global/entryGroups/@dataplex/entries/projects/PROJECT_NUMBER/locations/global/glossaries/import-glossary/terms/term-2"}]}}
字詞與資料資產之間連結的範例格式
{"entryLink":{"name":"projects/PROJECT_NUMBER/locations/global/entryGroups/@dataplex/entryLinks/el-import-0606e3f2-8206-4f3a-aba9-32c6196f6048","entryLinkType":"projects/dataplex-types/locations/global/entryLinkTypes/definition","entryReferences":[{"name":"projects/PROJECT_NUMBER/locations/global/entryGroups/@dataplex/entries/projects/PROJECT_NUMBER/locations/global/glossaries/import-glossary/terms/term-1"},{"name":"projects/PROJECT_NUMBER/locations/us-central1/entryGroups/entry-group-1/entries/entry-1"}]}}
將字詞之間的連結匯入為同義字或相關字詞
如要啟動中繼資料匯入工作,建立目錄字詞之間的 synonym 或 related 連結,請使用下列指令:
gcurl https://DATAPLEX_API/metadataJobs?metadata_job_id=JOB_ID -d "$(cat<<EOF { "type":"IMPORT", "import_spec":{ "log_level":"DEBUG", "source_storage_uri":"gs://STORAGE_BUCKET/", "entry_sync_mode":"FULL", "aspect_sync_mode":"INCREMENTAL", "scope":{ "entry_groups":[ "projects/PROJECT_ID/locations/LOCATION_ID/entryGroups/@dataplex" ], "entry_link_types":[ "projects/dataplex-types/locations/global/entryLinkTypes/synonym", "projects/dataplex-types/locations/global/entryLinkTypes/related" ], "referenced_entry_scopes":[ "PROJECT_IDS" ] } } } EOF )"
將 DATAPLEX_API 替換為 dataplex.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID。
匯入字詞與特定資料資產或資料欄之間的連結
如要建立詞彙表字詞與資料資產之間的連結,請為每個項目群組執行匯入作業,資料資產的項目即屬於這些群組。所有定義項目連結都會在這個項目群組中建立。
gcurl https://DATAPLEX_API/metadataJobs?metadata_job_id=JOB_ID -d "$(cat<<EOF { "type":"IMPORT", "import_spec":{ "log_level":"DEBUG", "source_storage_uri":"gs://STORAGE_BUCKET/", "entry_sync_mode":"FULL", "aspect_sync_mode":"INCREMENTAL", "scope":{ "entry_groups":[ "projects/PROJECT_ID/locations/ENTRY_GROUP_LOCATION_ID/entryGroups/@dataplex" ], "entry_link_types":[ "projects/dataplex-types/locations/global/entryLinkTypes/definition" ], "referenced_entry_scopes":[ "PROJECT_IDS" ] } } } EOF )"
將 DATAPLEX_API 替換為 dataplex.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID。
監控匯入工作狀態
如要取得
import作業的狀態,請執行下列指令:gcurl https://dataplex.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/operations/operation-OPERATION_ID將
OPERATION_ID替換為作業 ID。如要取得中繼資料工作的狀態,請執行下列指令:
gcurl -X GET https://dataplex.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/metadataJobs/JOB_ID
後續步驟
- 瞭解如何在 Knowledge Catalog 中管理組織詞彙。
- 瞭解如何從 Google 試算表匯入字彙表。
- 進一步瞭解中繼資料管理。