Importare glossari aziendali e link alle voci utilizzando file JSON

Questa pagina descrive come importare in blocco glossari, categorie, termini e link alle voci in Knowledge Catalog (in precedenza Dataplex Universal Catalog) utilizzando l'API Dataplex. Puoi utilizzare questa procedura per eseguire la migrazione dei metadati da altri strumenti di catalogazione o per eseguire aggiornamenti collettivi ai glossari esistenti caricando file JSON in Cloud Storage.

Prima di iniziare

Prima di iniziare la procedura di importazione, completa i seguenti prerequisiti:

Crea un bucket Cloud Storage

Crea un bucket Cloud Storage da utilizzare come area di gestione temporanea per i file di importazione.

Ruoli obbligatori

Per ottenere le autorizzazioni necessarie per importare glossari e link alle voci utilizzando file JSON, chiedi all'amministratore di concederti i seguenti ruoli IAM:

Per saperne di più sulla concessione dei ruoli, consulta Gestisci l'accesso a progetti, cartelle e organizzazioni.

Potresti anche riuscire a ottenere le autorizzazioni richieste tramite i ruoli personalizzati o altri ruoli predefiniti.

Abilita API

Per importare glossari e link alle voci, abilita l'API Dataplex nel tuo progetto.

Ruoli richiesti per abilitare le API

Per abilitare le API, devi disporre del ruolo IAM Amministratore utilizzo dei servizi (roles/serviceusage.serviceUsageAdmin), che include l'autorizzazione serviceusage.services.enable. Scopri come concedere i ruoli.

Abilita le API

Importare glossari utilizzando file JSON

Per importare glossari, categorie e termini, completa le seguenti attività.

Crea un glossario di destinazione

Crea un glossario di destinazione per importare i metadati.

alias gcurl='curl -H "Authorization: Bearer $(gcloud auth print-access-token)" -H "Content-Type: application/json"'

gcurl -X POST https://dataplex.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/glossaries?glossary_id=GLOSSARY_ID -d "$(cat<<EOF

{
   "displayName": "DISPLAY_NAME",
   "description": "DESCRIPTION"
}
EOF
)"

Sostituisci quanto segue:

  • PROJECT_ID: l'ID progetto in cui stai creando il glossario
  • LOCATION_ID: la posizione in cui vuoi creare il glossario
  • GLOSSARY_ID: l'ID del glossario
  • DISPLAY_NAME: il nome visualizzato del glossario
  • DESCRIPTION: la descrizione del glossario

Preparare i file JSON

Crea un file JSON delimitato da nuove righe con il glossario, le categorie e i termini che vuoi caricare nel bucket Cloud Storage.

Utilizza lo schema JSON seguente per strutturare i file di importazione:

   {"entry":{"name":"projects/PROJECT_NUMBER/locations/LOCATION_ID/entryGroups/@dataplex/entries/projects/PROJECT_NUMBER/locations/LOCATION_ID/glossaries/GLOSSARY_ID/categories/CATEGORY_ID","entryType":"projects/dataplex-types/locations/global/entryTypes/glossary-category","aspects":{"dataplex-types.global.glossary-category-aspect":{"data":{}},"dataplex-types.global.overview":{"data":{"content":"CONTENT"}},"dataplex-types.global.contacts":{"data":{"identities":[{role: "steward", name: "CONTACT_DISPLAY_NAME", id: "CONTACT_EMAIL"}]}}},"parentEntry":"projects/PROJECT_NUMBER/locations/LOCATION_ID/entryGroups/@dataplex/entries/projects/PROJECT_NUMBER/locations/LOCATION_ID/glossaries/GLOSSARY_ID","entrySource":{"resource":"projects/PROJECT_NUMBER/locations/LOCATION_ID/glossaries/GLOSSARY_ID/categories/CATEGORY_ID","displayName":"CATEGORY_NAME","description":"CATEGORY_DESCRIPTION","ancestors":[{"name":"projects/PROJECT_NUMBER/locations/LOCATION_ID/entryGroups/@dataplex/entries/projects/PROJECT_NUMBER/locations/LOCATION_ID/glossaries/GLOSSARY_ID","type":"projects/dataplex-types/locations/global/entryTypes/glossary"}]}}}
   {"entry":{"name":"projects/PROJECT_NUMBER/locations/LOCATION_ID/entryGroups/@dataplex/entries/projects/PROJECT_NUMBER/locations/LOCATION_ID/glossaries/GLOSSARY_ID/terms/TERM1_ID","entryType":"projects/dataplex-types/locations/global/entryTypes/glossary-term","aspects":{"dataplex-types.global.glossary-term-aspect":{"data":{}},"dataplex-types.global.overview":{"data":{"content":"TERM1_CONTENT"}},"dataplex-types.global.contacts":{"data":{"identities":[{role: "steward", name: "CONTACT_DISPLAY_NAME", id: "CONTACT_EMAIL"}]}}},"parentEntry":"projects/PROJECT_NUMBER/locations/LOCATION_ID/entryGroups/@dataplex/entries/projects/PROJECT_NUMBER/locations/LOCATION_ID/glossaries/GLOSSARY_ID","entrySource":{"resource":"projects/PROJECT_NUMBER/locations/LOCATION_ID/glossaries/GLOSSARY_ID/terms/TERM1_ID","displayName":"TERM1_DISPLAY_NAME","description":"TERM1_DESCRIPTION","ancestors":[{"name":"projects/PROJECT_NUMBER/locations/LOCATION_ID/entryGroups/@dataplex/entries/projects/PROJECT_NUMBER/locations/LOCATION_ID/glossaries/GLOSSARY_ID","type":"projects/dataplex-types/locations/global/entryTypes/glossary"},{"name":"projects/PROJECT_NUMBER/locations/LOCATION_ID/entryGroups/@dataplex/entries/projects/PROJECT_NUMBER/locations/LOCATION_ID/glossaries/GLOSSARY_ID/categories/CATEGORY_ID","type":"projects/dataplex-types/locations/global/entryTypes/glossary-category"}]}}}
   {"entry":{"name":"projects/PROJECT_NUMBER/locations/LOCATION_ID/entryGroups/@dataplex/entries/projects/PROJECT_NUMBER/locations/LOCATION_ID/glossaries/GLOSSARY_ID/terms/TERM2_ID","entryType":"projects/dataplex-types/locations/global/entryTypes/glossary-term","aspects":{"dataplex-types.global.glossary-term-aspect":{"data":{}},"dataplex-types.global.overview":{"data":{"content":"TERM1_CONTENT"}},"dataplex-types.global.contacts":{"data":{"identities":[{role: "steward", name: "CONTACT_DISPLAY_NAME", id: "CONTACT_EMAIL"}]}}},"parentEntry":"projects/PROJECT_NUMBER/locations/LOCATION_ID/entryGroups/@dataplex/entries/projects/PROJECT_NUMBER/locations/LOCATION_ID/glossaries/GLOSSARY_ID","entrySource":{"resource":"projects/PROJECT_NUMBER/locations/LOCATION_ID/glossaries/GLOSSARY_ID/terms/TERM2_ID","displayName":"TERM2_DISPLAY_NAME","description":"TERM2_DESCRIPTION","ancestors":[{"name":"projects/PROJECT_NUMBER/locations/LOCATION_ID/entryGroups/@dataplex/entries/projects/PROJECT_NUMBER/locations/LOCATION_ID/glossaries/GLOSSARY_ID","type":"projects/dataplex-types/locations/global/entryTypes/glossary"},{"name":"projects/PROJECT_NUMBER/locations/LOCATION_ID/entryGroups/@dataplex/entries/projects/PROJECT_NUMBER/locations/LOCATION_ID/glossaries/GLOSSARY_ID/categories/CATEGORY_ID","type":"projects/dataplex-types/locations/global/entryTypes/glossary-category"}]}}}

Importare glossario, categorie e termini

Importa il glossario, le categorie e i termini nel bucket Cloud Storage:

# Set GCURL alias
alias gcurl='curl -H "Authorization: Bearer $(gcloud auth print-access-token)" -H "Content-Type: application/json"'

gcurl https://DATAPLEX_API/metadataJobs?metadata_job_id=JOB_ID -d "$(cat<<EOF
{
   "type":"IMPORT",
   "import_spec":{
      "log_level":"DEBUG",
      "source_storage_uri":"gs://STORAGE_BUCKET/",
      "entry_sync_mode":"FULL",
      "aspect_sync_mode":"INCREMENTAL",
      "scope":{
         "glossaries": "GLOSSARY_NAME"
      }
   }
}
EOF
)"

Sostituisci DATAPLEX_API con l'endpoint dell'API Dataplex nel formato dataplex.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID.

Per importare in blocco i link alle voci, completa le seguenti attività.

Crea un file JSON delimitato da nuove righe con i link alle voci che vuoi caricare nel bucket Cloud Storage.

Utilizza lo schema JSON seguente per strutturare i file di importazione:

Formato di esempio per i link tra i termini

   {"entryLink":{"name":"projects/PROJECT_NUMBER/locations/global/entryGroups/@dataplex/entryLinks/el-import-0606e3f2-8206-4f3a-aba9-32c6196f6048","entryLinkType":"projects/dataplex-types/locations/global/entryLinkTypes/synonym","entryReferences":[{"name":"projects/PROJECT_NUMBER/locations/global/entryGroups/@dataplex/entries/projects/PROJECT_NUMBER/locations/global/glossaries/import-glossary/terms/term-1"},{"name":"projects/PROJECT_NUMBER/locations/global/entryGroups/@dataplex/entries/projects/PROJECT_NUMBER/locations/global/glossaries/import-glossary/terms/term-2"}]}}
   {"entryLink":{"name":"projects/PROJECT_NUMBER/locations/global/entryGroups/@dataplex/entryLinks/el-import-2f7408e3-af3d-405d-81bb-861cf9ec5146","entryLinkType":"projects/dataplex-types/locations/global/entryLinkTypes/related","entryReferences":[{"name":"projects/PROJECT_NUMBER/locations/global/entryGroups/@dataplex/entries/projects/PROJECT_NUMBER/locations/global/glossaries/import-glossary/terms/term-1"},{"name":"projects/PROJECT_NUMBER/locations/global/entryGroups/@dataplex/entries/projects/PROJECT_NUMBER/locations/global/glossaries/import-glossary/terms/term-2"}]}}
   

Formato di esempio per i link tra termini e asset di dati

   {"entryLink":{"name":"projects/PROJECT_NUMBER/locations/global/entryGroups/@dataplex/entryLinks/el-import-0606e3f2-8206-4f3a-aba9-32c6196f6048","entryLinkType":"projects/dataplex-types/locations/global/entryLinkTypes/definition","entryReferences":[{"name":"projects/PROJECT_NUMBER/locations/global/entryGroups/@dataplex/entries/projects/PROJECT_NUMBER/locations/global/glossaries/import-glossary/terms/term-1"},{"name":"projects/PROJECT_NUMBER/locations/us-central1/entryGroups/entry-group-1/entries/entry-1"}]}}
   

Per avviare un job di importazione dei metadati che stabilisce connessioni synonym o related tra i termini del catalogo, utilizza il seguente comando:

gcurl https://DATAPLEX_API/metadataJobs?metadata_job_id=JOB_ID -d "$(cat<<EOF
   {
      "type":"IMPORT",
      "import_spec":{
         "log_level":"DEBUG",
         "source_storage_uri":"gs://STORAGE_BUCKET/",
         "entry_sync_mode":"FULL",
         "aspect_sync_mode":"INCREMENTAL",
         "scope":{
            "entry_groups":[  "projects/PROJECT_ID/locations/LOCATION_ID/entryGroups/@dataplex"
            ],
            "entry_link_types":[
               "projects/dataplex-types/locations/global/entryLinkTypes/synonym",
               "projects/dataplex-types/locations/global/entryLinkTypes/related"
            ],
            "referenced_entry_scopes":[
               "PROJECT_IDS"
            ]
         }
      }
   }
EOF
)"

Sostituisci DATAPLEX_API con dataplex.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID.

Per creare un collegamento tra i termini del glossario e gli asset di dati, esegui l'importazione per ogni gruppo di voci a cui appartiene la voce per l'asset di dati. Tutti i link alle voci di definizione vengono creati in questo gruppo di voci.

   gcurl https://DATAPLEX_API/metadataJobs?metadata_job_id=JOB_ID -d "$(cat<<EOF
   {
   "type":"IMPORT",
   "import_spec":{
      "log_level":"DEBUG",
      "source_storage_uri":"gs://STORAGE_BUCKET/",
      "entry_sync_mode":"FULL",
      "aspect_sync_mode":"INCREMENTAL",
      "scope":{
         "entry_groups":[  "projects/PROJECT_ID/locations/ENTRY_GROUP_LOCATION_ID/entryGroups/@dataplex"
         ],
         "entry_link_types":[
            "projects/dataplex-types/locations/global/entryLinkTypes/definition"
         ],
         "referenced_entry_scopes":[
            "PROJECT_IDS"
         ]
      }
   }
   }
EOF
)"

Sostituisci DATAPLEX_API con dataplex.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID.

Monitorare lo stato del job di importazione

  1. Per ottenere lo stato dell'operazione import, esegui questo comando:

    gcurl https://dataplex.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/operations/operation-OPERATION_ID

    Sostituisci OPERATION_ID con l'ID dell'operazione.

  2. Per ottenere lo stato del job di metadati, esegui questo comando:

    gcurl -X GET https://dataplex.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/metadataJobs/JOB_ID

Passaggi successivi