This document explains how to use the glossary import utility to perform bulk import of glossaries, categories, and terms from a Google Sheet into Knowledge Catalog (formerly Dataplex Universal Catalog).
The glossary_import utility parses and validates data from a Google Sheet,
converts it into a format compatible with the Knowledge Catalog
CreateMetadataJob API,
and uploads it to a Cloud Storage bucket before triggering the import job.
Before you begin
Before you import glossaries, complete the following prerequisites.
Set up the service account
To run the import utility using a Google Sheet, you must set up a service account with the necessary permissions to access the Google Sheets API and impersonate your user credentials:
Identify or create a service account.
Select an existing service account or create a new one in the project where you run the import utility. For more information, see Create service accounts. Note the service account email (for example,
SERVICE_ACCOUNT_NAME@PROJECT_ID.iam.gserviceaccount.com).Configure service account impersonation.
To run the import utility script locally, your user account must have permission to impersonate the service account. Grant your user account the Service Account Token Creator role (
roles/iam.serviceAccountTokenCreator) on the service account.For more information, see Manage access to service accounts.
Grant the service account
Editoraccess to the Google Sheet.Open the Google Sheet you want to use for the import process, click Share, and add the service account email as an
Editor. This permission enables the service account to read from or write data to your sheet.
Create a Cloud Storage bucket
Create a Cloud Storage bucket to serve as a staging area for import files.
Required roles
To ensure that the service account has the necessary permissions to import glossaries from a Google Sheet, ask your administrator to grant the following IAM roles to the service account:
- Dataplex Administrator (
roles/dataplex.admin) on the project - Dataplex Catalog Admin (
roles/dataplex.catalogAdmin) on the project - Dataplex Catalog Editor (
roles/dataplex.catalogEditor) on the project - Storage Object Admin (
roles/storage.objectAdmin) on the Cloud Storage bucket
For more information about granting roles, see Manage access to projects, folders, and organizations.
Your administrator might also be able to give the service account the required permissions through custom roles or other predefined roles.
Enable APIs
To import glossaries, enable the following APIs in your project:
Roles required to enable APIs
To enable APIs, you need the Service Usage Admin IAM
role (roles/serviceusage.serviceUsageAdmin), which
contains the serviceusage.services.enable permission. Learn how to grant
roles.
Set up the git repository
Clone the dataplex-labs repository:
git clone https://github.com/GoogleCloudPlatform/dataplex-labs.git cd dataplex-labs/dataplex-quickstart-labs/00-resources/scripts/python/business-glossary-import/dataplex-glossary/import
Install dependencies
Install the required Python dependencies:
pip3 install -r requirements.txt
If you encounter any issues with the package installation, set up a new Python development environment.
Authenticate and configure service account impersonation
Initialize the Google Cloud CLI and authenticate using Application Default Credentials (ADC) with service account impersonation:
# Set your service account email address SA_EMAIL="SERVICE_ACCOUNT_EMAIL" # Authenticate ADC using service account impersonation and required scopes gcloud auth application-default login \ --impersonate-service-account="${SA_EMAIL}" \ --scopes="https://www.googleapis.com/auth/spreadsheets"
Replace SERVICE_ACCOUNT_EMAIL with the service account email ID. For example:
SERVICE_ACCOUNT_NAME@PROJECT_ID.iam.gserviceaccount.com
Create and structure the Google Sheet for import
To successfully run a bulk import, you must create a new Google Sheet using a
precise column schema so that the import utility can successfully
parse and validate your metadata fields. Ensure that you have granted the
service account Editor access to the Google Sheet.
The first row of the sheet must contain these exact, case-sensitive schema headers:
| Field | Required or Optional | Description |
|---|---|---|
id |
Required | A unique identifier for the term or category within the glossary. |
parent |
Optional | The ID of the parent category. If empty, the item is a root-level child of the glossary. Note that the referenced parent ID must exist in the sheet and must belong to a category. |
display_name |
Required | The display name of the term or category. |
description |
Optional | A brief description of the term or category. |
overview |
Optional | A rich text description of the term or category (supports HTML tags). |
type |
Required | The row type. Valid values are TERM or CATEGORY. |
contact1_email |
Optional | Email address of the primary data steward for the term or category. |
contact1_name |
Optional | Name of the primary data steward for the term or category. |
contact2_email |
Optional | Email address of the secondary data steward for the term or category. |
contact2_name |
Optional | Name of the secondary data steward for the term or category. |
label1_key |
Optional | Key for the first assigned label. |
label1_value |
Optional | Value for the first assigned label. |
label2_key |
Optional | Key for the second assigned label. |
label2_value |
Optional | Value for the second assigned label. |
Import glossary from the Google Sheet
After setting up your environment and preparing your Google Sheet, run the
glossary_import.py script:
python3 bg_import/dataplex-glossary/glossary_import.py
The import utility prints the result of the CreateMetadataJob API execution
directly to your terminal. Verify that the job status indicates success.
You can review the execution logs in the logs/ directory in your local
execution path. These logs help you audit the transfer process and identify
skipped entries or formatting warnings.
What's next
- Learn how to manage a business glossary.
- Learn how to import glossaries using JSON files.
- Learn how to export glossaries to a Google Sheet.
- Learn more about metadata management.