This document provides instructions for migrating in a single step from the preview version of business glossary, which supported Data Catalog metadata, to the generally available version of business glossary in Dataplex Universal Catalog. Migrating to the generally available version lets you use the enhanced capabilities and deeper integration with Dataplex Universal Catalog metadata, offering improved stability, new features, and full production support. This process automatically updates your glossaries to support Dataplex Universal Catalog metadata.
Before you begin
Install the Google Cloud CLI and the required Python packages. Authenticate your user account and the Application Default Credentials (ADC) that the Python libraries use. Run the following commands and follow the browser-based prompts:

gcloud init
gcloud auth login
gcloud auth application-default login

Enable the Data Catalog API and the Dataplex API in the projects that are involved in the migration.
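You can enable these APIs in the Google Cloud console or with the gcloud CLI. A minimal sketch, assuming the MIGRATION_PROJECT_ID placeholder is replaced with your own project as described later in this section:

gcloud services enable datacatalog.googleapis.com dataplex.googleapis.com --project=MIGRATION_PROJECT_ID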
Create one or more Cloud Storage buckets in any of your projects. The buckets are used as a temporary location for the import files. The more buckets you provide, the faster the import runs. Grant the Storage Admin IAM role to the service account that runs the migration:
service-MIGRATION_PROJECT_ID@gcp-sa-dataplex.iam.gserviceaccount.com
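A minimal sketch of the bucket setup with the gcloud CLI, assuming a hypothetical bucket named glossary-migration-1:

# Create the temporary bucket (the bucket name is a placeholder).
gcloud storage buckets create gs://glossary-migration-1 --project=MIGRATION_PROJECT_ID
# Grant the Storage Admin role to the migration service account.
gcloud storage buckets add-iam-policy-binding gs://glossary-migration-1 \
  --member=serviceAccount:service-MIGRATION_PROJECT_ID@gcp-sa-dataplex.iam.gserviceaccount.com \
  --role=roles/storage.admin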
Replace MIGRATION_PROJECT_ID with the ID of the project from which you are migrating the glossaries.

Set up the repository:
Clone the repository:
git clone https://github.com/GoogleCloudPlatform/dataplex-labs.git
cd dataplex-labs/dataplex-quickstart-labs/00-resources/scripts/python/business-glossary-import

Install the required packages:
pip3 install -r requirements.txt
cd migration
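Optionally, create and activate a Python virtual environment before installing the packages; the ModuleNotFoundError entry in the troubleshooting section below assumes that such an environment is active. A minimal sketch:

# Create and activate an isolated environment for the script's dependencies.
python3 -m venv .venv
source .venv/bin/activate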
Required roles
To get the permissions that you need to migrate glossaries from Data Catalog to Dataplex Universal Catalog, ask your administrator to grant you the following IAM roles:
- Data Catalog Glossary Owner (roles/datacatalog.glossaryOwner) on your project
- Dataplex Administrator (roles/dataplex.admin) on your project
For more information about granting roles, see Manage access to projects, folders, and organizations.
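For example, a sketch of granting both roles with the gcloud CLI, where PROJECT_ID and USER_EMAIL are placeholders for your project and user account:

# Grant the roles needed to read Data Catalog glossaries and manage Dataplex Universal Catalog glossaries.
gcloud projects add-iam-policy-binding PROJECT_ID --member=user:USER_EMAIL --role=roles/datacatalog.glossaryOwner
gcloud projects add-iam-policy-binding PROJECT_ID --member=user:USER_EMAIL --role=roles/dataplex.admin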
These predefined roles contain the permissions required to migrate glossaries from Data Catalog to Dataplex Universal Catalog. To see the exact permissions that are required, expand the Required permissions section:
Required permissions
The following permissions are required to migrate glossaries from Data Catalog to Dataplex Universal Catalog:
- datacatalog.glossaries.get on the project from which you are migrating glossaries
- datacatalog.glossaries.list on the project from which you are migrating glossaries
- dataplex.glossaries.create on the project in which glossaries are created in Dataplex Universal Catalog
- dataplex.glossaries.update on the project in which glossaries are updated in Dataplex Universal Catalog
You might also be able to get these permissions with custom roles or other predefined roles.
For more information about Dataplex Universal Catalog Identity and Access Management (IAM), see Access control with IAM.
Run the migration script
python3 run.py --project=MIGRATION_PROJECT_ID --user-project=USER_PROJECT_ID --buckets=BUCKET1,BUCKET2
Replace the following:
- MIGRATION_PROJECT_ID: the ID of the source project that contains the Data Catalog glossaries you want to export.
- USER_PROJECT_ID: the ID of the project used for billing and quota attribution for the API calls that the script generates.
- BUCKET1 and BUCKET2: the IDs of the Cloud Storage buckets to use for the import. You can provide one or more buckets as a comma-separated list of bucket names without spaces (for example, --buckets=bucket-one,bucket-two). A one-to-one mapping between buckets and glossaries is not required; the script runs the import jobs in parallel, which speeds up the migration.
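For example, an invocation with hypothetical project IDs and bucket names:

python3 run.py --project=my-source-project --user-project=my-billing-project --buckets=glossary-migration-1,glossary-migration-2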
If permission issues prevent the script from automatically discovering your
organization IDs, use the --orgIds flag to specify the organizations that the
script can use to search for data assets linked to glossary terms.
Scope glossaries in migration
To migrate only specific glossaries, define their scope by providing their respective URLs.
python3 run.py --project=MIGRATION_PROJECT_ID --user-project=USER_PROJECT_ID --buckets=BUCKET1,BUCKET2 --glossaries="GLOSSARY_URL1","GLOSSARY_URL2"
Replace GLOSSARY_URL1 (and GLOSSARY_URL2)
with the URLs of the glossaries you are migrating. You can provide one or more
glossary URLs.
When the migration runs, the number of import jobs can be less than the number of exported glossaries. This happens when empty glossaries that don't require a background import job are created directly.
Resume migration for import job failures
Files remaining in the Cloud Storage buckets after the migration indicate that some import jobs failed. To resume the migration, run the following command:
python3 run.py --project=MIGRATION_PROJECT_ID --user-project=USER_PROJECT_ID --buckets=BUCKET1,BUCKET2 --resume-import
If you encounter failures, run the resume command again. The script processes
only files that were not successfully imported and deleted.
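To check whether any import files are left over, you can list the contents of each bucket; for example, with the hypothetical bucket name used earlier:

gcloud storage ls gs://glossary-migration-1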
The script enforces dependency checks for entry links and inter-glossary links. An entry link file is imported only if its parent glossary was successfully imported. Similarly, a link between terms is imported only if all referenced terms have been successfully imported.
Troubleshoot
This section provides solutions to common errors.
- Permission Denied / 403 Error: Ensure that the user or service account has the Dataplex Universal Catalog Editor role on the destination project and the Data Catalog Viewer role on the source project.
- ModuleNotFoundError: Ensure that you have activated your Python virtual environment and installed the required packages by using pip3 install -r requirements.txt.
- TimeoutError / ssl.SSLError: These network-level errors might be caused by firewalls, proxies, or slow connections. The script has a 5-minute timeout; persistent issues might require checking your local network configuration.
- Method not found (Cannot fetch entries): This error often indicates that your user project is not allowlisted to call the API, which prevents the retrieval of the necessary entries.