This page describes how to view the data lineage generated by your Cloud Data Fusion pipelines alongside other data movement on Google Cloud, for discovery and governance purposes. You can view the lineage graphs for supported data sources on the Dataplex Universal Catalog page in the console, or use the Data Lineage API to retrieve complete data lineage records.
Plugins that support Dataplex Universal Catalog data lineage
Cloud Data Fusion and Dataplex Universal Catalog support asset-level lineage for the following plugins:
- Amazon S3
- BigQuery
- BigQuery Multi Table sink (version 6.9.1 and later)
- Spanner
- Cloud Storage
- Cloud SQL for MySQL
- Cloud SQL for PostgreSQL
- Dataplex Universal Catalog
- FTP
- Generic Database
- HTTP
- MSSQL/SQL Server
- Multiple Database Tables source (version 6.9.1 and later)
- MySQL
- Oracle
- PostgreSQL
- SAP OData
- SAP ODP
- SAP Table
For more information, see Cloud Data Fusion plugins.
Before you begin
To enable viewing Cloud Data Fusion lineage graphs on the Dataplex Universal Catalog page in the console, do the following:
- Create a data pipeline that uses only the supported plugins. 
- Enable the Data Lineage API in the project that contains your Cloud Data Fusion instance. 
- Grant the Data Lineage Events Producer role (roles/datalineage.producer) to the Cloud Data Fusion-managed service account, the Cloud Data Fusion API Service Agent. The process varies if your instance runs in an earlier version of Cloud Data Fusion and RBAC is enabled.
6.10+ or no RBAC
If your Cloud Data Fusion instance uses version 6.10.0 or later, or your instance uses an earlier version and RBAC isn't enabled, follow these steps:
- In the Google Cloud console, go to the IAM page.
- Select the Include Google-provided role grants checkbox. 
- Select the Cloud Data Fusion API Service Agent service account and click Edit. 
- Click Add another role and select the Data Lineage Events Producer role. 
- Click Save. 
<6.10 with RBAC
If your Cloud Data Fusion instance uses a version earlier than 6.10.0 and RBAC is enabled, the service account doesn't appear in the list of principals on the IAM page. You must enter the service account name manually.
To grant the required role, follow these steps:
- In the Google Cloud console, go to the IAM page.
- Click Grant access. 
- In the New principals field, enter the Cloud Data Fusion API Service Agent service account. Use the following format: datafusion-system@TENANT_PROJECT_ID.iam.gserviceaccount.com. Replace TENANT_PROJECT_ID with the tenant project ID for your instance. To view the tenant project ID, go to the Instances page and click the instance name to open the instance details.
- Select the Data Lineage Events Producer role. 
- Click Save. 
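The console steps above can also be approximated from the command line. The following is a minimal sketch, not an official procedure. It assumes the gcloud CLI is installed and authenticated, that PROJECT_ID and TENANT_PROJECT_ID are placeholders you replace with your own values, and that the service account follows the datafusion-system@TENANT_PROJECT_ID.iam.gserviceaccount.com format described above. The function names are hypothetical, and the script only prints the commands so you can review them before running anything.

```shell
# Sketch only: prints the gcloud commands instead of running them.
# PROJECT_ID and TENANT_PROJECT_ID are placeholders, not real values.

# Build the IAM member string from the tenant project ID, using the
# service account format given on this page.
lineage_member() {
  printf 'serviceAccount:datafusion-system@%s.iam.gserviceaccount.com\n' "$1"
}

# Print the commands that enable the Data Lineage API and grant the
# Data Lineage Events Producer role in the instance's project.
print_lineage_setup() {
  printf 'gcloud services enable datalineage.googleapis.com --project=%s\n' "$1"
  printf 'gcloud projects add-iam-policy-binding %s --member=%s --role=roles/datalineage.producer\n' \
    "$1" "$(lineage_member "$2")"
}

print_lineage_setup "PROJECT_ID" "TENANT_PROJECT_ID"
```

Printing rather than executing keeps the sketch safe to run in any environment; copy the printed commands into a shell once the placeholders are filled in.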
 
Enable Dataplex Universal Catalog data lineage in Cloud Data Fusion
For new instances in Cloud Data Fusion, Dataplex Universal Catalog data lineage is turned off by default. If you created the instance before January 27, 2024 with version 6.8.0 or later, it's turned on by default after you complete the steps in Before you begin.
Enable Dataplex Universal Catalog data lineage when you create an instance
Console
To enable Dataplex Universal Catalog data lineage when you create an instance, follow these steps:
- Go to the Cloud Data Fusion Instances page and click Create an instance. 
- When you configure the instance, expand the Advanced options section and click Enable integration with Dataplex data lineage. For more information about creating instances, see Create a public instance. 
REST API
To enable Dataplex Universal Catalog data lineage when you create an instance,
set the optional dataplex_data_lineage_integration_enabled property to
true:
echo '{ "description": "CDAPinstance", "dataplex_data_lineage_integration_enabled": "true"}' | curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  --data @- \
  "https://datafusion.googleapis.com/v1/projects/PROJECT/locations/LOCATION/instances?instanceId=INSTANCE_NAME"
To turn it off, either set the property to false or omit the property, as lineage is turned off by default when you create a new instance.
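After creating an instance, you might want to confirm which state the property ended up in. The following is a rough sketch under stated assumptions: it assumes that a GET on the instance resource returns JSON with the camelCase field dataplexDataLineageIntegrationEnabled, and it treats a missing field as false. The function names are hypothetical, and PROJECT, LOCATION, and INSTANCE_NAME are placeholders.

```shell
# Sketch only. PROJECT, LOCATION, and INSTANCE_NAME are placeholders.

# Build the URL of the instance resource.
instance_url() {
  printf 'https://datafusion.googleapis.com/v1/projects/%s/locations/%s/instances/%s\n' "$1" "$2" "$3"
}

# Read an instance resource (JSON) on stdin and print "true" when the
# lineage field is set to true, otherwise "false". Assumes the response
# uses the camelCase field name; a missing field is treated as false.
lineage_enabled() {
  if grep -Eq '"dataplexDataLineageIntegrationEnabled"[[:space:]]*:[[:space:]]*true'; then
    echo true
  else
    echo false
  fi
}

# Example call (requires gcloud and curl; not run here):
#   curl -s -H "Authorization: Bearer $(gcloud auth print-access-token)" \
#     "$(instance_url PROJECT LOCATION INSTANCE_NAME)" | lineage_enabled
```

A pattern match with grep avoids a dependency on a JSON parser; if jq is available, a proper JSON query would be more robust.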
Enable or disable Dataplex Universal Catalog data lineage in an existing instance
Console
To enable or disable Dataplex Universal Catalog data lineage in an existing instance in Cloud Data Fusion, follow these steps:
- View the instance details:
- In the Google Cloud console, go to the Cloud Data Fusion page. 
- Click Instances, and then click the instance's name to go to the Instance details page. 
 
- In the Dataplex data lineage integration field, click Edit.
- Enable or disable Dataplex Universal Catalog data lineage, and then click Save.
REST API
To enable Dataplex Universal Catalog data lineage in an existing instance in
Cloud Data Fusion, set the dataplex_data_lineage_integration_enabled
property to true and include the updateMask parameter value:
echo '{ "description": "CDAPinstance", "dataplex_data_lineage_integration_enabled": "true"}' | curl -X PATCH \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  --data @- \
  "https://datafusion.googleapis.com/v1/projects/PROJECT/locations/LOCATION/instances/INSTANCE_NAME?updateMask=dataplex_data_lineage_integration_enabled"
To disable Dataplex Universal Catalog data lineage in an existing instance in
Cloud Data Fusion, set the dataplex_data_lineage_integration_enabled
property to false and include the updateMask parameter value:
echo '{ "description": "CDAPinstance", "dataplex_data_lineage_integration_enabled": "false"}' | curl -X PATCH \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  --data @- \
  "https://datafusion.googleapis.com/v1/projects/PROJECT/locations/LOCATION/instances/INSTANCE_NAME?updateMask=dataplex_data_lineage_integration_enabled"
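The enable and disable requests differ only in the value of the property, so they can be wrapped in one helper. The following is a minimal sketch, assuming the update goes through the API's instances.patch method (a PATCH on the instance resource path with an updateMask query parameter). The function names are hypothetical, PROJECT, LOCATION, and INSTANCE_NAME are placeholders, and the script prints the curl command instead of sending it.

```shell
# Sketch only: prints the curl command rather than sending it.
# PROJECT, LOCATION, and INSTANCE_NAME are placeholders.

# Print the request body for the given state ("true" or "false").
lineage_body() {
  case "$1" in
    true|false) printf '{"dataplex_data_lineage_integration_enabled": %s}\n' "$1" ;;
    *) echo "state must be true or false" >&2; return 1 ;;
  esac
}

# Print the URL with the update mask limited to the lineage field.
lineage_patch_url() {
  printf 'https://datafusion.googleapis.com/v1/projects/%s/locations/%s/instances/%s?updateMask=dataplex_data_lineage_integration_enabled\n' "$1" "$2" "$3"
}

# Print a curl command you can review and then run yourself.
print_lineage_toggle() {
  local body
  body="$(lineage_body "$4")" || return 1
  printf "curl -X PATCH -H \"Authorization: Bearer \$(gcloud auth print-access-token)\" -H \"Content-Type: application/json\" --data '%s' \"%s\"\n" \
    "$body" "$(lineage_patch_url "$1" "$2" "$3")"
}

print_lineage_toggle PROJECT LOCATION INSTANCE_NAME false
```

Restricting the update mask to the single lineage field means other instance properties are left untouched by the request.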
View data lineage graphs
To view lineage graphs for entities across all Google Cloud services, do the following:
- Go to your instance in Cloud Data Fusion and run a data pipeline that uses supported plugins. 
- View the lineage graphs on the Dataplex Universal Catalog page in the console and find the asset for which you want to view lineage information. 
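As an alternative to the console, lineage records can be retrieved through the Data Lineage API. The following is a hedged sketch of a searchLinks request for a BigQuery table. It assumes the table's fully qualified name takes the form bigquery:PROJECT.DATASET.TABLE; the function names are hypothetical, and PROJECT, LOCATION, DATASET, and TABLE are placeholders.

```shell
# Sketch only. PROJECT, LOCATION, DATASET, and TABLE are placeholders.

# Build the fully qualified name of a BigQuery table.
bq_fqn() {
  printf 'bigquery:%s.%s.%s\n' "$1" "$2" "$3"
}

# Build a searchLinks request body that looks for links whose target is
# the given entity (that is, the sources that flow into it).
search_links_body() {
  printf '{"target": {"fullyQualifiedName": "%s"}}\n' "$1"
}

# Build the searchLinks URL for a project and location.
search_links_url() {
  printf 'https://datalineage.googleapis.com/v1/projects/%s/locations/%s:searchLinks\n' "$1" "$2"
}

# Example call (requires gcloud and curl; not run here):
#   search_links_body "$(bq_fqn PROJECT DATASET TABLE)" | curl -s -X POST \
#     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
#     -H "Content-Type: application/json" \
#     --data @- "$(search_links_url PROJECT LOCATION)"
```

To search in the other direction (what the table feeds into), the request body would use a source entity in place of target.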
Limitations
Viewing lineage in Dataplex Universal Catalog has the following limitations:
- The lineage in Dataplex Universal Catalog is only discoverable if there is a BigQuery entity connected to the supported plugins. For more information about when data lineage graphs are available, see About data lineage. 
- The Data Lineage API doesn't support customer-managed encryption keys (CMEK). 
- Cloud Data Fusion doesn't support this feature in the me-central1 or europe-west12 locations.
- Review the data lineage considerations. 
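The location limitation can be checked in a script before you enable the integration. The following is a small sketch with a hypothetical function name. It only encodes the two locations named above, so a passing check means "not excluded by this list", not a guarantee of support.

```shell
# Return success when the location is not one of the two locations this
# page lists as unsupported.
lineage_supported_in() {
  case "$1" in
    me-central1|europe-west12) return 1 ;;
    *) return 0 ;;
  esac
}

for location in us-central1 me-central1 europe-west12; do
  if lineage_supported_in "$location"; then
    echo "$location: not excluded by this page"
  else
    echo "$location: lineage integration not supported"
  fi
done
```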
What's next
- Learn more about data lineage.