View data lineage to understand the relationships between your project's resources and the processes that created them. These relationships show how data assets, such as tables and datasets, are transformed by processes like queries and pipelines. This guide describes how to access lineage graphs in Dataplex Universal Catalog, BigQuery, and Vertex AI.
You can view data lineage details in the Google Cloud console or retrieve them by using the Data Lineage API.
Roles and permissions
Data lineage tracks lineage information automatically when you enable the Data Lineage API. You don't need any administrator or editor roles to capture lineage for your data assets.
To view data lineage, you need specific Identity and Access Management (IAM) permissions. Lineage information is captured across projects, so you need permissions in multiple projects.
When viewing lineage in Dataplex Universal Catalog, BigQuery, or Vertex AI: you need permissions to view lineage information in the project where you are viewing it.
When viewing lineage that was recorded in other projects: you need permissions to view lineage information in those projects where it was recorded.
To get the permissions that you need to view data lineage, ask your administrator to grant you the following IAM roles:
- Data Lineage Viewer (roles/datalineage.viewer) on the project where lineage is recorded and the project where lineage is viewed
- View BigQuery table details: BigQuery Data Viewer (roles/bigquery.dataViewer) on the table's storage project
- View BigQuery job details: BigQuery Resource Viewer (roles/bigquery.resourceViewer) on the job's compute project
- View details for other cataloged assets: Dataplex Catalog Viewer (roles/dataplex.catalogViewer) on the project where catalog entries are stored
For more information about granting roles, see Manage access to projects, folders, and organizations.
These predefined roles contain the permissions required to view data lineage. To see the exact permissions that are required, expand the Required permissions section:
Required permissions
The following permissions are required to view data lineage:
- View BigQuery table details: bigquery.tables.get on the table's storage project
- View BigQuery job details: bigquery.jobs.get on the job's compute project
You might also be able to get these permissions with custom roles or other predefined roles.
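The role requirements in this section can be summarized as a small lookup. The following Python sketch is illustrative only: the function name `required_role_bindings` and its parameters are hypothetical, and the code restates the roles listed above rather than calling any IAM API.

```python
def required_role_bindings(viewing_project, recording_projects,
                           table_storage_project=None,
                           job_compute_project=None,
                           catalog_project=None):
    """Return sorted (project, role) pairs needed to view lineage.

    Hypothetical helper: it only encodes the role list from this page.
    """
    bindings = set()

    # Data Lineage Viewer is needed where lineage is viewed and where it is recorded.
    for project in {viewing_project, *recording_projects}:
        bindings.add((project, "roles/datalineage.viewer"))

    # Optional roles for viewing details of specific asset types.
    if table_storage_project:
        bindings.add((table_storage_project, "roles/bigquery.dataViewer"))
    if job_compute_project:
        bindings.add((job_compute_project, "roles/bigquery.resourceViewer"))
    if catalog_project:
        bindings.add((catalog_project, "roles/dataplex.catalogViewer"))

    return sorted(bindings)


# Example with made-up project IDs.
for project, role in required_role_bindings(
        "view-proj", ["rec-a", "rec-b"], table_storage_project="rec-a"):
    print(f"{project}: {role}")
```

A list like this can help when you ask an administrator for access, because it makes explicit which projects each role applies to.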
Types of data lineage views
You can view lineage information as a graph or a list. The lineage graph displays table-level lineage by default. For BigQuery jobs, you can view column-level lineage in both graph and list views.
The following view types are available:
Graph view: displays lineage as an interactive graph, letting you explore relationships between data assets and columns by expanding nodes.
List view: displays lineage in a tabular format, providing simplified and detailed representations of table-level and column-level lineage. You can customize columns and export lineage data from this view.
The key elements in the graph are described as follows:
Nodes: represent the data entities. In the table-level view, a node shows the table name and its columns. In the column-level view, each node represents a specific table and its columns that have lineage.
Edges: the lines that connect nodes and represent the processes that occur between them. Edges can feature icons or labels to provide more information about the transformation:
- Icons: In table-level view, icons appear on edges to represent the transformation process. When you manually explore the graph, icons on edges represent the source system of the process (for example, BigQuery or Vertex AI). If multiple processes are involved, a 'multiple processes' icon is displayed. If the process source system is unknown, a gear icon is used. When you apply filters, a gear icon is used for all processes.
- Labels: In column-level view, edges are labeled to describe the type of dependency between columns, such as Exact copy or Other.
Enable data lineage
Enable data lineage to begin automatically tracking lineage information for supported systems. By default, enabling the API activates lineage tracking for most supported services. To control Dataproc lineage ingestion, see Control lineage ingestion for a service.
You must enable the Data Lineage API in both the project where you view lineage and the projects where lineage is recorded. For more information, see Project types.
- To capture lineage information, complete the following steps:
  - In the Google Cloud console, on the Project selector page, select the project where you want to record lineage.
  - Enable the Data Lineage API.
  - Repeat the previous steps for each project where you want to record lineage.
- In the project where you view lineage, enable the Data Lineage API and the Dataplex API.
Control lineage ingestion for a service
After you enable the Data Lineage API, the service starts automatic lineage tracking for most supported services. You can then selectively enable or disable lineage ingestion for specific integrations at the project, folder, or organization level. During preview, this feature only supports configuring ingestion for Dataproc. If you disable lineage ingestion for Dataproc, it also disables lineage ingestion for Dataproc Serverless for Apache Spark.
The configuration is hierarchical. The most specific configuration takes precedence. For example, a project-level configuration overrides a folder-level configuration. If no configuration is set, the service's default behavior is used. For Dataproc, the default is Enabled.
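The precedence rule can be sketched as a short lookup. This is a simplified model, not the service's implementation: the function name `effective_enablement` is hypothetical, each level is represented by `True`, `False`, or `None` (not set), and the folder-over-organization ordering is an assumption that follows from "the most specific configuration takes precedence". Nested folders are not modeled.

```python
# Default behavior when no configuration is set at any level.
# Only Dataproc is listed because it is the only integration named on this page.
DEFAULT_ENABLED = {"DATAPROC": True}


def effective_enablement(integration, project=None, folder=None, organization=None):
    """Return the effective setting for one integration.

    Each argument is True, False, or None (no configuration at that level).
    Levels are checked from most specific to least specific.
    """
    for setting in (project, folder, organization):
        if setting is not None:
            return setting
    return DEFAULT_ENABLED[integration]


# A project-level setting overrides a folder-level setting.
print(effective_enablement("DATAPROC", project=True, folder=False))
# With nothing configured, the default applies.
print(effective_enablement("DATAPROC"))
```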
Any changes to the configuration might take up to 24 hours to propagate, but usually become effective within two hours.
For Dataproc and Dataproc Serverless for Apache Spark, lineage data is only sent if lineage is also enabled in Dataproc. For more information, see Dataproc Spark lineage and Dataproc Serverless for Apache Spark data lineage.
For more information about controlling lineage ingestion, including how the configuration is applied hierarchically, see Control lineage ingestion.
Prerequisites
To control lineage ingestion, you must use the Data Lineage API. Ensure you have a client project configured for billing and quota, as the Data Lineage API is a client-based API.
- Enable the datalineage.googleapis.com API in your client project. For more information, see Enable data lineage.
- Set the client project. For the following examples, use the X-Goog-User-Project header. For more information, see System parameters.
Get current configuration
To view the current lineage configuration, use the
projects.locations.config.get method. You can retrieve the configuration for
a project, folder, or organization.
The following example shows how to get the configuration for a project:
curl -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  -H "X-Goog-User-Project: CLIENT_PROJECT_ID" \
  -X GET \
  "https://datalineage.googleapis.com/v1/projects/PROJECT_ID/locations/global/config"
Replace these values:
- CLIENT_PROJECT_ID: The ID of your client project used for billing or quotas.
- PROJECT_ID: The ID of the project whose configuration you want to view.
To get the configuration for a folder or organization, replace
projects/PROJECT_ID with folders/FOLDER_ID or
organizations/ORGANIZATION_ID.
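If you script these calls, the only parts that change between a project, a folder, and an organization are the resource prefix and ID. The following Python sketch builds the URL and headers used in the curl example; the helper names `config_url` and `request_headers` are hypothetical, and nothing is sent over the network.

```python
BASE_URL = "https://datalineage.googleapis.com/v1"

# Maps a scope name to the resource prefix used in the URL path.
_PREFIXES = {
    "project": "projects",
    "folder": "folders",
    "organization": "organizations",
}


def config_url(scope, resource_id):
    """Build the lineage configuration URL for a project, folder, or organization."""
    if scope not in _PREFIXES:
        raise ValueError(f"unsupported scope: {scope!r}")
    return f"{BASE_URL}/{_PREFIXES[scope]}/{resource_id}/locations/global/config"


def request_headers(access_token, client_project_id):
    """Build the same headers that the curl example passes with -H."""
    return {
        "Authorization": f"Bearer {access_token}",
        "Content-Type": "application/json",
        "X-Goog-User-Project": client_project_id,
    }


print(config_url("folder", "FOLDER_ID"))
```

The access token can come from `gcloud auth print-access-token`, as in the curl example.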
The command returns one of the following outputs:
- If no configuration is set, you get an output with an empty ingestion object:
  {
    "name": "projects/123456789012/locations/global/config",
    "ingestion": {}
  }
  In this case, Dataproc lineage ingestion uses the default setting, which is enabled.
- If Dataproc lineage ingestion is explicitly enabled, you get the following output:
  {
    "name": "projects/123456789012/locations/global/config",
    "ingestion": {
      "rules": [
        {
          "integrationSelector": { "integration": "DATAPROC" },
          "lineageEnablement": { "enabled": true }
        }
      ]
    },
    "etag": "Wb35wDxTTLd6Z+QAL+Yd4g=="
  }
- If Dataproc lineage ingestion is disabled, you get the following output:
  {
    "name": "projects/123456789012/locations/global/config",
    "ingestion": {
      "rules": [
        {
          "integrationSelector": { "integration": "DATAPROC" },
          "lineageEnablement": { "enabled": false }
        }
      ]
    },
    "etag": "Wb35wDxTTLd6Z+QAL+Yd4g=="
  }
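The three response shapes above can be told apart programmatically. The following Python sketch is one way to do that; the function name `dataproc_ingestion_status` is hypothetical, and the code assumes the `enabled` field is present in the rule exactly as shown in the sample outputs. It reports only what is set on the resource you queried, not settings inherited from a parent.

```python
import json


def dataproc_ingestion_status(config):
    """Classify a parsed config response as enabled, disabled, or default."""
    rules = config.get("ingestion", {}).get("rules", [])
    for rule in rules:
        if rule.get("integrationSelector", {}).get("integration") != "DATAPROC":
            continue
        enabled = rule.get("lineageEnablement", {}).get("enabled")
        if enabled is not None:
            return "enabled" if enabled else "disabled"
    # No Dataproc rule on this resource: the default (enabled) applies here.
    return "default (enabled)"


# Sample response copied from the "disabled" case above.
response_text = """
{
  "name": "projects/123456789012/locations/global/config",
  "ingestion": {
    "rules": [
      {
        "integrationSelector": { "integration": "DATAPROC" },
        "lineageEnablement": { "enabled": false }
      }
    ]
  },
  "etag": "Wb35wDxTTLd6Z+QAL+Yd4g=="
}
"""
print(dataproc_ingestion_status(json.loads(response_text)))
```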
The etag field in the response is a checksum generated by the server based on
the current value of the configuration. When updating a configuration using
the patch method, you can include the etag value returned from a
recent get request in the request body. If you provide the etag,
Dataplex Universal Catalog uses it to verify that the configuration hasn't changed
since your last read request. If there's a mismatch, the update request
fails. This prevents you from unintentionally overwriting configurations made by
other users in read-modify-write scenarios. If you don't provide an etag
in your patch request, Dataplex Universal Catalog overwrites the configuration
unconditionally.
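The etag behavior described above is a form of optimistic concurrency control. The following Python sketch simulates it with a toy in-memory store so that you can see when an update is rejected. `ConfigStore` and `dataproc_rule` are hypothetical names, the SHA-256 checksum is an arbitrary choice for the demo, and none of this reflects how the service computes or stores etags.

```python
import copy
import hashlib
import json


def dataproc_rule(enabled):
    """Build an ingestion object with a single Dataproc rule."""
    return {"rules": [{
        "integrationSelector": {"integration": "DATAPROC"},
        "lineageEnablement": {"enabled": enabled},
    }]}


class ConfigStore:
    """Toy stand-in for the server side of get and patch."""

    def __init__(self):
        self._ingestion = {}
        self._etag = self._checksum()

    def _checksum(self):
        payload = json.dumps(self._ingestion, sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()

    def get(self):
        return {"ingestion": copy.deepcopy(self._ingestion), "etag": self._etag}

    def patch(self, ingestion, etag=None):
        # A supplied etag must match the current one; no etag means overwrite.
        if etag is not None and etag != self._etag:
            raise ValueError("etag mismatch: configuration changed since last read")
        self._ingestion = copy.deepcopy(ingestion)
        self._etag = self._checksum()
        return self.get()


store = ConfigStore()
snapshot = store.get()                                    # read
store.patch(dataproc_rule(False), etag=snapshot["etag"])  # write succeeds
try:
    # A second writer reusing the old etag is rejected.
    store.patch(dataproc_rule(True), etag=snapshot["etag"])
except ValueError as err:
    print(err)
```

In a read-modify-write flow, a client that hits a mismatch would typically call get again, reapply its change to the fresh configuration, and retry with the new etag.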
Disable lineage ingestion for a service
To disable lineage ingestion for a specific service,
use the projects.locations.config.patch method with an ingestion rule that
sets lineageEnablement.enabled to false for the specific integration.
To prevent unintentionally overwriting configurations made by other users in
read-modify-write scenarios, you can include the etag field in the request
body. For more information, see
Get current configuration.
curl -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  -H "X-Goog-User-Project: CLIENT_PROJECT_ID" \
  -X PATCH \
  "https://datalineage.googleapis.com/v1/projects/PROJECT_ID/locations/global/config" \
  --data-binary @- << EOF
{
  "ingestion": {
    "rules": [{
      "integrationSelector": { "integration": "DATAPROC" },
      "lineageEnablement": { "enabled": false }
    }]
  },
  "etag": "ETAG"
}
EOF
Replace the following:
- CLIENT_PROJECT_ID: The ID of your client project used for billing or quotas.
- PROJECT_ID: The ID of the project whose configuration you want to update.
- ETAG: The etag value returned from a recent get request.
To disable lineage ingestion of a service for a folder or organization, replace
projects/PROJECT_ID with folders/FOLDER_ID or
organizations/ORGANIZATION_ID.
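The same PATCH request can be assembled in Python with the standard library. The following is a minimal sketch, not an official client: the function name `build_patch_request` is hypothetical, and the code builds the request without sending it. The `enabled` argument selects whether the rule disables or enables Dataproc ingestion, and the etag is included only when you pass one.

```python
import json
import urllib.request


def build_patch_request(project_id, client_project_id, access_token,
                        enabled, etag=None):
    """Build (but do not send) a PATCH request for the project's lineage config."""
    body = {
        "ingestion": {
            "rules": [{
                "integrationSelector": {"integration": "DATAPROC"},
                "lineageEnablement": {"enabled": enabled},
            }]
        }
    }
    if etag is not None:
        # Include the etag only for a conditional (read-modify-write) update.
        body["etag"] = etag

    url = ("https://datalineage.googleapis.com/v1/"
           f"projects/{project_id}/locations/global/config")
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="PATCH",
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
            "X-Goog-User-Project": client_project_id,
        },
    )


request = build_patch_request("PROJECT_ID", "CLIENT_PROJECT_ID", "ACCESS_TOKEN",
                              enabled=False, etag="ETAG")
print(request.get_method(), request.full_url)
```

To send the request, pass it to `urllib.request.urlopen` with a real access token, for example one printed by `gcloud auth print-access-token`.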
Enable lineage ingestion for a service
To enable lineage ingestion for a specific service,
use the projects.locations.config.patch method with an ingestion rule that
sets lineageEnablement.enabled to true for the specific integration.
To prevent unintentionally overwriting configurations made by other users in
read-modify-write scenarios, you can include the etag field in the request
body. For more information, see
Get current configuration.
curl -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  -H "X-Goog-User-Project: CLIENT_PROJECT_ID" \
  -X PATCH \
  "https://datalineage.googleapis.com/v1/projects/PROJECT_ID/locations/global/config" \
  --data-binary @- << EOF
{
  "ingestion": {
    "rules": [{
      "integrationSelector": { "integration": "DATAPROC" },
      "lineageEnablement": { "enabled": true }
    }]
  },
  "etag": "ETAG"
}
EOF
Replace the following:
- CLIENT_PROJECT_ID: The ID of your client project used for billing or quotas.
- PROJECT_ID: The ID of the project whose configuration you want to update.
- ETAG: The etag value returned from a recent get request.
To enable lineage ingestion of a service for a folder or organization, replace
projects/PROJECT_ID with folders/FOLDER_ID or
organizations/ORGANIZATION_ID.
View lineage in Dataplex Universal Catalog
You can view data lineage information in the Dataplex Universal Catalog web interface.
To view the lineage, follow these instructions:
In the Google Cloud console, go to the Dataplex Universal Catalog Search page.
Select Dataplex Universal Catalog as the search mode.
Search for the entry you want to view, and then click it. For more information, see Search for resources in Dataplex Universal Catalog.
Click the Lineage tab.
The default Graph view opens, showing table-level lineage across systems and regions. For more information, see Lineage graph view.
To manually explore the lineage graph, click Expand next to a node to load five more nodes at a time.
For more information, see Manually explore the lineage graph.
Click a node in the Graph view.
The Details panel opens with information about the asset, such as fully qualified name and type. For more information, see Node details.
Click an edge with a process icon in the Graph view.
The Query panel opens. For more information, see Inspect transformation logic and Audit and history of runs.
To inspect transformation logic, click the Details tab.
To see audit and history of runs, click the Runs tab.
In the Lineage explorer panel, select filter criteria—for example, Direction, Dependency type, or Time range—and then click Apply.
This opens a focused view within a specific region (Preview). This view automatically expands the graph up to three levels of nodes. For more information, see Apply filters for a focused lineage view.
In the focused Graph view, select a node, and then in the node's details panel, click Visualize Path to visualize the lineage path from the selected node back to the root entry (only in focused view).
For more information, see Lineage path visualization.
To view column-level lineage (only for BigQuery jobs), do one of the following:
- In a focused Graph view, click the column icon on a table.
- In the Lineage explorer panel, filter by column name, and click Apply.
For more information, see Column-level lineage.
Click Reset.
This action removes all applied filters and takes you to the beginning of the graph view.
Click List to switch to the list view.
The List view offers simplified and detailed tabular representations of lineage for both table-level and column-level lineage, synchronized with the Graph view. By default, simplified list view is displayed, and you can toggle to detailed list view for analyzing individual source-target relationships. You can configure which columns are displayed and export lineage data. For more information, see Lineage list view.
View lineage in BigQuery
You can view data lineage information in the BigQuery web interface.
To view the lineage, follow these instructions:
In the Google Cloud console, go to the BigQuery page.
Open the table for which you want to see the data lineage.
Click the Lineage tab.
The default Graph view opens, showing table-level lineage across systems and regions. For more information, see Lineage graph view.
To manually explore the lineage graph, click Expand next to a node to load five more nodes at a time.
For more information, see Manually explore the lineage graph.
Click a node in the Graph view.
The Details panel opens with information about the asset, such as fully qualified name and type. For more information, see Node details.
Click an edge with a process icon in the Graph view.
The Query panel opens. For more information, see Inspect transformation logic and Audit and history of runs.
To inspect transformation logic, click the Details tab.
To see audit and history of runs, click the Runs tab.
In the Lineage explorer panel, select filter criteria—for example, Direction, Dependency type, or Time range—and then click Apply.
This opens a focused view within a specific region (Preview). This view automatically expands the graph up to three levels of nodes. For more information, see Apply filters for a focused lineage view.
In the focused Graph view, select a node, and then in the node's details panel, click Visualize Path to visualize the lineage path from the selected node back to the root entry (only in focused view).
For more information, see Lineage path visualization.
To view column-level lineage (only for BigQuery jobs), do one of the following:
- In a focused Graph view, click the column icon on a table.
- In the Lineage explorer panel, filter by column name, and click Apply.
For more information, see Column-level lineage.
Click Reset.
This action removes all applied filters and takes you to the beginning of the graph view.
Click List to switch to the list view.
The List view offers simplified and detailed tabular representations of lineage for both table-level and column-level lineage, synchronized with the Graph view. By default, simplified list view is displayed, and you can toggle to detailed list view for analyzing individual source-target relationships. You can configure which columns are displayed and export lineage data. For more information, see Lineage list view.
View lineage in Vertex AI
Systems like Vertex AI Pipelines generate lineage data for Vertex AI models and datasets. You can view data lineage information in the Vertex AI web interface.
View lineage for a managed dataset in Vertex AI
To view the lineage for a dataset, follow these instructions:
In the Google Cloud console, go to the Datasets page.
Click the dataset for which you want to see the data lineage.
Click the Lineage tab.
The default Graph view opens, showing table-level lineage across systems and regions. For more information, see Lineage graph view.
To manually explore the lineage graph, click Expand next to a node to load five more nodes at a time.
For more information, see Manually explore the lineage graph.
Click a node in the Graph view.
The Details panel opens with information about the asset, such as fully qualified name and type. For more information, see Node details.
Click an edge with a process icon in the Graph view.
The Query panel opens. For more information, see Inspect transformation logic and Audit and history of runs.
To inspect transformation logic, click the Details tab.
To see audit and history of runs, click the Runs tab.
In the Lineage explorer panel, select filter criteria—for example, Direction, Dependency type, or Time range—and then click Apply.
This opens a focused view within a specific region (Preview). This view automatically expands the graph up to three levels of nodes. For more information, see Apply filters for a focused lineage view.
In the focused Graph view, select a node, and then in the node's details panel, click Visualize Path to visualize the lineage path from the selected node back to the root entry (only in focused view).
For more information, see Lineage path visualization.
To view column-level lineage (only for BigQuery jobs), do one of the following:
- In a focused Graph view, click the column icon on a table.
- In the Lineage explorer panel, filter by column name, and click Apply.
For more information, see Column-level lineage.
Click Reset.
This action removes all applied filters and takes you to the beginning of the graph view.
Click List to switch to the list view.
The List view offers simplified and detailed tabular representations of lineage for both table-level and column-level lineage, synchronized with the Graph view. By default, simplified list view is displayed, and you can toggle to detailed list view for analyzing individual source-target relationships. You can configure which columns are displayed and export lineage data. For more information, see Lineage list view.
View lineage for a model in Vertex AI
To view the lineage for a model, follow these instructions:
In the Google Cloud console, go to the Model Registry page.
Click the model for which you want to see the data lineage.
Click the Lineage tab.
The default Graph view opens, showing table-level lineage across systems and regions. For more information, see Lineage graph view.
To manually explore the lineage graph, click Expand next to a node to load five more nodes at a time.
For more information, see Manually explore the lineage graph.
Click a node in the Graph view.
The Details panel opens with information about the asset, such as fully qualified name and type. For more information, see Node details.
Click an edge with a process icon in the Graph view.
The Query panel opens. For more information, see Inspect transformation logic and Audit and history of runs.
To inspect transformation logic, click the Details tab.
To see audit and history of runs, click the Runs tab.
In the Lineage explorer panel, select filter criteria—for example, Direction, Dependency type, or Time range—and then click Apply.
This opens a focused view within a specific region (Preview). This view automatically expands the graph up to three levels of nodes. For more information, see Apply filters for a focused lineage view.
In the focused Graph view, select a node, and then in the node's details panel, click Visualize Path to visualize the lineage path from the selected node back to the root entry (only in focused view).
For more information, see Lineage path visualization.
To view column-level lineage (only for BigQuery jobs), do one of the following:
- In a focused Graph view, click the column icon on a table.
- In the Lineage explorer panel, filter by column name, and click Apply.
For more information, see Column-level lineage.
Click Reset.
This action removes all applied filters and takes you to the beginning of the graph view.
Click List to switch to the list view.
The List view offers simplified and detailed tabular representations of lineage for both table-level and column-level lineage, synchronized with the Graph view. By default, simplified list view is displayed, and you can toggle to detailed list view for analyzing individual source-target relationships. You can configure which columns are displayed and export lineage data. For more information, see Lineage list view.
What's next
Track data lineage for a BigQuery table's copy and query jobs.
Learn about data lineage information model.
Learn about data lineage considerations.
Learn about data lineage audit logging.
Learn how to troubleshoot data lineage.
Learn how to integrate with OpenLineage.