Track data lineage with Dataplex Universal Catalog

This page explains how to track data lineage for your Looker (Google Cloud core) instance using Dataplex Universal Catalog.

Data lineage is the process of tracking how data flows through your systems. By integrating Looker (Google Cloud core) with Dataplex Universal Catalog, you can visualize the end-to-end journey of your data from its source in BigQuery through the Looker semantic layer (LookML views and Explores) to downstream consumption in dashboards and Looks.

This visibility helps data engineers and administrators perform impact analysis. For example, before dropping a column in a BigQuery table, you can check the lineage graph to see which Looker dashboards depend on that table and could break as a result.

Before you begin

To use data lineage with Looker (Google Cloud core), you must meet the following prerequisites:

  1. Looker (Google Cloud core): Data lineage is supported for all edition types of Looker (Google Cloud core) instances. Looker (original) instances don't integrate with Dataplex Universal Catalog.
  2. Required Permissions: To view lineage graphs, you need the following IAM roles:
    • Looker Schema Viewer (roles/looker.schemaViewer) on the project that's hosting the Looker (Google Cloud core) instance
    • Dataplex Viewer (roles/dataplex.viewer) or equivalent permissions to view Dataplex Universal Catalog assets
    • Data Lineage Viewer (roles/datalineage.viewer) to read lineage data
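
The role grants above can be scripted. The following is a minimal sketch that only builds the `gcloud projects add-iam-policy-binding` command strings and does not run them. The project ID and member are placeholders, and granting all three roles on a single project is an assumption made for brevity; your Dataplex Universal Catalog and lineage data may live in other projects.

```python
import shlex

# Roles listed in the prerequisites above.
LINEAGE_VIEWER_ROLES = [
    "roles/looker.schemaViewer",
    "roles/dataplex.viewer",
    "roles/datalineage.viewer",
]


def build_grant_commands(project_id: str, member: str) -> list[str]:
    """Return one gcloud command string per role; nothing is executed."""
    commands = []
    for role in LINEAGE_VIEWER_ROLES:
        args = [
            "gcloud", "projects", "add-iam-policy-binding", project_id,
            f"--member={member}",
            f"--role={role}",
        ]
        # shlex.join quotes any argument that needs shell escaping.
        commands.append(shlex.join(args))
    return commands


if __name__ == "__main__":
    # "my-looker-project" and the email address are hypothetical placeholders.
    for command in build_grant_commands("my-looker-project", "user:analyst@example.com"):
        print(command)
```

Review the printed commands before running them, and adjust the project for each role if your resources are split across projects.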

Enable data lineage

To enable data lineage, complete each of the following steps:

  1. Enable the Universal Catalog integration for Looker (Google Cloud core): The integration between your Looker (Google Cloud core) instance and Dataplex Universal Catalog is enabled by default in the Google Cloud console. If the integration has been disabled, you must enable it again. See Enable the integration for instructions.
  2. Enable the Dataplex Lineage preview feature inside Looker: The Dataplex Lineage preview feature is disabled by default. Enable it on the Preview Features page of the Admin panel within your Looker (Google Cloud core) instance.
  3. Enable the Data Lineage API: You must enable the Data Lineage API (datalineage.googleapis.com) on every Google Cloud project that hosts your Looker (Google Cloud core) instance or your BigQuery data.

  4. Enable service-level lineage ingestion: Ensure that the lineage and Looker (Google Cloud core) service-level integration is enabled. Service-level lineage adheres to the following default states:
    • To prevent future pricing implications, Looker (Google Cloud core) service-level lineage ingestion is disabled by default for projects that, at the preview release date of this feature, had the Data Lineage API enabled and hosted at least one Looker (Google Cloud core) instance.
    • Service-level lineage ingestion is enabled by default for Looker (Google Cloud core) instances created after the preview release date of this feature in projects with the Data Lineage API enabled.
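
Because step 3 can involve more than one project, it may help to generate the enable commands in one pass. This is a sketch under the assumption that you use the `gcloud services enable` command; the project IDs are hypothetical, and the function only returns command strings without executing anything.

```python
import shlex

DATA_LINEAGE_API = "datalineage.googleapis.com"


def build_enable_commands(project_ids: list[str]) -> list[str]:
    """Return one `gcloud services enable` command per distinct project."""
    # dict.fromkeys removes duplicates while keeping first-seen order, which
    # covers the case where Looker and BigQuery share a project.
    unique_projects = list(dict.fromkeys(project_ids))
    return [
        shlex.join(
            ["gcloud", "services", "enable", DATA_LINEAGE_API, f"--project={project}"]
        )
        for project in unique_projects
    ]


if __name__ == "__main__":
    # Hypothetical project IDs: one hosts Looker, one hosts the BigQuery data.
    for command in build_enable_commands(["looker-host", "bq-data", "looker-host"]):
        print(command)
```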

To view the lineage configuration for a Google Cloud project, see the Get current configuration documentation. If the integration with Looker (Google Cloud core) is disabled, the request returns output similar to the following:

    {
      "name": "projects/123456789012/locations/global/config",
      "ingestion": {
        "rules": [
          {
            "integrationSelector": {
              "integration": "LOOKER_CORE"
            },
            "lineageEnablement": {
              "enabled": false
            }
          }
        ]
      },
      "etag": "Wb35wDxTTLd6Z+QAL+Yd4g=="
    }

The project ID in the response will reflect the ID in your request. The etag field is a checksum that is generated by the server and that is based on the current value of the configuration.
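
If you save the configuration response, you can check the Looker (Google Cloud core) rule programmatically. The following sketch parses a response shaped like the sample above. The function name is hypothetical, and treating a missing `enabled` field as `False` is an assumption, not documented behavior.

```python
import json
from typing import Optional

SAMPLE_RESPONSE = """
{
  "name": "projects/123456789012/locations/global/config",
  "ingestion": {
    "rules": [
      {
        "integrationSelector": {"integration": "LOOKER_CORE"},
        "lineageEnablement": {"enabled": false}
      }
    ]
  },
  "etag": "Wb35wDxTTLd6Z+QAL+Yd4g=="
}
"""


def looker_ingestion_enabled(config: dict) -> Optional[bool]:
    """Return the LOOKER_CORE enablement flag, or None if no such rule exists."""
    rules = config.get("ingestion", {}).get("rules", [])
    for rule in rules:
        selector = rule.get("integrationSelector", {})
        if selector.get("integration") == "LOOKER_CORE":
            # Assumption: an omitted "enabled" field means disabled.
            return rule.get("lineageEnablement", {}).get("enabled", False)
    return None


if __name__ == "__main__":
    config = json.loads(SAMPLE_RESPONSE)
    print(looker_ingestion_enabled(config))  # prints False
```

A return value of `None` means the response contained no rule for `LOOKER_CORE`, which this sketch keeps distinct from an explicit `false`.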

View data lineage

Once the integration is enabled and the initial sync is complete (which may take up to 24 hours), you can view lineage in the Dataplex Universal Catalog console.

  1. In the Google Cloud console, go to the Dataplex Universal Catalog page.

  2. Click Search in the left navigation pane.
  3. Search for a BigQuery table or a Looker (Google Cloud core) asset (like a dashboard or an Explore).
    • You can use the Filters panel to filter by System > Looker.
  4. Click the name of the asset to open its details page.
  5. Click the Lineage tab.

The lineage graph displays the asset as a central node, with upstream sources to the left and downstream consumers to the right.

Interpret the lineage graph

The lineage graph consists of nodes and links:

  • Nodes: Represent data assets. Supported Looker (Google Cloud core) assets include the following:
    • Looker dashboard
    • Looker dashboard element (tile)
    • Looker Look
    • LookML Explore
    • LookML view
  • Links: Represent the flow of data. For example, a link from a BigQuery table to a LookML view indicates that the view selects data from that table.
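
To make the node-and-link model concrete, the following sketch represents links as (source, target) pairs and walks them breadth-first to list everything downstream of a table, which is the core of an impact analysis. The asset names and the exact shape of the links are hypothetical; the real graph is produced by Dataplex Universal Catalog, not by this code.

```python
from collections import defaultdict, deque


def downstream_assets(links, start):
    """Return every node reachable from `start`, in breadth-first order."""
    children = defaultdict(list)
    for source, target in links:
        children[source].append(target)

    seen = {start}  # guards against revisiting nodes if the graph has cycles
    order = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for child in children[node]:
            if child not in seen:
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


# Hypothetical lineage: table -> view -> Explore -> Look and dashboard tile.
LINKS = [
    ("bigquery:sales.orders", "view:orders"),
    ("view:orders", "explore:order_analysis"),
    ("explore:order_analysis", "look:weekly_orders"),
    ("explore:order_analysis", "tile:revenue_by_region"),
    ("tile:revenue_by_region", "dashboard:sales_overview"),
]

if __name__ == "__main__":
    for asset in downstream_assets(LINKS, "bigquery:sales.orders"):
        print(asset)
```

Reversing each pair before calling the function gives the upstream view instead, which corresponds to the left side of the lineage graph.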

Identify asset owners

To find out who owns a downstream asset that might be impacted by a change, follow these steps:

  1. In the lineage graph, click the node for the asset (for example, a Looker dashboard).
  2. An information panel opens on the right side of the screen.
  3. Look for the Aspects section to find the Owner (email address).

Filter the lineage list

In the List view for lineage, you can filter entities by property name or value. This is useful because complex LookML models can generate large lineage graphs with many intermediate entities. For example, to focus on business impact, you can filter by entity type by following these steps:

  1. In the Lineage tab, toggle to the List view.
  2. Locate the Filter options in the toolbar.
  3. In the Entity filter, enter Looker Dashboard and Looker Look to filter out intermediate types like LookML view and LookML Explore.

The entity list updates to show only the selected asset types, making it easier to identify user-facing content.
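
The same filtering idea can be applied to an exported entity list. This is an illustrative sketch only: the record layout (`name` and `type` keys) and the sample entities are assumptions, not the format that the console or API returns.

```python
# Entity types treated as user-facing content, matching the filter values above.
USER_FACING_TYPES = frozenset({"Looker Dashboard", "Looker Look"})


def filter_entities(entities, allowed_types=USER_FACING_TYPES):
    """Keep only the entities whose type is in `allowed_types`."""
    return [entity for entity in entities if entity["type"] in allowed_types]


# Hypothetical entity records.
ENTITIES = [
    {"name": "orders", "type": "LookML view"},
    {"name": "order_analysis", "type": "LookML Explore"},
    {"name": "Weekly orders", "type": "Looker Look"},
    {"name": "Sales overview", "type": "Looker Dashboard"},
]

if __name__ == "__main__":
    for entity in filter_entities(ENTITIES):
        print(entity["name"], "-", entity["type"])
```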

Limitations

The Looker (Google Cloud core) lineage integration has the following limitations during the preview release:

  • Data Sources: In the preview, lineage is supported only for BigQuery data sources.
  • Granularity: Lineage is provided at the object level (table, view, Explore, dashboard). Column-level lineage is not supported.
  • Latency: Lineage data is not real-time. The synchronization process typically takes four hours. However, synchronization may take up to eight hours, depending on the timing of Looker metadata exports and lineage data consumption. Changes made in Looker or BigQuery may take some time to appear in the lineage graph.
  • Complex SQL: LookML that's defined with complex custom SQL (for example, Liquid templates, derived tables with complex joins) may not be fully parsed, potentially resulting in disconnected nodes.

Pricing

During the preview release, there is no charge for data lineage features that are used with this integration.

When data lineage for this integration becomes generally available, charges will apply. To prevent unexpected charges at that point, Looker (Google Cloud core) lineage ingestion is disabled by default for projects that, at the preview release date of this feature, had the Data Lineage API enabled and hosted at least one Looker (Google Cloud core) instance.

For more information, see the Dataplex Universal Catalog pricing page.

What's next