This document describes the concepts, methods, and use cases for searching data lineage across multiple geographical regions in Knowledge Catalog (formerly Dataplex Universal Catalog).
Data lineage in Knowledge Catalog is a regionalized service. Lineage data, including links, processes, and events, is recorded and stored in the specific geographical location where the underlying data transformation or data movement occurred.
However, enterprise data pipelines frequently span multiple Google Cloud
projects and regions (for example, a BigQuery table in us-central1
copying data to a storage bucket in europe-west1). To trace data assets
comprehensively across these boundaries, you must perform a multi-region
lineage search.
Knowledge Catalog provides two methods to discover and aggregate cross-regional lineage graphs:
- The server-side automation method that uses the searchLineageStreaming API (Preview). This is the recommended method.
- The client-side fan-out method that uses the searchLinks API.
Core concepts
To understand multi-region lineage discovery, it helps to understand how the system handles graph traversal:
Root criteria: The starting point of your lineage search, defined by one or more asset names (such as a BigQuery table or a Pub/Sub topic) or fine-grained column fields.
Direction: The orientation of the graph traversal relative to the root criteria. You can search upstream (to see where your data came from) or downstream (to see where your data is going).
Breadth-first search: The architectural mechanism used to find connected nodes. The search traverses the lineage graph layer by layer, accurately calculating the execution depth of each connected asset across regional boundaries.
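The three concepts above can be illustrated with a minimal sketch that runs a breadth-first traversal over a small in-memory graph. It is not the service's implementation: the asset names, the `LINKS_BY_REGION` data, the `traverse` function, and the `"UPSTREAM"`/`"DOWNSTREAM"` labels are all hypothetical and chosen only for illustration.

```python
from collections import deque

# Hypothetical lineage links, grouped by the region that recorded them.
# Each link is a (source_asset, target_asset) pair; names are illustrative.
LINKS_BY_REGION = {
    "us-central1": [
        ("bigquery:proj.sales.raw", "bigquery:proj.sales.clean"),
    ],
    "europe-west1": [
        ("bigquery:proj.sales.clean", "gcs:eu-exports/sales"),
        ("gcs:eu-exports/sales", "bigquery:proj.eu.report"),
    ],
}


def traverse(root, direction="DOWNSTREAM", max_depth=100):
    """Return {asset: depth} for every asset reachable from root."""
    # Merge the links from every region into one adjacency map,
    # oriented according to the requested direction.
    adjacency = {}
    for links in LINKS_BY_REGION.values():
        for source, target in links:
            if direction == "DOWNSTREAM":
                adjacency.setdefault(source, []).append(target)
            else:
                adjacency.setdefault(target, []).append(source)

    # Breadth-first search: assets are visited layer by layer, so the
    # first time an asset is seen its depth is the shortest hop count.
    depths = {root: 0}
    queue = deque([root])
    while queue:
        asset = queue.popleft()
        if depths[asset] >= max_depth:
            continue  # do not expand past the depth limit
        for neighbor in adjacency.get(asset, []):
            if neighbor not in depths:
                depths[neighbor] = depths[asset] + 1
                queue.append(neighbor)
    return depths
```

Calling `traverse("bigquery:proj.sales.raw")` walks downstream from the root criteria and crosses from `us-central1` into `europe-west1` without the caller having to know where each link was recorded.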
Comparison of search methods
Both methods let you piece together a cross-regional view of your data, but they divide the work between client and server differently:
| Feature | Server-side automation (searchLineageStreaming API) | Client-side fan-out (searchLinks API) |
|---|---|---|
| Execution model | Server-side automation: The Google Cloud routing engine traverses multiple regions natively. | Client-side orchestration: Your application script must manually loop and manage requests. |
| Request overhead | Single API request: A single HTTP POST call starts the multi-region search. | Multiple API requests: Requires a separate HTTP call for every region and every graph layer. |
| Response handling | Real-time stream: Results are pushed to the client as they are found, preventing timeouts. | Static payloads: Individual JSON arrays must be received, collected, and merged manually. |
| Deep graphs (greater than 2 layers) | Handles deep, nested lineage graphs automatically up to 100 levels. | Suffers from the N+1 query problem; requires iterative, slow round-trips from the client. |
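
To make the request overhead of the client-side column concrete, the following sketch shows the shape of a fan-out loop and counts the calls it issues. It does not call the real searchLinks API: `search_links_in_region` is a hypothetical caller-supplied function, and `DEMO_LINKS` and `demo_search` are stand-ins so that the example runs on its own.

```python
def fan_out_search(root, regions, search_links_in_region, max_depth):
    """Walk downstream layer by layer, querying every region per layer.

    search_links_in_region(region, sources) is a hypothetical callable
    that returns (source, target) pairs recorded in that region.
    Returns the merged set of links and the number of requests issued.
    """
    seen = {root}
    frontier = {root}
    links = set()
    request_count = 0
    for _ in range(max_depth):
        if not frontier:
            break  # nothing new was discovered in the previous layer
        next_frontier = set()
        for region in regions:
            request_count += 1  # one request per region, per layer
            for source, target in search_links_in_region(region, frontier):
                links.add((source, target))  # the set deduplicates links
                if target not in seen:
                    seen.add(target)
                    next_frontier.add(target)
        frontier = next_frontier
    return links, request_count


# Stand-in data and lookup so the sketch is runnable without the service.
DEMO_LINKS = {
    "us-central1": [("table_a", "table_b")],
    "europe-west1": [("table_b", "bucket_c"), ("bucket_c", "table_d")],
}


def demo_search(region, sources):
    return [(s, t) for s, t in DEMO_LINKS.get(region, []) if s in sources]
```

With two regions and a three-hop chain, `fan_out_search("table_a", ["us-central1", "europe-west1"], demo_search, max_depth=3)` issues six requests, which is the per-region, per-layer cost that the table describes.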
Choose the right method for your use case
Review the following scenarios to determine which multi-region search method fits your workload.
Choose the streaming API method for the following use cases:
- Trace deep or complex graphs: Your data moves through multiple intermediate tables, buckets, or pipelines across different regions, requiring multi-level traversal (maxDepth greater than 2).
- Track column-level lineage: You want to track fields across regions or leverage wildcard (*) searches to pull all column dependencies at once.
- Maintain lightweight code: You prefer to make a single API call and let Google Cloud handle the routing, deduplication, and graph assembly.
- Require pipeline metadata: You want to optionally retrieve structural details about the processes running your pipelines in the same request payload.
Choose the client-side fan-out method for the following scenarios:
- You only trace shallow, single-hop lineage: Your lineage graph isn't complex, and you only need to look up direct parent or child links (maxDepth equals 1) across a small, fixed number of known regions.
- You are working within strict legacy systems: You have an existing data-governance application built heavily around the standard SearchLinks endpoint and want to maintain structural backward compatibility without implementing streaming response consumers.
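
The scenarios above can be condensed into a small decision helper. This is one possible reading of the guidance, not an official rule: the function name, its parameters, and the choice to default to the recommended streaming method when no scenario applies are all assumptions made for illustration.

```python
def choose_search_method(
    max_depth,
    column_level=False,
    process_metadata=False,
    streaming_supported=True,
    few_known_regions=False,
):
    """Return "searchLineageStreaming" or "searchLinks" for a workload."""
    # Legacy clients that cannot consume a streamed response stay on
    # the standard endpoint.
    if not streaming_supported:
        return "searchLinks"
    # Column-level lineage and process metadata are listed only under
    # the streaming use cases.
    if column_level or process_metadata:
        return "searchLineageStreaming"
    # Single-hop lookups over a small, fixed set of regions are the
    # case where client-side fan-out stays manageable.
    if max_depth == 1 and few_known_regions:
        return "searchLinks"
    # Assumption: fall back to the recommended method otherwise.
    return "searchLineageStreaming"
```

For example, `choose_search_method(max_depth=5)` returns `"searchLineageStreaming"`, while `choose_search_method(max_depth=1, few_known_regions=True)` returns `"searchLinks"`.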
What's next
Learn how to search multi-region lineage using server-side automation.
Learn how to search multi-region lineage using client-side fan-out.