As of April 10, 2026, Dataplex Universal Catalog is now called Knowledge Catalog. The API, client library, CLI, and IAM names remain unchanged. For more information, see Introducing the Google Cloud Knowledge Catalog.

About multi-region lineage search

When you manage data across a complex organization, understanding your data lineage is essential for good data governance and effective cloud data management. This guide explains how to use multi-region search in Knowledge Catalog (formerly Dataplex Universal Catalog) to track your data across geographic boundaries.

By default, data lineage in Knowledge Catalog is a regional service. Whenever your data moves or transforms, the resulting lineage data (such as links, processes, and events) is stored in the specific region where that action took place.

However, real-world data pipelines frequently span multiple Google Cloud projects and regions. For example, you might have a BigQuery table in us-central1 that copies data to a storage bucket in europe-west1. To trace your data assets across these boundaries and build complete lineage graphs, you need to perform a multi-region search.

Knowledge Catalog gives you two ways to discover and connect these cross-regional lineage graphs:

The server-side automation method that uses the searchLineageStreaming API (Preview, recommended)
The client-side fan-out method that uses the searchLinks API

Core concepts of multi-region lineage search

To understand multi-region lineage discovery, it helps to understand how the system handles graph traversal:

Root criteria: The starting point of your lineage search, defined by one or more asset names (such as a BigQuery table or a Pub/Sub topic) or fine-grained column fields.
Direction: The orientation of the graph traversal relative to the root criteria. You can search upstream (to see where your data came from) or downstream (to see where your data is going).
Breadth-first search: The architectural mechanism used to find connected nodes. The search traverses the lineage graph layer by layer, accurately calculating the execution depth of each connected asset across regional boundaries.
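The three concepts above can be illustrated with a minimal in-memory sketch. It is not the service's implementation: the asset names, the (source, target, region) tuple layout, and the traverse helper are all hypothetical, chosen only to show how root criteria, direction, and layer-by-layer traversal combine to produce a depth for each connected asset.

```python
from collections import deque

# Hypothetical lineage links as (source, target, region) tuples.
# The names and the tuple layout are illustrative assumptions only.
LINKS = [
    ("bigquery:proj.sales.raw", "bigquery:proj.sales.clean", "us-central1"),
    ("bigquery:proj.sales.clean", "gcs:eu-bucket/export", "europe-west1"),
    ("gcs:eu-bucket/export", "bigquery:proj.eu.report", "europe-west1"),
]


def traverse(roots, direction, links, max_depth):
    """Breadth-first traversal that returns {asset: depth from the roots}."""
    if direction not in ("UPSTREAM", "DOWNSTREAM"):
        raise ValueError("direction must be UPSTREAM or DOWNSTREAM")

    # Build an adjacency map oriented in the requested direction.
    adjacency = {}
    for source, target, _region in links:
        if direction == "DOWNSTREAM":
            adjacency.setdefault(source, []).append(target)
        else:
            adjacency.setdefault(target, []).append(source)

    depths = {root: 0 for root in roots}  # root criteria sit at depth 0
    queue = deque(roots)
    while queue:
        node = queue.popleft()
        if depths[node] >= max_depth:
            continue  # do not expand beyond the requested depth
        for neighbor in adjacency.get(node, []):
            if neighbor not in depths:  # first visit gives the shortest depth
                depths[neighbor] = depths[node] + 1
                queue.append(neighbor)
    return depths
```

Calling traverse(["bigquery:proj.sales.raw"], "DOWNSTREAM", LINKS, 3) reaches all four assets, with the report table at depth 3, even though the links sit in two different regions. Switching the direction to "UPSTREAM" and starting from the report table walks the same links backward.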

How do the multi-region search methods compare?

While both methods let you piece together a cross-regional view of your data, they handle the heavy lifting differently:

Feature	Server-side automation (searchLineageStreaming API)	Client-side fan-out (searchLinks API)
Execution model	Server-side automation: The Google Cloud routing engine traverses multiple regions natively.	Client-side orchestration: Your application script must manually loop and manage requests.
Request overhead	Single API request: A single HTTP `POST` call starts the multi-region search.	Multiple API requests: Requires a separate HTTP call for every region and every graph layer.
Response handling	Real-time stream: Results are pushed to the client as they are found, preventing timeouts.	Static payloads: Individual JSON arrays must be received, collected, and merged manually.
Deep graphs (greater than 2 layers)	Handles deep, nested lineage graphs automatically up to 100 levels.	Suffers from the N+1 query problem; requires iterative, slow round-trips from the client.
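To make the request overhead of the client-side column concrete, the following sketch simulates a fan-out loop and counts the calls it issues. Everything here is an assumption for illustration: fetch_links is a hypothetical callable standing in for one per-region searchLinks request, and REGIONAL_LINKS is a fake in-memory store rather than real API output.

```python
# Fake per-region link store: {region: {source_asset: [target_assets]}}.
# A real client would query the lineage service instead (assumption).
REGIONAL_LINKS = {
    "us-central1": {"table_a": ["table_b"]},
    "europe-west1": {"table_b": ["bucket_c"], "bucket_c": ["table_d"]},
}


def fake_fetch(region, asset):
    """Hypothetical stand-in for one regional link lookup."""
    return REGIONAL_LINKS.get(region, {}).get(asset, [])


def fan_out_search(root, regions, fetch_links, max_depth):
    """Downstream fan-out: query every region for every asset in each layer.

    Returns ({asset: depth}, number_of_calls_made).
    """
    depths = {root: 0}
    frontier = [root]
    calls = 0
    for depth in range(1, max_depth + 1):
        next_frontier = []
        for asset in frontier:
            for region in regions:
                calls += 1  # one round trip per asset per region
                for target in fetch_links(region, asset):
                    if target not in depths:  # merge and deduplicate by hand
                        depths[target] = depth
                        next_frontier.append(target)
        if not next_frontier:
            break  # nothing new was discovered in this layer
        frontier = next_frontier
    return depths, calls


depths, calls = fan_out_search(
    "table_a", ["us-central1", "europe-west1"], fake_fetch, max_depth=3
)
```

In this toy graph, a three-layer chain across two regions costs six calls, and the count grows with the number of regions, layers, and assets per layer. The client also has to do the merging and deduplication itself, which is the work the streaming method moves to the server.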

Choose the right multi-region search method

Review the following scenarios to determine which multi-region search method fits your workload.

Choose the streaming API method for the following use cases:

Trace deep or complex graphs: Your data moves through multiple intermediate tables, buckets, or pipelines across different regions, requiring multi-level traversal (maxDepth greater than 2).
Track column-level lineage: You want to track fields across regions or leverage wildcard (*) searches to pull all column dependencies at once.
Maintain lightweight code: You prefer to make a single API call and let Google Cloud handle the routing, deduplication, and graph assembly.
Require pipeline metadata: You want to optionally retrieve structural details about the processes running your pipelines in the same request payload.

Choose the client-side fan-out method for the following scenarios:

Trace shallow, single-hop lineage: Your lineage graph isn't complex, and you only need to look up direct parent or child links (maxDepth equals 1) across a small, fixed number of known regions.
Work within strict legacy systems: You have an existing data-governance application built heavily around the standard searchLinks API and want to maintain structural backward compatibility without implementing streaming response consumers.

What's next

Learn how to search multi-region lineage using server-side automation.
Learn how to search multi-region lineage using client-side fan-out.
