This document describes how to look up multi-level, cross-regional data
lineage by using the searchLineageStreaming API.
The searchLineageStreaming API performs a breadth-first search in a specified
direction (upstream or downstream), starting from a defined set of root entities,
and returns a unified lineage graph as a real-time streaming response.
For more information, see About multi-region lineage search.
Key capabilities
The searchLineageStreaming API includes the following capabilities:
Breadth-first search: Traverses the lineage graph layer by layer, accurately calculating the depth of each connected asset.
Streaming response: Returns subgraphs and lineage links as the backend discovers them, instead of waiting for the complete graph. This suits broad or deep lineage graphs and helps avoid request timeouts.
Multi-location and multi-project traversal: Although you specify only one billing project in the request path, the API automatically discovers and traverses lineage links across multiple Google Cloud projects and geographical locations, provided you have the required permissions.
Fine-grained column-level lineage: Supports searching for column-level dependencies between assets.
Wildcard lookups: Lets you retrieve all column-level lineage for a specific entity by specifying the wildcard character * in the field array instead of listing each column.
Pipeline insights: Optionally retrieves metadata about the transformation pipelines (processes) that created the lineage links.
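To illustrate what layer-by-layer traversal means, the following Python sketch runs a breadth-first search over a small in-memory graph and records the depth at which each asset is first reached. It is a conceptual model only, not the service's implementation; the graph, the asset names, and the bfs_depths function are hypothetical.

```python
from collections import deque


def bfs_depths(graph: dict[str, list[str]], roots: list[str], max_depth: int) -> dict[str, int]:
    """Return the depth at which each asset is first reached from the roots.

    `graph` is a hypothetical adjacency map from an asset to its direct
    neighbors in the chosen direction (for example, downstream targets).
    Root assets have depth 0.
    """
    depths = {root: 0 for root in roots}
    queue = deque(roots)
    while queue:
        asset = queue.popleft()
        if depths[asset] >= max_depth:
            continue  # do not expand past the requested depth
        for neighbor in graph.get(asset, []):
            # In BFS, the first visit to a node is along a shortest path,
            # so its recorded depth is the minimum number of hops.
            if neighbor not in depths:
                depths[neighbor] = depths[asset] + 1
                queue.append(neighbor)
    return depths
```

For example, with graph = {"raw": ["clean"], "clean": ["report", "mart"], "mart": ["report"]}, the asset report is assigned depth 2 even though it is also reachable through mart, because the shorter path is found first.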
Before you begin
Before you make requests to the API, ensure that you have met the following security and environmental prerequisites:
Required roles
To get the permissions that you need to search for data lineage links, ask your
administrator to grant you the Data Lineage Viewer (roles/datalineage.viewer) IAM
role on the projects where the lineage links and processes are stored.
For more information about granting roles, see Manage access to projects, folders, and organizations.
This predefined role contains the permissions required to search for data lineage links. The exact permissions are listed in the following Required permissions section.
Required permissions
The following permissions are required to search for data lineage links:
- Search entity-level lineage: datalineage.events.get on the project where the link is stored
- Search column-level lineage: datalineage.events.getFields on the project where the link is stored
- Retrieve full pipeline process details: datalineage.processes.get on the project where the process is stored
You might also be able to get these permissions with custom roles or other predefined roles.
Resource scoping
When you configure your API request, you must distinguish between the resource used for administrative billing and the actual locations scanned by the API:
Billing parent path: The parent path in the request URL must use the format projects/PROJECT/locations/LOCATION. This project-location pair is used only to evaluate billing quotas and API rate limits.
Target locations: Explicitly list the regions you want the backend to scan in the locations array of the request body.
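The following Python sketch shows how the two scopes map onto a request: the billing project and location form only the parent path, while the regions to scan are listed separately in locations. The build_search_request helper is hypothetical; the endpoint URL and field names are taken from the examples on this page.

```python
def build_search_request(billing_project: str, billing_location: str,
                         target_locations: list[str], root_fqns: list[str],
                         direction: str = "DOWNSTREAM") -> tuple[str, dict]:
    """Build the endpoint URL and JSON body for a searchLineageStreaming call.

    The billing project and location only form the `parent` path, which is
    used for quota and rate-limit evaluation. The regions the backend should
    scan are passed separately in `locations`.
    """
    parent = f"projects/{billing_project}/locations/{billing_location}"
    url = f"https://datalineage.googleapis.com/v1/{parent}:searchLineageStreaming"
    body = {
        "parent": parent,
        "locations": list(target_locations),
        "rootCriteria": {
            "entities": {
                "entities": [{"fullyQualifiedName": fqn} for fqn in root_fqns]
            }
        },
        "direction": direction,
    }
    return url, body
```

Keeping the two inputs as separate parameters makes it harder to assume, by mistake, that the billing location limits which regions are scanned.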
Authentication setup
Initialize an environment variable with a Google Cloud access token to
authenticate your curl commands:
export ACCESS_TOKEN=$(gcloud auth print-access-token)
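If you call the API from Python instead of curl, the same header setup might look like the following sketch. It reads the token from the ACCESS_TOKEN variable exported above, uses only the standard library's urllib.request, and only builds the request without sending it; the make_request helper is hypothetical.

```python
import json
import os
import urllib.request
from typing import Optional


def make_request(url: str, body: dict, token: Optional[str] = None) -> urllib.request.Request:
    """Build an authenticated POST request with a JSON body; does not send it.

    Falls back to the ACCESS_TOKEN environment variable when no token is given.
    """
    token = token or os.environ["ACCESS_TOKEN"]
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

To send the request, pass it to urllib.request.urlopen. Access tokens are short-lived, so re-run the export command if requests start failing with authentication errors.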
Usage examples
The following examples use the endpoint datalineage.googleapis.com.
Search multi-level, multi-project lineage
To run a deep lineage search that traverses multiple levels of the graph and spans distinct Google Cloud projects, set the following request fields:
Set limits.maxDepth to your target traversal depth (accepted values: 1 to 100).
Populate the locations array with the target regions you want the backend to cross-reference (for example, ["us", "us-east1"]).
For example:
curl -H "Authorization: Bearer ${ACCESS_TOKEN}" \
-H "Content-Type: application/json" \
-X POST https://datalineage.googleapis.com/v1/projects/my-billing-project/locations/us:searchLineageStreaming \
--data '{
"parent": "projects/my-billing-project/locations/us",
"locations": ["us", "us-east1", "us-central1"],
"rootCriteria": {
"entities": {
"entities": [{
"fullyQualifiedName": "bigquery:project-prod.dataset.source_table"
}]
}
},
"direction": "DOWNSTREAM",
"limits": {
"maxDepth": 10,
"maxResults": 5000
}
}'
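A client-side check can catch an out-of-range depth before the request is sent. The following Python sketch builds the limits object used in the preceding example; the build_limits helper is hypothetical, and only the 1 to 100 range for maxDepth comes from this page (no range is assumed for maxResults).

```python
from typing import Optional


def build_limits(max_depth: int, max_results: Optional[int] = None) -> dict:
    """Build the `limits` object, checking maxDepth against the 1-100 range."""
    if not 1 <= max_depth <= 100:
        raise ValueError(f"maxDepth must be between 1 and 100, got {max_depth}")
    limits = {"maxDepth": max_depth}
    if max_results is not None:
        limits["maxResults"] = max_results  # passed through unvalidated
    return limits
```

For example, build_limits(10, 5000) produces the same limits object as the curl request above.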
Search multiple geographical locations
You can narrow or widen the scope of the lineage graph scan by changing the
regions listed in the repeated locations field.
For example:
curl -H "Authorization: Bearer ${ACCESS_TOKEN}" \
-H "Content-Type: application/json" \
-X POST https://datalineage.googleapis.com/v1/projects/my-billing-project/locations/us:searchLineageStreaming \
--data '{
"parent": "projects/my-billing-project/locations/us",
"locations": ["us", "europe-west1", "asia-south2"],
"rootCriteria": {
"entities": {
"entities": [{
"fullyQualifiedName": "bigquery:my-project.dataset.global_table"
}]
}
},
"direction": "DOWNSTREAM"
}'
Retrieve process names for lineage links
By default, the API omits process information because limits.maxProcessPerLink
defaults to 0. To retrieve the resource names of the pipelines that created
your data links, set limits.maxProcessPerLink to a positive integer.
For example:
curl -H "Authorization: Bearer ${ACCESS_TOKEN}" \
-H "Content-Type: application/json" \
-X POST https://datalineage.googleapis.com/v1/projects/my-billing-project/locations/us:searchLineageStreaming \
--data '{
"parent": "projects/my-billing-project/locations/us",
"locations": ["us"],
"rootCriteria": {
"entities": {
"entities": [{
"fullyQualifiedName": "bigquery:my-project.dataset.target_table"
}]
}
},
"direction": "UPSTREAM",
"limits": {
"maxProcessPerLink": 5
}
}'
Response behavior: The resulting stream populates the links[].processes field
with process messages that contain only their full resource name
(such as projects/my-project/locations/us/processes/my-process).
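The following Python sketch collects the process resource names from one link in the stream. The exact JSON shape of each processes entry is an assumption here: the sketch accepts a process value that is either a resource-name string or an object with a name field, and ignores anything else. The process_names helper is hypothetical.

```python
from typing import Any


def process_names(link: dict[str, Any]) -> list[str]:
    """Collect process resource names from one link in the response stream.

    ASSUMPTION: each entry in link["processes"] carries a "process" value that
    is either a resource-name string or an object with a "name" field.
    """
    names = []
    for entry in link.get("processes", []):
        process = entry.get("process")
        if isinstance(process, dict):
            process = process.get("name")
        if isinstance(process, str):
            names.append(process)
    return names
```

Verify the shape against a real response before relying on this parsing logic.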
Retrieve full process details using a FieldMask
If you need full metadata about a pipeline (such as its displayName, system
attributes, or execution origin) instead of just its resource name, use a
FieldMask:
Provide a non-zero value for limits.maxProcessPerLink.
Append a fields query parameter to the request URL that specifies links.processes.process. Because a field mask limits the response to the listed fields, also include any other fields you need, such as links.source, links.target, and links.depth.
For example:
curl -H "Authorization: Bearer ${ACCESS_TOKEN}" \
-H "Content-Type: application/json" \
-X POST "https://datalineage.googleapis.com/v1/projects/my-billing-project/locations/us:searchLineageStreaming?fields=links.processes.process,links.source,links.target,links.depth" \
--data '{
"parent": "projects/my-billing-project/locations/us",
"locations": ["us"],
"rootCriteria": {
"entities": {
"entities": [{
"fullyQualifiedName": "bigquery:my-project.dataset.target_table"
}]
}
},
"direction": "UPSTREAM",
"limits": {
"maxProcessPerLink": 5
}
}'
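Because the fields value is a comma-separated list, it can be convenient to build the query string programmatically. This Python sketch appends a fields parameter with urllib.parse.urlencode; the with_field_mask helper is hypothetical, and leaving commas unescaped (safe=",") is a choice made to mirror the curl example rather than a documented requirement.

```python
from urllib.parse import urlencode


def with_field_mask(url: str, fields: list[str]) -> str:
    """Append a `fields` query parameter (a comma-separated FieldMask) to a URL."""
    # safe="," keeps the commas literal so the result matches the curl example.
    query = urlencode({"fields": ",".join(fields)}, safe=",")
    separator = "&" if "?" in url else "?"
    return f"{url}{separator}{query}"
```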
Search both table-level and column-level lineage
You can search for both table-level (asset-level) and column-level (field-level)
lineage in a single request by providing multiple entities in the
rootCriteria.entities.entities list:
For table-level lineage, omit the field array.
For column-level lineage, specify a single column in the field array.
For example:
curl -H "Authorization: Bearer ${ACCESS_TOKEN}" \
-H "Content-Type: application/json" \
-X POST https://datalineage.googleapis.com/v1/projects/my-billing-project/locations/us:searchLineageStreaming \
--data '{
"parent": "projects/my-billing-project/locations/us",
"locations": ["us"],
"rootCriteria": {
"entities": {
"entities": [
{
"fullyQualifiedName": "bigquery:my-project.dataset.table_a"
},
{
"fullyQualifiedName": "bigquery:my-project.dataset.table_b",
"field": ["email"]
}
]
}
},
"direction": "DOWNSTREAM"
}'
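The following Python sketch builds entries for the rootCriteria.entities.entities list following the rules above: omit field for table-level lineage, or pass one column name for column-level lineage. Passing "*" produces the wildcard form described in the next section. The lineage_entity helper is hypothetical.

```python
from typing import Optional


def lineage_entity(fqn: str, column: Optional[str] = None) -> dict:
    """Build one entry for rootCriteria.entities.entities.

    Omit `column` for table-level lineage; pass a single column name for
    column-level lineage, or "*" to request all columns of the entity.
    """
    entity = {"fullyQualifiedName": fqn}
    if column is not None:
        entity["field"] = [column]  # the field array holds a single value
    return entity
```

For example, [lineage_entity("bigquery:my-project.dataset.table_a"), lineage_entity("bigquery:my-project.dataset.table_b", "email")] reproduces the entities list in the preceding request.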
Use wildcards for column-level lineage
To search for all available column-level lineage for a specific table without
listing every column individually, use the wildcard character * as the single
value in the field array.
For example:
curl -H "Authorization: Bearer ${ACCESS_TOKEN}" \
-H "Content-Type: application/json" \
-X POST https://datalineage.googleapis.com/v1/projects/my-billing-project/locations/us:searchLineageStreaming \
--data '{
"parent": "projects/my-billing-project/locations/us",
"locations": ["us"],
"rootCriteria": {
"entities": {
"entities": [{
"fullyQualifiedName": "bigquery:my-project.dataset.my_table",
"field": ["*"]
}]
}
},
"direction": "DOWNSTREAM"
}'
Filter lineage results
You can refine your lineage search results by using the filters block in the
request body.
Filter by dependency type
To restrict results to specific dependency types, such as direct copies
(EXACT_COPY) or transformations like filtering and grouping (OTHER), use
the dependencyTypes filter.
For example:
curl -H "Authorization: Bearer ${ACCESS_TOKEN}" \
-H "Content-Type: application/json" \
-X POST https://datalineage.googleapis.com/v1/projects/my-billing-project/locations/us:searchLineageStreaming \
--data '{
"parent": "projects/my-billing-project/locations/us",
"locations": ["us"],
"rootCriteria": {
"entities": {
"entities": [{
"fullyQualifiedName": "bigquery:my-project.dataset.my_table"
}]
}
},
"direction": "DOWNSTREAM",
"filters": {
"dependencyTypes": ["EXACT_COPY"]
}
}'
Restrict to table-only lineage
To ensure that the search returns only table-level lineage and completely
excludes column-level lineage, set the entitySet filter to ENTITIES.
For example:
curl -H "Authorization: Bearer ${ACCESS_TOKEN}" \
-H "Content-Type: application/json" \
-X POST https://datalineage.googleapis.com/v1/projects/my-billing-project/locations/us:searchLineageStreaming \
--data '{
"parent": "projects/my-billing-project/locations/us",
"locations": ["us"],
"rootCriteria": {
"entities": {
"entities": [{
"fullyQualifiedName": "bigquery:my-project.dataset.my_table"
}]
}
},
"direction": "DOWNSTREAM",
"filters": {
"entitySet": "ENTITIES"
}
}'
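The dependency type and entity set filters above are each shown in a separate request. The following Python sketch builds the filters object and leaves out any filter that isn't set. The build_filters helper is hypothetical, and the assumption that both filters can be combined in one filters block is not confirmed by this page.

```python
from typing import Optional, Sequence


def build_filters(dependency_types: Optional[Sequence[str]] = None,
                  entity_set: Optional[str] = None) -> dict:
    """Build the `filters` object, omitting any filter that is not set.

    ASSUMPTION: dependencyTypes and entitySet may appear together.
    """
    filters = {}
    if dependency_types:
        filters["dependencyTypes"] = list(dependency_types)  # e.g. ["EXACT_COPY"]
    if entity_set is not None:
        filters["entitySet"] = entity_set  # e.g. "ENTITIES" for table-only lineage
    return filters
```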
Filter by time range
You can restrict the lineage search results to a specific time interval.
For example, to search for lineage data created after a specific timestamp, use the following request:
curl -H "Authorization: Bearer ${ACCESS_TOKEN}" \
-H "Content-Type: application/json" \
-X POST https://datalineage.googleapis.com/v1/projects/my-billing-project/locations/us:searchLineageStreaming \
--data '{
"parent": "projects/my-billing-project/locations/us",
"locations": ["us"],
"rootCriteria": {
"entities": {
"entities": [{
"fullyQualifiedName": "bigquery:my-project.dataset.my_table"
}]
}
},
"direction": "DOWNSTREAM",
"filters": {
"timeRange": {
"startTime": "2026-01-01T00:00:00Z"
}
}
}'
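The startTime value in the preceding request is an RFC 3339 timestamp in UTC. The following Python sketch produces that format from a timezone-aware datetime. The time_range_filter helper is hypothetical; it only sets startTime because that is the only bound shown on this page, and it drops fractional seconds.

```python
from datetime import datetime, timezone


def time_range_filter(start: datetime) -> dict:
    """Build filters.timeRange with an RFC 3339 UTC startTime ("Z" suffix)."""
    if start.tzinfo is None:
        # Naive datetimes are ambiguous; require an explicit time zone.
        raise ValueError("start must be timezone-aware")
    start_utc = start.astimezone(timezone.utc)
    return {"timeRange": {"startTime": start_utc.strftime("%Y-%m-%dT%H:%M:%SZ")}}
```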
Handle unreachable locations (partial results)
Because the streaming API scans a distributed set of projects and locations simultaneously, some remote regions might be temporarily unavailable, unresponsive, or misconfigured during execution.
So that you can detect incomplete results, the searchLineageStreaming response
stream contains a dedicated diagnostic field called unreachable:
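Field name: unreachable (a repeated string field)
Value format: projects/PROJECT_NUMBER/locations/LOCATION (for example, projects/123456789/locations/us-east1)
If this field lists any locations, treat the lineage graph as partial: links stored in those locations might be missing from the results.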
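The following Python sketch reads a streamed response incrementally and gathers both the links and any unreachable locations. It assumes that the REST response body is a JSON array of response messages delivered in arbitrary text chunks, which is how streaming methods are commonly exposed over HTTP/JSON; confirm this against real output. The iter_stream_messages and summarize helpers are hypothetical.

```python
import json
from typing import Iterable, Iterator


def iter_stream_messages(chunks: Iterable[str]) -> Iterator[dict]:
    """Yield each message of a streamed JSON array as soon as it is complete.

    ASSUMPTION: the response body is a JSON array of response messages.
    """
    decoder = json.JSONDecoder()
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        while True:
            # Drop array punctuation and whitespace between messages.
            stripped = buffer.lstrip(" \t\r\n[,]")
            if not stripped:
                buffer = ""
                break
            try:
                message, end = decoder.raw_decode(stripped)
            except json.JSONDecodeError:
                buffer = stripped  # message not complete yet; wait for more data
                break
            yield message
            buffer = stripped[end:]


def summarize(messages: Iterable[dict]) -> tuple[list[dict], set[str]]:
    """Collect all links and the set of unreachable locations across messages."""
    links: list[dict] = []
    unreachable: set[str] = set()
    for message in messages:
        links.extend(message.get("links", []))
        unreachable.update(message.get("unreachable", []))
    return links, unreachable
```

To feed these helpers from a live response, read the body in pieces and decode them with an incremental UTF-8 decoder (for example, codecs.getincrementaldecoder("utf-8")()) so that multi-byte characters split across reads are handled. A non-empty unreachable set signals a partial graph.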
What's next
Learn more about multi-region lineage search.
Learn more about data lineage.
Learn more about lineage visualization.