Search multi-region lineage using server-side automation

This document describes how to look up multi-level, cross-regional data lineage by using the searchLineageStreaming API.

The searchLineageStreaming API performs a breadth-first search in a specified direction (upstream or downstream) starting from a defined set of root entities, and returns a unified lineage graph as a real-time streaming response.

For more information, see About multi-region lineage search.

Key capabilities

The searchLineageStreaming API includes the following capabilities:

  • Breadth-first search: Traverses the lineage graph layer by layer, accurately calculating the depth of each connected asset.

  • Streaming response: Returns subgraphs and lineage links as the backend discovers them. Because results arrive incrementally, streaming suits broad or deep lineage graphs and helps avoid request timeouts.

  • Multi-location and multi-project traversal: Although you specify only one billing project in the request path, the API automatically discovers and traverses lineage links across multiple Google Cloud projects and geographical locations, provided you have the required permissions.

  • Fine-grained column-level lineage: Supports searching for column-level dependencies between assets.

  • Wildcard lookups: Lets you retrieve all column-level lineage for a specific entity by specifying * as the only value in the entity's field array.

  • Pipeline insights: Optionally retrieves metadata about the transformation pipelines (processes) that created the lineage links.

Before you begin

Before you make requests to the API, ensure that you have met the following security and environmental prerequisites:

Required roles

To get the permissions that you need to search for data lineage links, ask your administrator to grant you the Data Lineage Viewer (roles/datalineage.viewer) IAM role on the projects where the lineage links and processes are stored. For more information about granting roles, see Manage access to projects, folders, and organizations.

This predefined role contains the permissions required to search for data lineage links. The exact permissions are listed in the following Required permissions section.

Required permissions

The following permissions are required to search for data lineage links:

  • Search entity-level lineage: datalineage.events.get on the project where the link is stored
  • Search column-level lineage: datalineage.events.getFields on the project where the link is stored
  • Retrieve full pipeline process details: datalineage.processes.get on the project where the process is stored

You might also be able to get these permissions with custom roles or other predefined roles.

Resource scoping

When you configure your API request, you must distinguish between the resource used for administrative billing and the actual locations scanned by the API:

  • Billing parent path: The parent path in the request URL must use the format projects/PROJECT_ID/locations/LOCATION. This project-location pair is used only to evaluate billing quotas and API rate limits. It doesn't restrict which locations are searched.

  • Target locations: In the locations array of the request body, explicitly list the regions that you want the backend to scan.
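The distinction between the billing parent and the scanned locations can be sketched in shell. This is a minimal illustration, not part of the API: the variable names and the locations_json helper are hypothetical, and the project and region values are placeholders.

```shell
# Placeholder values; replace with your own billing project and location.
BILLING_PROJECT="my-billing-project"
BILLING_LOCATION="us"

# The parent path identifies the billing project and location only.
PARENT="projects/${BILLING_PROJECT}/locations/${BILLING_LOCATION}"
URL="https://datalineage.googleapis.com/v1/${PARENT}:searchLineageStreaming"

# Hypothetical helper: prints its arguments as a JSON array of strings,
# suitable for the "locations" field of the request body.
locations_json() (
  out=""
  for loc in "$@"; do
    out="${out:+${out}, }\"${loc}\""
  done
  printf '[%s]\n' "$out"
)

echo "$URL"
locations_json us us-east1 us-central1
```

The helper does no JSON escaping, which is acceptable for region names but not for arbitrary strings.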

Authentication setup

Initialize an environment variable with a Google Cloud access token to authenticate your curl commands:

export ACCESS_TOKEN=$(gcloud auth print-access-token)

Usage examples

The following examples use the endpoint datalineage.googleapis.com.

Search multi-level, multi-project lineage

To run a deep lineage search that traverses multiple levels of the graph and spans several Google Cloud projects, set the following request fields:

  • Set limits.maxDepth to your target traversal depth (accepts values from 1 to 100).

  • Populate the locations array with the target regions that you want the backend to scan (for example, ["us", "us-east1"]).

For example:

curl -H "Authorization: Bearer ${ACCESS_TOKEN}" \
-H "Content-Type: application/json" \
-X POST https://datalineage.googleapis.com/v1/projects/my-billing-project/locations/us:searchLineageStreaming \
--data '{
  "parent": "projects/my-billing-project/locations/us",
  "locations": ["us", "us-east1", "us-central1"],
  "rootCriteria": {
    "entities": {
      "entities": [{
        "fullyQualifiedName": "bigquery:project-prod.dataset.source_table"
      }]
    }
  },
  "direction": "DOWNSTREAM",
  "limits": {
    "maxDepth": 10,
    "maxResults": 5000
  }
}'
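Because limits.maxDepth accepts only values from 1 to 100, a script that builds requests might check the value before sending it. The following is a small sketch of such a client-side check; the valid_max_depth helper is hypothetical and is not part of the API or of gcloud.

```shell
# Hypothetical helper: succeeds only if $1 is an integer from 1 to 100.
valid_max_depth() {
  case "$1" in
    ''|*[!0-9]*) return 1 ;;   # empty, or contains a non-digit character
  esac
  [ "$1" -ge 1 ] && [ "$1" -le 100 ]
}

MAX_DEPTH=10
if valid_max_depth "$MAX_DEPTH"; then
  echo "maxDepth ${MAX_DEPTH} is within the documented range"
else
  echo "maxDepth must be an integer from 1 to 100" >&2
fi
```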

Search multiple geographical locations

You can narrow or widen the lineage graph scan by changing the regions listed in the repeated locations field.

For example:

curl -H "Authorization: Bearer ${ACCESS_TOKEN}" \
-H "Content-Type: application/json" \
-X POST https://datalineage.googleapis.com/v1/projects/my-billing-project/locations/us:searchLineageStreaming \
--data '{
  "parent": "projects/my-billing-project/locations/us",
  "locations": ["us", "europe-west1", "asia-south2"],
  "rootCriteria": {
    "entities": {
      "entities": [{
        "fullyQualifiedName": "bigquery:my-project.dataset.global_table"
      }]
    }
  },
  "direction": "DOWNSTREAM"
}'

Retrieve process resource names

By default, the API omits process information (maxProcessPerLink defaults to 0). To retrieve the resource names of the pipelines that created your lineage links, set limits.maxProcessPerLink to a positive integer.

For example:

curl -H "Authorization: Bearer ${ACCESS_TOKEN}" \
-H "Content-Type: application/json" \
-X POST https://datalineage.googleapis.com/v1/projects/my-billing-project/locations/us:searchLineageStreaming \
--data '{
  "parent": "projects/my-billing-project/locations/us",
  "locations": ["us"],
  "rootCriteria": {
    "entities": {
      "entities": [{
        "fullyQualifiedName": "bigquery:my-project.dataset.target_table"
      }]
    }
  },
  "direction": "UPSTREAM",
  "limits": {
    "maxProcessPerLink": 5
  }
}'

Response behavior: The resulting stream populates the links[].processes field with process messages that contain only the full resource name of each process (such as projects/my-project/locations/us/processes/my-process).

Retrieve full process details using a FieldMask

If you need full structural metadata about a pipeline (such as its displayName, system attributes, or execution origin) instead of just its resource name, you must use an API FieldMask:

  1. Set limits.maxProcessPerLink to a positive integer.

  2. Append a fields query parameter to the request URL, specifying links.processes.process along with any other fields that you need in the response.

For example:

curl -H "Authorization: Bearer ${ACCESS_TOKEN}" \
-H "Content-Type: application/json" \
-X POST "https://datalineage.googleapis.com/v1/projects/my-billing-project/locations/us:searchLineageStreaming?fields=links.processes.process,links.source,links.target,links.depth" \
--data '{
  "parent": "projects/my-billing-project/locations/us",
  "locations": ["us"],
  "rootCriteria": {
    "entities": {
      "entities": [{
        "fullyQualifiedName": "bigquery:my-project.dataset.target_table"
      }]
    }
  },
  "direction": "UPSTREAM",
  "limits": {
    "maxProcessPerLink": 5
  }
}'
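When the list of requested fields changes often, the fields query parameter can be assembled from separate values instead of being edited by hand. The following sketch assumes a hypothetical field_mask helper that joins its arguments with commas. The field names are the ones used in the preceding example.

```shell
# Hypothetical helper: joins its arguments with commas to form a FieldMask.
# The body runs in a subshell, so the IFS change does not leak to the caller.
field_mask() (
  IFS=,
  printf '%s\n' "$*"
)

FIELDS=$(field_mask links.processes.process links.source links.target links.depth)
BASE_URL="https://datalineage.googleapis.com/v1/projects/my-billing-project/locations/us:searchLineageStreaming"

# Quote the resulting URL when you pass it to curl, because it contains "?".
echo "${BASE_URL}?fields=${FIELDS}"
```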

Search both table-level and column-level lineage

You can search for both table-level (asset-level) and column-level (field-level) lineage in a single request by providing multiple entities in the rootCriteria.entities.entities list:

  • For table-level lineage, omit the field array.

  • For column-level lineage, specify a single column in the field array.

For example:

curl -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     -H "Content-Type: application/json" \
     -X POST https://datalineage.googleapis.com/v1/projects/my-billing-project/locations/us:searchLineageStreaming \
     --data '{
       "parent": "projects/my-billing-project/locations/us",
       "locations": ["us"],
       "rootCriteria": {
         "entities": {
           "entities": [
             {
               "fullyQualifiedName": "bigquery:my-project.dataset.table_a"
             },
             {
               "fullyQualifiedName": "bigquery:my-project.dataset.table_b",
               "field": ["email"]
             }
           ]
         }
       },
       "direction": "DOWNSTREAM"
     }'

Use wildcards for column-level lineage

To search for all available column-level lineage for a specific table without listing every column individually, use the wildcard character * as the single value in the field array.

For example:

curl -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     -H "Content-Type: application/json" \
     -X POST https://datalineage.googleapis.com/v1/projects/my-billing-project/locations/us:searchLineageStreaming \
     --data '{
       "parent": "projects/my-billing-project/locations/us",
       "locations": ["us"],
       "rootCriteria": {
         "entities": {
           "entities": [{
             "fullyQualifiedName": "bigquery:my-project.dataset.my_table",
             "field": ["*"]
           }]
         }
       },
       "direction": "DOWNSTREAM"
     }'
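The three entity forms described in this document (table-level, single column, and wildcard) differ only in the optional field array. The following sketch shows one way a script might generate them. The entity_json helper is hypothetical and performs no JSON escaping, so it assumes that names contain no quotation marks or backslashes.

```shell
# Hypothetical helper: prints one rootCriteria entity as JSON.
#   $1 = fully qualified name
#   $2 = optional column name, or * for all columns
entity_json() {
  if [ "$#" -ge 2 ]; then
    printf '{"fullyQualifiedName": "%s", "field": ["%s"]}\n' "$1" "$2"
  else
    printf '{"fullyQualifiedName": "%s"}\n' "$1"
  fi
}

entity_json "bigquery:my-project.dataset.table_a"          # table-level
entity_json "bigquery:my-project.dataset.table_b" "email"  # one column
entity_json "bigquery:my-project.dataset.my_table" "*"     # all columns
```

Quote the * argument as shown so that the shell does not expand it to file names.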

Filter lineage results

You can refine your lineage search results by using the filters block in the request body.

Filter by dependency type

To restrict results to specific dependency types, such as direct copies (EXACT_COPY) or transformations like filtering and grouping (OTHER), use the dependencyTypes filter.

For example:

curl -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     -H "Content-Type: application/json" \
     -X POST https://datalineage.googleapis.com/v1/projects/my-billing-project/locations/us:searchLineageStreaming \
     --data '{
       "parent": "projects/my-billing-project/locations/us",
       "locations": ["us"],
       "rootCriteria": {
         "entities": {
           "entities": [{
             "fullyQualifiedName": "bigquery:my-project.dataset.my_table"
           }]
         }
       },
       "direction": "DOWNSTREAM",
       "filters": {
         "dependencyTypes": ["EXACT_COPY"]
       }
     }'

Restrict to table-only lineage

To ensure that the search returns only table-level lineage and completely excludes column-level lineage, set the entitySet filter to ENTITIES.

For example:

curl -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     -H "Content-Type: application/json" \
     -X POST https://datalineage.googleapis.com/v1/projects/my-billing-project/locations/us:searchLineageStreaming \
     --data '{
       "parent": "projects/my-billing-project/locations/us",
       "locations": ["us"],
       "rootCriteria": {
         "entities": {
           "entities": [{
             "fullyQualifiedName": "bigquery:my-project.dataset.my_table"
           }]
         }
       },
       "direction": "DOWNSTREAM",
       "filters": {
         "entitySet": "ENTITIES"
       }
     }'

Filter by time range

You can restrict the lineage search results to a specific time interval.

For example, to search for lineage data created after a specific timestamp, use the following request:

curl -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     -H "Content-Type: application/json" \
     -X POST https://datalineage.googleapis.com/v1/projects/my-billing-project/locations/us:searchLineageStreaming \
     --data '{
       "parent": "projects/my-billing-project/locations/us",
       "locations": ["us"],
       "rootCriteria": {
         "entities": {
           "entities": [{
             "fullyQualifiedName": "bigquery:my-project.dataset.my_table"
           }]
         }
       },
       "direction": "DOWNSTREAM",
       "filters": {
         "timeRange": {
           "startTime": "2026-01-01T00:00:00Z"
         }
       }
     }'
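For rolling windows such as "the last 7 days", the startTime value can be computed instead of hard-coded. The following sketch assumes that the API accepts UTC timestamps in the same format as the preceding example. The start_time_days_ago helper is hypothetical; it tries the GNU date syntax first and falls back to the BSD/macOS syntax.

```shell
# Hypothetical helper: prints the UTC timestamp for N days ago, where N is $1.
# GNU date uses -d "N days ago"; BSD/macOS date uses -v -Nd.
start_time_days_ago() {
  date -u -d "$1 days ago" '+%Y-%m-%dT%H:%M:%SZ' 2>/dev/null ||
    date -u -v "-${1}d" '+%Y-%m-%dT%H:%M:%SZ'
}

START_TIME=$(start_time_days_ago 7)

# Print the filters fragment that would go into the request body.
printf '"filters": {"timeRange": {"startTime": "%s"}}\n' "$START_TIME"
```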

Handle unreachable locations (partial results)

Because the streaming API scans a distributed set of projects and locations simultaneously, some remote regions might be temporarily unavailable, unresponsive, or misconfigured during a search.

So that you can detect partial results, the searchLineageStreamingResponse stream contains a dedicated diagnostic field called unreachable. When this field is populated, the listed locations could not be reached, and the returned lineage graph might be incomplete:

  • Field name: unreachable (represented as a repeated string)

  • Value format: projects/PROJECT_NUMBER/locations/LOCATION (for example, projects/123456789/locations/us-east1)
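A script that saves the streamed response can check it for unreachable locations before trusting the graph. The following is a text-based sketch only: the list_unreachable helper and the sample response are hypothetical, the real stream layout might differ, and a JSON-aware tool would be more robust. The pattern matches any quoted string of exactly the form projects/.../locations/..., so longer resource names, such as process names, are not matched.

```shell
# Hypothetical helper: reads response text on stdin and prints each distinct
# quoted string of the form projects/PROJECT_NUMBER/locations/LOCATION.
list_unreachable() {
  grep -o '"projects/[^"/]*/locations/[^"/]*"' | tr -d '"' | sort -u
}

# Hypothetical sample of saved response text; not real API output.
SAMPLE='{"links": [], "unreachable": ["projects/123456789/locations/us-east1"]}
{"name": "projects/my-project/locations/us/processes/my-process"}
{"links": [], "unreachable": ["projects/123456789/locations/us-east1", "projects/123456789/locations/europe-west1"]}'

UNREACHABLE=$(printf '%s\n' "$SAMPLE" | list_unreachable)
if [ -n "$UNREACHABLE" ]; then
  echo "Partial results; these locations could not be reached:" >&2
  printf '%s\n' "$UNREACHABLE"
fi
```

A caller might retry the search later with only the reported locations in the locations array, or flag the resulting graph as incomplete.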

What's next