The Data Lineage API can ingest lineage information from systems that integrate with
OpenLineage, an open standard for lineage collection.
When you send OpenLineage-formatted events to the Data Lineage API using the
ProcessOpenLineageRunEvent method, the Data Lineage API maps attributes from the
OpenLineage message to the corresponding Data Lineage API attributes.
This document provides reference tables for these mappings.
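The following Python sketch shows the general shape of a ProcessOpenLineageRunEvent request. The OpenLineage RunEvent is sent as the JSON request body to a URL built from the parent project and location, with an optional requestId query parameter. The endpoint path is assumed from the v1 REST API and should be checked against the API reference. The helper build_process_openlineage_request, the project ID, location, job and dataset names, and the producer URL are all hypothetical. Authentication is omitted, so the request is only built, not sent.

```python
import json
import uuid
from urllib.parse import urlencode

# Assumed v1 REST endpoint; verify against the Data Lineage API reference.
BASE_URL = "https://datalineage.googleapis.com/v1"


def build_process_openlineage_request(project, location, event, request_id=None):
    """Build (url, body) for a ProcessOpenLineageRunEvent call without sending it.

    Hypothetical helper: authentication (for example, an OAuth 2.0 bearer token)
    and the HTTP call itself are intentionally left out.
    """
    parent = f"projects/{project}/locations/{location}"
    url = f"{BASE_URL}/{parent}:processOpenLineageRunEvent"
    if request_id is not None:
        # requestId is optional and chosen by the caller.
        url += "?" + urlencode({"requestId": request_id})
    # The request body is the OpenLineage RunEvent, serialized as JSON.
    return url, json.dumps(event)


# A minimal OpenLineage RunEvent; all names and URLs below are placeholders.
event = {
    "eventType": "START",
    "eventTime": "2024-01-01T00:00:00Z",
    "run": {"runId": str(uuid.uuid4())},
    "job": {"namespace": "my-namespace", "name": "my-job"},
    "inputs": [{"namespace": "gs://my-bucket", "name": "raw/orders.csv"}],
    "outputs": [{"namespace": "gs://my-bucket", "name": "clean/orders.csv"}],
    "producer": "https://example.com/my-producer",
    "schemaURL": "https://openlineage.io/spec/2-0-2/OpenLineage.json#/$defs/RunEvent",
}

url, body = build_process_openlineage_request("my-project", "us", event, request_id="req-001")
```

Keeping request construction separate from the HTTP call makes it easy to plug in whichever authenticated client you already use.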
Attribute mapping
The ProcessOpenLineageRunEvent REST API method maps OpenLineage attributes to
Data Lineage API attributes as follows:
| Data Lineage API attributes | OpenLineage attributes |
|---|---|
| Process.name | projects/PROJECT_NUMBER/locations/LOCATION/processes/HASH_OF_NAMESPACE_AND_NAME |
| Process.displayName | Job.namespace + ":" + Job.name |
| Process.attributes | Job.facets (see Stored data) |
| Run.name | projects/PROJECT_NUMBER/locations/LOCATION/processes/HASH_OF_NAMESPACE_AND_NAME/runs/HASH_OF_RUNID |
| Run.displayName | Run.runId |
| Run.attributes | Run.facets (see Stored data) |
| Run.startTime | eventTime |
| Run.endTime | eventTime |
| Run.state | eventType |
| LineageEvent.name | projects/PROJECT_NUMBER/locations/LOCATION/processes/HASH_OF_NAMESPACE_AND_NAME/runs/HASH_OF_RUNID/lineageEvents/HASH_OF_JOB_RUN_INPUT_OUTPUTS_OF_EVENT (for example, projects/11111111/locations/us/processes/1234/runs/4321/lineageEvents/111-222-333) |
| LineageEvent.EventLinks.source | inputs (FQN formed by concatenating each input's namespace and name) |
| LineageEvent.EventLinks.target | outputs (FQN formed by concatenating each output's namespace and name) |
| LineageEvent.startTime | eventTime |
| LineageEvent.endTime | eventTime |
| requestId | Set by the caller of the method |
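As a rough illustration of the table above, the following Python sketch derives the display names, event time, event type, and input/output pairs from a single OpenLineage RunEvent dictionary. The MappedEvent class and map_run_event function are hypothetical. The sketch does not compute Process.name, Run.name, or LineageEvent.name, because the hashing scheme behind those names isn't specified here, and it leaves the conversion of each input or output into an FQN to the rules in FQN mapping.

```python
from dataclasses import dataclass, field


@dataclass
class MappedEvent:
    """Subset of Data Lineage API fields derived from one OpenLineage RunEvent.

    Resource names (Process.name, Run.name, LineageEvent.name) are omitted
    because they use hashes that this sketch does not reproduce.
    """
    process_display_name: str
    run_display_name: str
    event_time: str
    event_type: str
    sources: list = field(default_factory=list)  # (namespace, name) per input
    targets: list = field(default_factory=list)  # (namespace, name) per output


def map_run_event(event):
    """Hypothetical helper that applies the documented attribute mappings."""
    job = event["job"]
    return MappedEvent(
        # Process.displayName = Job.namespace + ":" + Job.name
        process_display_name=job["namespace"] + ":" + job["name"],
        # Run.displayName = Run.runId
        run_display_name=event["run"]["runId"],
        # eventTime feeds Run.startTime/endTime and LineageEvent.startTime/endTime
        event_time=event["eventTime"],
        # eventType determines Run.state
        event_type=event["eventType"],
        # inputs -> EventLinks.source, outputs -> EventLinks.target
        sources=[(d["namespace"], d["name"]) for d in event.get("inputs", [])],
        targets=[(d["namespace"], d["name"]) for d in event.get("outputs", [])],
    )
```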
FQN mapping
The following table provides examples of OpenLineage namespace and name pairs for various systems, and their equivalent Dataplex Universal Catalog fully qualified names (FQN):
| System | OpenLineage namespace | OpenLineage name | Dataplex Universal Catalog FQN |
|---|---|---|---|
| Athena | awsathena://athena.{region_name}.amazonaws.com | | |
| AWS Glue | arn:aws:glue:{region}:{account id} | table/{database name}/{table name} | aws_glue:table:{region}.{account id}.{database name}.{table name} |
| Azure Cosmos DB | azurecosmos://{host}/dbs/{database} | colls/{table} | |
| Azure Data Explorer | azurekusto://{host}.kusto.windows.net | {database}/{table} | |
| Azure Synapse | sqlserver://{host}:{port} | | |
| BigQuery | bigquery | | |
| Cassandra | cassandra://{host}:{port} | | |
| MySQL | mysql://{host}:{port} | | |
| CrateDB | crate://{host}:{port} | {database}.{schema}.{table} | Not supported |
| DB2 | db2://{host}:{port} | | |
| Hive | hive://{host}:{port} | {database}.{table} | Not supported |
| MSSQL | mssql://{host}:{port} | {database}.{schema}.{table} | Not supported |
| OceanBase | oceanbase://{host}:{port} | {database}.{table} | Not supported |
| Oracle | oracle://{host}:{port} | {serviceName}.{schema}.{table} or {sid}.{schema}.{table} | |
| Postgres | postgres://{host}:{port} | | |
| Teradata | teradata://{host}:{port} | {database}.{table} | Not supported |
| Redshift | redshift://{cluster_identifier}.{region_name}:{port} | | |
| Snowflake | snowflake://{organization name}-{account name} or snowflake://{account-locator}(.{compliance})(.{cloud_region_id})(.{cloud}) | | |
| Spanner | spanner://{projectId}:{instanceId} | {database}.{schema}.{table} | Supported in Dataplex Universal Catalog, but not supported in Data lineage |
| Trino | trino://{host}:{port} | | |
| ABFSS (Azure Data Lake Gen2) | abfss://{container name}@{service name}.dfs.core.windows.net | {path} | |
| DBFS (Databricks File System) | dbfs://{workspace name} | {path} | |
| Cloud Storage | gs://{bucket name} | {object key} | |
| HDFS | hdfs://{namenode host}:{namenode port} | {path} | |
| Kafka | kafka://{bootstrap server host}:{port} | {topic} | kafka:{serverHostWithPort}.{topicId} |
| Local file system | file | {path} | filesystem:localhost.{path} |
| Remote file system | file://{host} | {path} | filesystem:{hostWithPort}.{path} |
| S3 | s3://{bucket name} (namespace prefixes s3a and s3n are also accepted and converted to s3) | {object key} | |
| WASBS (Azure Blob Storage) | wasbs://{container name}@{service name}.dfs.core.windows.net | {object key} | |
| Pub/Sub Topic | pubsub | topic:{projectId}:{topicId} | pubsub:topic:{projectId}.{topicId} |
| Pub/Sub Subscription | pubsub | subscription:{projectId}:{subscriptionId} | pubsub:subscription:{projectId}.{subscriptionId} |
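For the rows above that spell out a Dataplex Universal Catalog FQN, the conversion is mostly string rearrangement. The following Python sketch implements only those rows (Kafka, local and remote file systems, Pub/Sub topics and subscriptions, and AWS Glue tables), plus the s3a/s3n namespace normalization. The function names are hypothetical, and the sketch assumes that {serverHostWithPort} and {hostWithPort} are the host:port portion of the namespace. It is not the Data Lineage API's own parser.

```python
def normalize_s3_namespace(namespace):
    """Rewrite s3a:// and s3n:// namespace prefixes to s3://."""
    for prefix in ("s3a://", "s3n://"):
        if namespace.startswith(prefix):
            return "s3://" + namespace[len(prefix):]
    return namespace


def to_dataplex_fqn(namespace, name):
    """Return a Dataplex Universal Catalog FQN, or None if this sketch has no rule.

    Only the table rows with an explicit FQN format are handled.
    """
    if namespace.startswith("kafka://"):
        # kafka://{host}:{port} + {topic} -> kafka:{host}:{port}.{topic}
        return f"kafka:{namespace[len('kafka://'):]}.{name}"
    if namespace == "file":
        # Local file system: file + {path} -> filesystem:localhost.{path}
        return f"filesystem:localhost.{name}"
    if namespace.startswith("file://"):
        # Remote file system: file://{host} + {path} -> filesystem:{host}.{path}
        return f"filesystem:{namespace[len('file://'):]}.{name}"
    if namespace == "pubsub":
        # topic:{projectId}:{topicId} -> pubsub:topic:{projectId}.{topicId}
        # subscription:{projectId}:{subscriptionId}
        #   -> pubsub:subscription:{projectId}.{subscriptionId}
        parts = name.split(":")
        if len(parts) == 3 and parts[0] in ("topic", "subscription"):
            kind, project_id, resource_id = parts
            return f"pubsub:{kind}:{project_id}.{resource_id}"
        return None
    if namespace.startswith("arn:aws:glue:"):
        # arn:aws:glue:{region}:{account id} + table/{database}/{table}
        #   -> aws_glue:table:{region}.{account id}.{database}.{table}
        arn_parts = namespace[len("arn:aws:glue:"):].split(":")
        name_parts = name.split("/")
        if len(arn_parts) == 2 and len(name_parts) == 3 and name_parts[0] == "table":
            region, account_id = arn_parts
            _, database, table = name_parts
            return f"aws_glue:table:{region}.{account_id}.{database}.{table}"
        return None
    return None
```

Returning None instead of guessing keeps the sketch limited to the formats the table actually lists.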
Additional accepted formats
While OpenLineage doesn't define standard namespace/name pairs for the
following systems, the Data Lineage API accepts lineage events for them when
formatted as described in the following table. Resources that are referenced in
OpenLineage messages with the namespace custom are interpreted as custom
fully qualified names.
| System | OpenLineage namespace | OpenLineage name | Dataplex Universal Catalog FQN |
|---|---|---|---|
| Custom FQN | custom | {some reference} | custom:{someReference} |
| Dataproc Metastore | dataproc_metastore | | |
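A minimal Python sketch of the custom namespace rule, assuming the OpenLineage name is copied verbatim after the custom: prefix. The function name custom_fqn is hypothetical.

```python
def custom_fqn(namespace, name):
    """Map a resource in the 'custom' namespace to custom:{reference}.

    Assumes the OpenLineage name is used unchanged as the reference.
    """
    if namespace != "custom":
        raise ValueError(f"expected namespace 'custom', got {namespace!r}")
    return f"custom:{name}"
```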
What's next
- Learn how to integrate with OpenLineage.
- See the reference for fully qualified names.
- Explore the Data Lineage API.
- Learn how to view lineage information.