OpenLineage mapping

The Data Lineage API can import lineage information from systems that integrate with OpenLineage, an open standard for lineage collection. When you send OpenLineage-formatted events to the Data Lineage API by using the ProcessOpenLineageRunEvent method, the API maps the attributes of the OpenLineage message to the corresponding Data Lineage API attributes.

This document provides reference tables for these mappings.

Attribute mapping

The ProcessOpenLineageRunEvent REST API method maps OpenLineage attributes to Data Lineage API attributes as follows:

Data Lineage API attribute | OpenLineage attribute
Process.name | projects/PROJECT_NUMBER/locations/LOCATION/processes/HASH_OF_NAMESPACE_AND_NAME
Process.displayName | Job.namespace + ":" + Job.name
Process.attributes | Job.facets (see Stored data)
Run.name | projects/PROJECT_NUMBER/locations/LOCATION/processes/HASH_OF_NAMESPACE_AND_NAME/runs/HASH_OF_RUNID
Run.displayName | Run.runId
Run.attributes | Run.facets (see Stored data)
Run.startTime | eventTime
Run.endTime | eventTime
Run.state | eventType
LineageEvent.name | projects/PROJECT_NUMBER/locations/LOCATION/processes/HASH_OF_NAMESPACE_AND_NAME/runs/HASH_OF_RUNID/lineageEvents/HASH_OF_JOB_RUN_INPUT_OUTPUTS_OF_EVENT (for example, projects/11111111/locations/us/processes/1234/runs/4321/lineageEvents/111-222-333)
LineageEvent.EventLinks.source | inputs (the FQN is the concatenation of namespace and name)
LineageEvent.EventLinks.target | outputs (the FQN is the concatenation of namespace and name)
LineageEvent.startTime | eventTime
LineageEvent.endTime | eventTime
requestId | Defined by the caller of the method
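
As a rough illustration of the table above, the following Python sketch derives a few of the mapped values from a sample OpenLineage run event. The sample event (trimmed to the fields the table uses), the `summarize_mapping` helper, and the returned key names are hypothetical. The hashed resource names and the exact conversion of eventType into a Run.state value are left out because this document does not describe them.

```python
from typing import Any


def summarize_mapping(event: dict[str, Any]) -> dict[str, Any]:
    """Derive a few Data Lineage API values from an OpenLineage run event.

    Hypothetical helper: it mirrors only the table rows that can be computed
    locally, so hashed resource names such as Process.name are not produced.
    """
    job = event["job"]
    run = event["run"]
    return {
        # Process.displayName = Job.namespace + ":" + Job.name
        "process_display_name": f'{job["namespace"]}:{job["name"]}',
        # Run.displayName = Run.runId
        "run_display_name": run["runId"],
        # Run.state is derived from eventType; the raw value is kept here
        "event_type": event["eventType"],
        # Run and LineageEvent start and end times come from eventTime
        "event_time": event["eventTime"],
        # EventLinks sources and targets come from inputs and outputs
        "sources": [(d["namespace"], d["name"]) for d in event.get("inputs", [])],
        "targets": [(d["namespace"], d["name"]) for d in event.get("outputs", [])],
    }


# Sample event with made-up identifiers, for illustration only.
sample_event = {
    "eventType": "COMPLETE",
    "eventTime": "2024-01-01T12:00:00Z",
    "run": {"runId": "d46e465b-d358-4d32-83d4-df660ff614dd"},
    "job": {"namespace": "my-namespace", "name": "my-job"},
    "inputs": [{"namespace": "bigquery", "name": "my-project.my_dataset.source_table"}],
    "outputs": [{"namespace": "bigquery", "name": "my-project.my_dataset.target_table"}],
}

summary = summarize_mapping(sample_event)
print(summary["process_display_name"])
```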

FQN mapping

The following table gives example OpenLineage namespace and name pairs for various systems, along with the corresponding Dataplex Universal Catalog fully qualified names (FQNs):

System | OpenLineage namespace | OpenLineage name | Dataplex Universal Catalog FQN
Athena | awsathena://athena.{region_name}.amazonaws.com | {catalog}; {catalog}.{database}; {catalog}.{database}.{table} | athena:{catalogId}.{region}; athena:{catalogId}.{region}.{databaseId}; athena:{catalogId}.{region}.{databaseId}.{tableId}
AWS Glue | arn:aws:glue:{region}:{account id} | table/{database name}/{table name} | aws_glue:table:{region}.{account id}.{database name}.{table name}
Azure Cosmos DB | azurecosmos://{host}/dbs/{database} | colls/{table} | cosmos-db:{host}.{database}; cosmos-db:{host}.{database}.{table}
Azure Data Explorer | azurekusto://{host}.kusto.windows.net | {database}/{table} | kusto:{host}.{region}.{database}; kusto:{host}.{region}.{database}.{table}
Azure Synapse | sqlserver://{host}:{port} | {database}; {database}.{schema}; {database}.{schema}.{table} | sqlserver:{hostWithPort}.{databaseId}; sqlserver:{hostWithPort}.{databaseId}.{schemaId}; sqlserver:{hostWithPort}.{databaseId}.{schemaId}.{tableId}
BigQuery | bigquery | {project id}.{dataset name}; {project id}.{dataset name}.{table name} | bigquery:{projectId}.{datasetId}; bigquery:{projectId}.{datasetId}.{assetId}
Cassandra | cassandra://{host}:{port} | {keyspace}; {keyspace}.{table} | cassandra:{hostWithPort}.{keyspaceId}; cassandra:{hostWithPort}.{keyspaceId}.{tableId}
MySQL | mysql://{host}:{port} | {database}; {database}.{table} | mysql:{hostWithPort}.{databaseId}; mysql:{hostWithPort}.{databaseId}.{tableId}
CrateDB | crate://{host}:{port} | {database}.{schema}.{table} | Not supported
DB2 | db2://{host}:{port} | {database}; {database}.{schema}; {database}.{schema}.{table} | db2:{dns}.{databaseId}; db2:{dns}.{databaseId}.{schemaId}; db2:{dns}.{databaseId}.{schemaId}.{tableId}
Hive | hive://{host}:{port} | {database}.{table} | Not supported
MSSQL | mssql://{host}:{port} | {database}.{schema}.{table} | Not supported
OceanBase | oceanbase://{host}:{port} | {database}.{table} | Not supported
Oracle | oracle://{host}:{port} | {serviceName}.{schema}.{table} or {sid}.{schema}.{table} | oracle:{hostWithPort}.{databaseId}; oracle:{hostWithPort}.{databaseId}.{schemaId}; oracle:{hostWithPort}.{databaseId}.{schemaId}.{tableId}
Postgres | postgres://{host}:{port} | {database}; {database}.{schema}; {database}.{schema}.{table} | postgresql:{hostWithPort}.{databaseId}; postgresql:{hostWithPort}.{databaseId}.{schemaId}; postgresql:{hostWithPort}.{databaseId}.{schemaId}.{tableId}
Teradata | teradata://{host}:{port} | {database}.{table} | Not supported
Redshift | redshift://{cluster_identifier}.{region_name}:{port} | {database}; {database}.{schema}; {database}.{schema}.{table} | redshift:{clusterId}.{region}.{port}.{databaseId}; redshift:{clusterId}.{region}.{port}.{databaseId}.{schemaId}; redshift:{clusterId}.{region}.{port}.{databaseId}.{schemaId}.{tableId}
Snowflake | snowflake://{organization name}-{account name} or snowflake://{account-locator}(.{compliance})(.{cloud_region_id})(.{cloud}) | {database}; {database}.{schema}; {database}.{schema}.{table} | snowflake:{accountName}.{databaseId}; snowflake:{accountName}.{databaseId}.{schemaId}; snowflake:{accountName}.{databaseId}.{schemaId}.{tableId}
Spanner | spanner://{projectId}:{instanceId} | {database}.{schema}.{table} | Supported in Dataplex Universal Catalog, but not in Data Lineage
Trino | trino://{host}:{port} | {catalog}; {catalog}.{schema}; {catalog}.{schema}.{table} | trino:{hostWithPort}.{catalogId}; trino:{hostWithPort}.{catalogId}.{schemaId}; trino:{hostWithPort}.{catalogId}.{schemaId}.{tableId}
ABFSS (Azure Data Lake Gen2) | abfss://{container name}@{service name}.dfs.core.windows.net | {path} | abs:{serviceName}.{containerName}; abs:{serviceName}.{containerName}.{path}
DBFS (Databricks File System) | dbfs://{workspace name} | {path} | dbfs:{workspace}; dbfs:{workspace}.{path}
Cloud Storage | gs://{bucket name} | {object key} | gcs:{bucketName}; gcs:{bucketName}.{virtualPath}
HDFS | hdfs://{namenode host}:{namenode port} | {path} | hdfs:{namenodeHostWithPort}; hdfs:{namenodeHostWithPort}.{path}
Kafka | kafka://{bootstrap server host}:{port} | {topic} | kafka:{serverHostWithPort}.{topicId}
Local file system | file | {path} | filesystem:localhost.{path}
Remote file system | file://{host} | {path} | filesystem:{hostWithPort}.{path}
S3 | s3://{bucket name} | {object key} | s3:{bucketName}; s3:{bucketName}.{objectKey}
WASBS (Azure Blob Storage) | wasbs://{container name}@{service name}.dfs.core.windows.net | {object key} | abs:{serviceName}.{containerName}; abs:{serviceName}.{containerName}.{objectKey}
Pub/Sub topic | pubsub | topic:{projectId}:{topicId} | pubsub:topic:{projectId}.{topicId}
Pub/Sub subscription | pubsub | subscription:{projectId}:{subscriptionId} | pubsub:subscription:{projectId}.{subscriptionId}

For S3, the s3a and s3n namespace prefixes are also accepted and converted to s3.
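
The conversion in the table above can be sketched for a few of the simpler rows. The following Python example is an illustration only: the `to_fqn` and `s3_bucket_fqn` functions are hypothetical, they cover just BigQuery, Pub/Sub, and bucket-level S3 references, and they assume that names contain no characters that need escaping, which this document does not address.

```python
from urllib.parse import urlsplit


def to_fqn(namespace: str, name: str) -> str:
    """Convert an OpenLineage namespace/name pair to an FQN (partial sketch)."""
    if namespace == "bigquery":
        # {project id}.{dataset name}[.{table name}] keeps its dotted form
        return f"bigquery:{name}"
    if namespace == "pubsub":
        # name is topic:{projectId}:{topicId} or
        # subscription:{projectId}:{subscriptionId}
        kind, project_id, resource_id = name.split(":", 2)
        if kind not in {"topic", "subscription"}:
            raise ValueError(f"unexpected Pub/Sub resource kind: {kind!r}")
        return f"pubsub:{kind}:{project_id}.{resource_id}"
    raise ValueError(f"namespace not covered by this sketch: {namespace!r}")


def s3_bucket_fqn(namespace: str) -> str:
    """Return the bucket-level FQN for an s3, s3a, or s3n namespace.

    Object-key FQNs are left out because this sketch does not assume how
    the key is formatted inside the FQN.
    """
    parts = urlsplit(namespace)
    if parts.scheme not in {"s3", "s3a", "s3n"} or not parts.netloc:
        raise ValueError(f"not an S3 namespace: {namespace!r}")
    # s3a and s3n are accepted and converted to the s3 prefix
    return f"s3:{parts.netloc}"


print(to_fqn("bigquery", "my-project.my_dataset.my_table"))
print(to_fqn("pubsub", "topic:my-project:my-topic"))
print(s3_bucket_fqn("s3a://my-bucket"))
```

Systems marked as not supported in the table are deliberately rejected by this sketch rather than guessed at.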

Other accepted formats

Although OpenLineage doesn't define standard namespace and name pairs for the following systems, the Data Lineage API accepts lineage events for them when they are formatted as described in the following table. Resources referenced in OpenLineage messages with the custom namespace are interpreted as custom fully qualified names.

System | OpenLineage namespace | OpenLineage name | Dataplex Universal Catalog FQN
Custom FQN | custom | {some reference} | custom:{someReference}
Dataproc Metastore | dataproc_metastore | dataproc_metastore:{projectId}.{location}.{instanceId}; dataproc_metastore:{projectId}.{location}.{instanceId}.{databaseId}; dataproc_metastore:{projectId}.{location}.{instanceId}.{databaseId}.{tableId} | dataproc_metastore:{projectId}.{location}.{instanceId}; dataproc_metastore:{projectId}.{location}.{instanceId}.{databaseId}; dataproc_metastore:{projectId}.{location}.{instanceId}.{databaseId}.{tableId}
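
For instance, a dataset reference that uses the custom namespace could be assembled as in this Python sketch. The `custom_dataset` and `expected_fqn` helpers and the sample reference are hypothetical; only the namespace value custom and the custom: FQN prefix come from the table.

```python
def custom_dataset(reference: str) -> dict[str, str]:
    """Build an OpenLineage dataset entry in the custom namespace."""
    if not reference:
        raise ValueError("reference must not be empty")
    return {"namespace": "custom", "name": reference}


def expected_fqn(dataset: dict[str, str]) -> str:
    """Return the FQN the table lists for a custom-namespace dataset."""
    if dataset["namespace"] != "custom":
        raise ValueError("this sketch handles only the custom namespace")
    # Per the table, the name surfaces as custom:{someReference}
    return f'custom:{dataset["name"]}'


dataset = custom_dataset("my-internal-report")
print(expected_fqn(dataset))
```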

What's next