Pemetaan OpenLineage

Data Lineage API dapat menyerap informasi silsilah dari sistem yang terintegrasi dengan OpenLineage, standar terbuka untuk pengumpulan silsilah. Saat Anda mengirim peristiwa berformat OpenLineage ke Data Lineage API menggunakan metode ProcessOpenLineageRunEvent, Data Lineage API akan memetakan atribut dari pesan OpenLineage ke atribut yang sesuai di Data Lineage API.

Dokumen ini menyediakan tabel referensi untuk pemetaan ini.

Pemetaan atribut

Metode REST API ProcessOpenLineageRunEvent memetakan atribut OpenLineage ke atribut Data Lineage API sebagai berikut:

Atribut Data Lineage API Atribut OpenLineage
Process.name projects/PROJECT_NUMBER/locations/LOCATION/processes/HASH_OF_NAMESPACE_AND_NAME
Process.displayName Job.namespace + ":" + Job.name
Process.attributes Job.facets (lihat Data tersimpan)
Run.name projects/PROJECT_NUMBER/locations/LOCATION/processes/HASH_OF_NAMESPACE_AND_NAME/runs/HASH_OF_RUNID
Run.displayName Run.runId
Run.attributes Run.facets (lihat Data tersimpan)
Run.startTime eventTime
Run.endTime eventTime
Run.state eventType
LineageEvent.name projects/PROJECT_NUMBER/locations/LOCATION/processes/HASH_OF_NAMESPACE_AND_NAME/runs/HASH_OF_RUNID/lineageEvents/HASH_OF_JOB_RUN_INPUT_OUTPUTS_OF_EVENT (misalnya, projects/11111111/locations/us/processes/1234/runs/4321/lineageEvents/111-222-333)
LineageEvent.EventLinks.source input (fqn adalah gabungan namespace dan nama)
LineageEvent.EventLinks.target output (fqn adalah gabungan namespace dan nama)
LineageEvent.startTime eventTime
LineageEvent.endTime eventTime
requestId Ditentukan oleh pengguna metode

Pemetaan FQN

Tabel berikut memberikan contoh pasangan nama dan namespace OpenLineage untuk berbagai sistem, dan nama lengkap (FQN) yang setara di Dataplex Universal Catalog:

Sistem Namespace OpenLineage Nama OpenLineage FQN Katalog Universal Dataplex
Athena awsathena://athena.{region_name}.amazonaws.com
  • {catalog}
  • {catalog}.{database}
  • {catalog}.{database}.{table}
  • athena:{catalogId}.{region}
  • athena:{catalogId}.{region}.{databaseId}
  • athena:{catalogId}.{region}.{databaseId}.{tableId}
AWS Glue arn:aws:glue:{region}:{account id} table/{database name}/{table name} aws_glue:table:{region}.{account id}.{database name}.{table name}
Azure Cosmos DB azurecosmos://{host}/dbs/{database} colls/{table}
  • cosmos-db:{host}.{database}
  • cosmos-db:{host}.{database}.{table}
Azure Data Explorer azurekusto://{host}.kusto.windows.net {database}/{table}
  • kusto:{host}.{region}.{database}
  • kusto:{host}.{region}.{database}.{table}
Azure Synapse sqlserver://{host}:{port}
  • {database}
  • {database}.{schema}
  • {database}.{schema}.{table}
  • sqlserver:{hostWithPort}.{databaseId}
  • sqlserver:{hostWithPort}.{databaseId}.{schemaId}
  • sqlserver:{hostWithPort}.{databaseId}.{schemaId}.{tableId}
BigQuery bigquery
  • {project id}.{dataset name}
  • {project id}.{dataset name}.{table name}
  • bigquery:{projectId}.{datasetId}
  • bigquery:{projectId}.{datasetId}.{assetId}
Cassandra cassandra://{host}:{port}
  • {keyspace}
  • {keyspace}.{table}
  • cassandra:{hostWithPort}.{keyspaceId}
  • cassandra:{hostWithPort}.{keyspaceId}.{tableId}
MySQL mysql://{host}:{port}
  • {database}
  • {database}.{table}
  • mysql:{hostWithPort}.{databaseId}
  • mysql:{hostWithPort}.{databaseId}.{tableId}
CrateDB crate://{host}:{port} {database}.{schema}.{table} Tidak didukung
DB2 db2://{host}:{port}
  • {database}
  • {database}.{schema}
  • {database}.{schema}.{table}
  • db2:{dns}.{databaseId}
  • db2:{dns}.{databaseId}.{schemaId}
  • db2:{dns}.{databaseId}.{schemaId}.{tableId}
Hive hive://{host}:{port} {database}.{table} Tidak didukung
MSSQL mssql://{host}:{port} {database}.{schema}.{table} Tidak didukung
OceanBase oceanbase://{host}:{port} {database}.{table} Tidak didukung
Oracle oracle://{host}:{port} {serviceName}.{schema}.{table} or {sid}.{schema}.{table}
  • oracle:{hostWithPort}.{databaseId}
  • oracle:{hostWithPort}.{databaseId}.{schemaId}
  • oracle:{hostWithPort}.{databaseId}.{schemaId}.{tableId}
Postgres postgres://{host}:{port}
  • {database}
  • {database}.{schema}
  • {database}.{schema}.{table}
  • postgresql:{hostWithPort}.{databaseId}
  • postgresql:{hostWithPort}.{databaseId}.{schemaId}
  • postgresql:{hostWithPort}.{databaseId}.{schemaId}.{tableId}
Teradata teradata://{host}:{port} {database}.{table} Tidak didukung
Redshift redshift://{cluster_identifier}.{region_name}:{port}
  • {database}
  • {database}.{schema}
  • {database}.{schema}.{table}
  • redshift:{clusterId}.{region}.{port}.{databaseId}
  • redshift:{clusterId}.{region}.{port}.{databaseId}.{schemaId}
  • redshift:{clusterId}.{region}.{port}.{databaseId}.{schemaId}.{tableId}
Snowflake snowflake://{organization name}-{account name} or snowflake://{account-locator}(.{compliance})(.{cloud_region_id})(.{cloud})
  • {database}
  • {database}.{schema}
  • {database}.{schema}.{table}
  • snowflake:{accountName}.{databaseId}
  • snowflake:{accountName}.{databaseId}.{schemaId}
  • snowflake:{accountName}.{databaseId}.{schemaId}.{tableId}
Spanner spanner://{projectId}:{instanceId} {database}.{schema}.{table} Didukung di Dataplex Universal Catalog, tetapi tidak didukung di Silsilah data
Trino trino://{host}:{port}
  • {catalog}
  • {catalog}.{schema}
  • {catalog}.{schema}.{table}
  • trino:{hostWithPort}.{catalogId}
  • trino:{hostWithPort}.{catalogId}.{schemaId}
  • trino:{hostWithPort}.{catalogId}.{schemaId}.{tableId}
ABFSS (Azure Data Lake Gen2) abfss://{container name}@{service name}.dfs.core.windows.net {path}
  • abs:{serviceName}.{containerName}
  • abs:{serviceName}.{containerName}.{path}
DBFS (Databricks File System) dbfs://{workspace name} {path}
  • dbfs:{workspace}
  • dbfs:{workspace}.{path}
Cloud Storage gs://{bucket name} {object key}
  • gcs:{bucketName}
  • gcs:{bucketName}.{virtualPath}
HDFS hdfs://{namenode host}:{namenode port} {path}
  • hdfs:{namenodeHostWithPort}
  • hdfs:{namenodeHostWithPort}.{path}
Kafka kafka://{bootstrap server host}:{port} {topic} kafka:{serverHostWithPort}.{topicId}
Sistem file lokal file {path} filesystem:localhost.{path}
Sistem file jarak jauh file://{host} {path} filesystem:{hostWithPort}.{path}
S3 s3://{bucket name} {object key}
  • s3:{bucketName}
  • s3:{bucketName}.{objectKey}
Awalan namespace
s3a dan s3n juga diterima dan dikonversi menjadi s3
WASBS (Azure Blob Storage) wasbs://{container name}@{service name}.dfs.core.windows.net {object key}
  • abs:{serviceName}.{containerName}
  • abs:{serviceName}.{containerName}.{objectKey}
Pub/Sub Topic pubsub topic:{projectId}:{topicId} pubsub:topic:{projectId}.{topicId}
Langganan Pub/Sub pubsub subscription:{projectId}:{subscriptionId} pubsub:subscription:{projectId}.{subscriptionId}

Format tambahan yang diterima

Meskipun OpenLineage tidak menentukan pasangan namespace/name standar untuk sistem berikut, Data Lineage API menerima peristiwa silsilah untuk sistem tersebut jika diformat seperti yang dijelaskan dalam tabel berikut. Resource yang dirujuk dalam pesan OpenLineage dengan namespace custom ditafsirkan sebagai nama yang sepenuhnya memenuhi syarat kustom.

Sistem Namespace OpenLineage Nama OpenLineage FQN Katalog Universal Dataplex
FQN Kustom custom {some reference} custom:{someReference}
Dataproc Metastore dataproc_metastore
  • dataproc_metastore:{projectId}.{location}.{instanceId}
  • dataproc_metastore:{projectId}.{location}.{instanceId}.{databaseId}
  • dataproc_metastore:{projectId}.{location}.{instanceId}.{databaseId}.{tableId}
  • dataproc_metastore:{projectId}.{location}.{instanceId}
  • dataproc_metastore:{projectId}.{location}.{instanceId}.{databaseId}
  • dataproc_metastore:{projectId}.{location}.{instanceId}.{databaseId}.{tableId}

Langkah berikutnya