OpenLineage 매핑

Data Lineage API는 계보 수집을 위한 개방형 표준인 OpenLineage와 통합되는 시스템의 계보 정보를 수집할 수 있습니다. ProcessOpenLineageRunEvent 메서드를 사용하여 OpenLineage 형식의 이벤트를 Data Lineage API로 전송하면 Data Lineage API가 OpenLineage 메시지의 속성을 Data Lineage API의 해당 속성에 매핑합니다.

이 문서에서는 이러한 매핑에 대한 참조 표를 제공합니다.

속성 매핑

ProcessOpenLineageRunEvent REST API 메서드는 다음과 같이 OpenLineage 속성을 Data Lineage API 속성에 매핑합니다.

Data Lineage API 속성 OpenLineage 속성
Process.name projects/PROJECT_NUMBER/locations/LOCATION/processes/HASH_OF_NAMESPACE_AND_NAME
Process.displayName Job.namespace + ":" + Job.name
Process.attributes Job.facets(저장된 데이터 참조)
Run.name projects/PROJECT_NUMBER/locations/LOCATION/processes/HASH_OF_NAMESPACE_AND_NAME/runs/HASH_OF_RUNID
Run.displayName Run.runId
Run.attributes Run.facets(저장된 데이터 참조)
Run.startTime eventTime
Run.endTime eventTime
Run.state eventType
LineageEvent.name projects/PROJECT_NUMBER/locations/LOCATION/processes/HASH_OF_NAMESPACE_AND_NAME/runs/HASH_OF_RUNID/lineageEvents/HASH_OF_JOB_RUN_INPUT_OUTPUTS_OF_EVENT(예: projects/11111111/locations/us/processes/1234/runs/4321/lineageEvents/111-222-333)
LineageEvent.EventLinks.source inputs(fqn은 네임스페이스 및 이름 연결임)
LineageEvent.EventLinks.target outputs(fqn은 네임스페이스 및 이름 연결임)
LineageEvent.startTime eventTime
LineageEvent.endTime eventTime
requestId 메서드 사용자가 정의함

FQN 매핑

다음 표에는 다양한 시스템의 OpenLineage 네임스페이스 및 이름 쌍과 이에 상응하는 Dataplex Universal Catalog 정규화된 이름 (FQN)이 나와 있습니다.

시스템 OpenLineage 네임스페이스 OpenLineage 이름 Dataplex Universal Catalog FQN
Athena awsathena://athena.{region_name}.amazonaws.com
  • {catalog}
  • {catalog}.{database}
  • {catalog}.{database}.{table}
  • athena:{catalogId}.{region}
  • athena:{catalogId}.{region}.{databaseId}
  • athena:{catalogId}.{region}.{databaseId}.{tableId}
AWS Glue arn:aws:glue:{region}:{account id} table/{database name}/{table name} aws_glue:table:{region}.{account id}.{database name}.{table name}
Azure Cosmos DB azurecosmos://{host}/dbs/{database} colls/{table}
  • cosmos-db:{host}.{database}
  • cosmos-db:{host}.{database}.{table}
Azure 데이터 탐색기 azurekusto://{host}.kusto.windows.net {database}/{table}
  • kusto:{host}.{region}.{database}
  • kusto:{host}.{region}.{database}.{table}
Azure Synapse sqlserver://{host}:{port}
  • {database}
  • {database}.{schema}
  • {database}.{schema}.{table}
  • sqlserver:{hostWithPort}.{databaseId}
  • sqlserver:{hostWithPort}.{databaseId}.{schemaId}
  • sqlserver:{hostWithPort}.{databaseId}.{schemaId}.{tableId}
BigQuery bigquery
  • {project id}.{dataset name}
  • {project id}.{dataset name}.{table name}
  • bigquery:{projectId}.{datasetId}
  • bigquery:{projectId}.{datasetId}.{assetId}
Cassandra cassandra://{host}:{port}
  • {keyspace}
  • {keyspace}.{table}
  • cassandra:{hostWithPort}.{keyspaceId}
  • cassandra:{hostWithPort}.{keyspaceId}.{tableId}
MySQL mysql://{host}:{port}
  • {database}
  • {database}.{table}
  • mysql:{hostWithPort}.{databaseId}
  • mysql:{hostWithPort}.{databaseId}.{tableId}
CrateDB crate://{host}:{port} {database}.{schema}.{table} 지원되지 않음
DB2 db2://{host}:{port}
  • {database}
  • {database}.{schema}
  • {database}.{schema}.{table}
  • db2:{dns}.{databaseId}
  • db2:{dns}.{databaseId}.{schemaId}
  • db2:{dns}.{databaseId}.{schemaId}.{tableId}
Hive hive://{host}:{port} {database}.{table} 지원되지 않음
MSSQL mssql://{host}:{port} {database}.{schema}.{table} 지원되지 않음
OceanBase oceanbase://{host}:{port} {database}.{table} 지원되지 않음
Oracle oracle://{host}:{port} {serviceName}.{schema}.{table} or {sid}.{schema}.{table}
  • oracle:{hostWithPort}.{databaseId}
  • oracle:{hostWithPort}.{databaseId}.{schemaId}
  • oracle:{hostWithPort}.{databaseId}.{schemaId}.{tableId}
Postgres postgres://{host}:{port}
  • {database}
  • {database}.{schema}
  • {database}.{schema}.{table}
  • postgresql:{hostWithPort}.{databaseId}
  • postgresql:{hostWithPort}.{databaseId}.{schemaId}
  • postgresql:{hostWithPort}.{databaseId}.{schemaId}.{tableId}
Teradata teradata://{host}:{port} {database}.{table} 지원되지 않음
Redshift redshift://{cluster_identifier}.{region_name}:{port}
  • {database}
  • {database}.{schema}
  • {database}.{schema}.{table}
  • redshift:{clusterId}.{region}.{port}.{databaseId}
  • redshift:{clusterId}.{region}.{port}.{databaseId}.{schemaId}
  • redshift:{clusterId}.{region}.{port}.{databaseId}.{schemaId}.{tableId}
Snowflake snowflake://{organization name}-{account name} or snowflake://{account-locator}(.{compliance})(.{cloud_region_id})(.{cloud})
  • {database}
  • {database}.{schema}
  • {database}.{schema}.{table}
  • snowflake:{accountName}.{databaseId}
  • snowflake:{accountName}.{databaseId}.{schemaId}
  • snowflake:{accountName}.{databaseId}.{schemaId}.{tableId}
Spanner spanner://{projectId}:{instanceId} {database}.{schema}.{table} Dataplex Universal Catalog에서 지원되지만 데이터 계보에서는 지원되지 않음
Trino trino://{host}:{port}
  • {catalog}
  • {catalog}.{schema}
  • {catalog}.{schema}.{table}
  • trino:{hostWithPort}.{catalogId}
  • trino:{hostWithPort}.{catalogId}.{schemaId}
  • trino:{hostWithPort}.{catalogId}.{schemaId}.{tableId}
ABFSS (Azure Data Lake Gen2) abfss://{container name}@{service name}.dfs.core.windows.net {path}
  • abs:{serviceName}.{containerName}
  • abs:{serviceName}.{containerName}.{path}
DBFS (Databricks 파일 시스템) dbfs://{workspace name} {path}
  • dbfs:{workspace}
  • dbfs:{workspace}.{path}
Cloud Storage gs://{bucket name} {object key}
  • gcs:{bucketName}
  • gcs:{bucketName}.{virtualPath}
HDFS hdfs://{namenode host}:{namenode port} {path}
  • hdfs:{namenodeHostWithPort}
  • hdfs:{namenodeHostWithPort}.{path}
Kafka kafka://{bootstrap server host}:{port} {topic} kafka:{serverHostWithPort}.{topicId}
로컬 파일 시스템 file {path} filesystem:localhost.{path}
원격 파일 시스템 file://{host} {path} filesystem:{hostWithPort}.{path}
S3 s3://{bucket name} {object key}
  • s3:{bucketName}
  • s3:{bucketName}.{objectKey}

네임스페이스 접두사 s3as3n도 허용되며 s3로 변환됩니다.
WASBS (Azure Blob Storage) wasbs://{container name}@{service name}.dfs.core.windows.net {object key}
  • abs:{serviceName}.{containerName}
  • abs:{serviceName}.{containerName}.{objectKey}
Pub/Sub 주제 pubsub topic:{projectId}:{topicId} pubsub:topic:{projectId}.{topicId}
Pub/Sub 구독 pubsub subscription:{projectId}:{subscriptionId} pubsub:subscription:{projectId}.{subscriptionId}

추가로 허용되는 형식

OpenLineage는 다음 시스템의 표준 namespace/name 쌍을 정의하지 않지만, Data Lineage API는 다음 표에 설명된 대로 형식이 지정된 경우 이러한 시스템의 계보 이벤트를 허용합니다. OpenLineage 메시지에서 custom 네임스페이스로 참조되는 리소스는 커스텀 정규화된 이름으로 해석됩니다.

시스템 OpenLineage 네임스페이스 OpenLineage 이름 Dataplex Universal Catalog FQN
맞춤 FQN custom {some reference} custom:{someReference}
Dataproc Metastore dataproc_metastore
  • dataproc_metastore:{projectId}.{location}.{instanceId}
  • dataproc_metastore:{projectId}.{location}.{instanceId}.{databaseId}
  • dataproc_metastore:{projectId}.{location}.{instanceId}.{databaseId}.{tableId}
  • dataproc_metastore:{projectId}.{location}.{instanceId}
  • dataproc_metastore:{projectId}.{location}.{instanceId}.{databaseId}
  • dataproc_metastore:{projectId}.{location}.{instanceId}.{databaseId}.{tableId}

다음 단계