As of April 10, 2026, Dataplex Universal Catalog is named Knowledge Catalog. The names of the API, client library, CLI, and IAM remain unchanged. For more information, see Google Cloud Knowledge Catalog.

OpenLineage mapping

The Data Lineage API can ingest lineage information from systems that integrate with OpenLineage, an open standard for collecting lineage information. When you send OpenLineage-formatted events to the Data Lineage API by using the ProcessOpenLineageRunEvent method, the attributes of the OpenLineage message are mapped to the corresponding attributes in the Data Lineage API.

This document provides reference tables for these mappings.
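
As a minimal sketch of the call described above, the following Python snippet builds (but does not send) an HTTP request for the ProcessOpenLineageRunEvent method. The endpoint URL shape, the `requestId` query parameter, the helper name `build_request`, and all event values are assumptions made for illustration and are not taken from this page; verify them against the Data Lineage API reference, and attach an OAuth 2.0 access token before sending.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint shape; verify against the Data Lineage API reference.
BASE_URL = "https://datalineage.googleapis.com/v1"


def build_request(project, location, event, request_id=None):
    """Build (without sending) a POST request carrying an OpenLineage run event."""
    url = f"{BASE_URL}/projects/{project}/locations/{location}:processOpenLineageRunEvent"
    if request_id:
        # requestId is chosen by the caller of the method.
        url += "?" + urllib.parse.urlencode({"requestId": request_id})
    body = json.dumps(event).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical event values, for illustration only.
event = {
    "eventType": "COMPLETE",
    "eventTime": "2024-01-01T00:00:00Z",
    "run": {"runId": "1d1f7b9c-0000-4000-8000-000000000001"},
    "job": {"namespace": "example-namespace", "name": "example-job"},
    "inputs": [{"namespace": "bigquery", "name": "project.dataset.inputtable"}],
    "outputs": [{"namespace": "bigquery", "name": "project.dataset.outputtable"}],
    "producer": "https://example.com/producer",
}
request = build_request("my-project", "us", event, request_id="req-1")
```

The request object can then be passed to `urllib.request.urlopen` once an `Authorization` header has been added.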

Attribute mapping

The ProcessOpenLineageRunEvent REST API method maps OpenLineage attributes to Data Lineage API attributes as follows:

Data Lineage API attribute	OpenLineage attribute
Process.name	projects/`PROJECT_NUMBER`/locations/`LOCATION`/processes/`HASH_OF_NAMESPACE_AND_NAME`
Process.displayName	Job.namespace + ":" + Job.name
Process.attributes	Job.facets (see Stored data)
Run.name	projects/`PROJECT_NUMBER`/locations/`LOCATION`/processes/`HASH_OF_NAMESPACE_AND_NAME`/runs/`HASH_OF_RUNID`
Run.displayName	Run.runId
Run.attributes	Run.facets (see Stored data)
Run.startTime	eventTime
Run.endTime	eventTime
Run.state	eventType
LineageEvent.name	projects/`PROJECT_NUMBER`/locations/`LOCATION`/processes/`HASH_OF_NAMESPACE_AND_NAME`/runs/`HASH_OF_RUNID`/lineageEvents/`HASH_OF_JOB_RUN_INPUT_OUTPUTS_OF_EVENT` (for example, projects/11111111/locations/us/processes/1234/runs/4321/lineageEvents/111-222-333)
LineageEvent.EventLinks.source	inputs (the fully qualified name is the concatenation of namespace and name)
LineageEvent.EventLinks.target	outputs (the fully qualified name is the concatenation of namespace and name)
LineageEvent.startTime	eventTime
LineageEvent.endTime	eventTime
requestId	Defined by the caller of the method
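
To make the table more concrete, here is a small Python sketch that derives the attributes that follow directly from an OpenLineage event. It is an illustration only: the function name `mapped_attributes` and the sample event are hypothetical, resource names are left out because the hashing scheme is not described here, and `Run.state` is shown as the raw `eventType` because the value-by-value translation is not described here either.

```python
def mapped_attributes(event):
    """Return the attributes that the mapping table derives without hashing."""
    job = event["job"]
    run = event["run"]
    return {
        # Process.displayName is Job.namespace + ":" + Job.name.
        "Process.displayName": job["namespace"] + ":" + job["name"],
        # Run.displayName is the OpenLineage run ID.
        "Run.displayName": run["runId"],
        # Raw eventType; the translation into Run.state values is not shown.
        "Run.state (from eventType)": event["eventType"],
        # Both lineage event timestamps come from eventTime.
        "LineageEvent.startTime": event["eventTime"],
        "LineageEvent.endTime": event["eventTime"],
    }


# Hypothetical event, for illustration only.
sample_event = {
    "eventType": "COMPLETE",
    "eventTime": "2024-01-01T00:00:00Z",
    "run": {"runId": "1d1f7b9c-0000-4000-8000-000000000001"},
    "job": {"namespace": "example-namespace", "name": "example-job"},
}
attributes = mapped_attributes(sample_event)
```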

Column-level lineage generated by Managed Service for Apache Spark is also supported, provided that the columnLineage facet is used in the outputs objects. For example:

"outputs": [ {
  "namespace": "bigquery",
  "name": "project.dataset.outputtable",
  "columnLineage": {
      "_producer": "https://github.com/OpenLineage/OpenLineage/tree/1.39.0/integration/spark",
      "_schemaURL": "https://openlineage.io/spec/facets/1-2-0/ColumnLineageDatasetFacet.json#/$defs/ColumnLineageDatasetFacet",
      "fields": {
        "output_field": { // This is the name of the output field
          "inputFields": [
            {
              "namespace": "bigquery",
              "name": "project.dataset.inputtable",
              "field": "input_field", // This is the name of the input field
              "transformations": [
                {
                  "type": "DIRECT",
                  "subtype": "IDENTITY",
                  "description": "",
                  "masking": false
                }
              ]
            }
          ]
        }
      }
  }
}]

This example creates a column-level lineage link between input_field and output_field. You must include the transformations field; otherwise, column-level lineage is not ingested. For more information about the OpenLineage definition of this facet, see Column-level lineage dataset facet.
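
Because a missing transformations field causes column-level lineage to be skipped, it can help to check events before sending them. The following Python sketch is one hedged way to do that; the function name `missing_transformations` is hypothetical, the dataset layout mirrors the example above (columnLineage placed directly on the output object), and the check only tests for the presence of the key, since this page does not say how an empty list is treated.

```python
def missing_transformations(outputs):
    """List (dataset name, output field, input field) entries lacking transformations."""
    problems = []
    for dataset in outputs:
        facet = dataset.get("columnLineage", {})
        for output_field, spec in facet.get("fields", {}).items():
            for input_field in spec.get("inputFields", []):
                if "transformations" not in input_field:
                    problems.append(
                        (dataset["name"], output_field, input_field.get("field"))
                    )
    return problems


# Hypothetical outputs: the first input field is complete, the second is not.
outputs = [{
    "namespace": "bigquery",
    "name": "project.dataset.outputtable",
    "columnLineage": {
        "fields": {
            "output_field": {
                "inputFields": [
                    {
                        "namespace": "bigquery",
                        "name": "project.dataset.inputtable",
                        "field": "input_field",
                        "transformations": [
                            {"type": "DIRECT", "subtype": "IDENTITY",
                             "description": "", "masking": False}
                        ],
                    },
                    {
                        "namespace": "bigquery",
                        "name": "project.dataset.inputtable",
                        "field": "other_field",
                    },
                ]
            }
        }
    },
}]
problems = missing_transformations(outputs)
```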

Fully qualified name mapping

The following table lists example OpenLineage namespace and name pairs for various systems, along with the corresponding fully qualified names (FQNs) in Knowledge Catalog (formerly Dataplex Universal Catalog):

System	OpenLineage namespace	OpenLineage name	Knowledge Catalog FQN
Athena	`awsathena://athena.{region_name}.amazonaws.com`	`{catalog}` `{catalog}.{database}` `{catalog}.{database}.{table}`	`athena:{catalogId}.{region}` `athena:{catalogId}.{region}.{databaseId}` `athena:{catalogId}.{region}.{databaseId}.{tableId}`
AWS Glue	`arn:aws:glue:{region}:{account id}`	`table/{database name}/{table name}`	`aws_glue:table:{region}.{account id}.{database name}.{table name}`
Azure Cosmos DB	`azurecosmos://{host}/dbs/{database}`	`colls/{table}`	`cosmos-db:{host}.{database}` `cosmos-db:{host}.{database}.{table}`
Azure Data Explorer	`azurekusto://{host}.kusto.windows.net`	`{database}/{table}`	`kusto:{host}.{region}.{database}` `kusto:{host}.{region}.{database}.{table}`
Azure Synapse	`sqlserver://{host}:{port}`	`{database}` `{database}.{schema}` `{database}.{schema}.{table}`	Not supported
BigQuery	`bigquery`	`{project id}.{dataset name}` `{project id}.{dataset name}.{table name}`	`bigquery:{projectId}.{datasetId}` `bigquery:{projectId}.{datasetId}.{assetId}`
Cassandra	`cassandra://{host}:{port}`	`{keyspace}` `{keyspace}.{table}`	`cassandra:{hostWithPort}.{keyspaceId}` `cassandra:{hostWithPort}.{keyspaceId}.{tableId}`
MySQL	`mysql://{host}:{port}`	`{database}` `{database}.{table}`	`mysql:{hostWithPort}.{databaseId}` `mysql:{hostWithPort}.{databaseId}.{tableId}`
CrateDB	`crate://{host}:{port}`	`{database}.{schema}.{table}`	Not supported
DB2	`db2://{host}:{port}`	`{database}` `{database}.{schema}` `{database}.{schema}.{table}`	`db2:{dns}.{databaseId}` `db2:{dns}.{databaseId}.{schemaId}` `db2:{dns}.{databaseId}.{schemaId}.{tableId}`
Hive	`hive://{host}:{port}`	`{database}.{table}`	Not supported
MSSQL	`mssql://{host}:{port}`	`{database}.{schema}.{table}`	Not supported
OceanBase	`oceanbase://{host}:{port}`	`{database}.{table}`	Not supported
Oracle	`oracle://{host}:{port}`	`{serviceName}.{schema}.{table} or {sid}.{schema}.{table}`	`oracle:{hostWithPort}.{databaseId}` `oracle:{hostWithPort}.{databaseId}.{schemaId}` `oracle:{hostWithPort}.{databaseId}.{schemaId}.{tableId}`
Postgres	`postgres://{host}:{port}`	`{database}` `{database}.{schema}` `{database}.{schema}.{table}`	`postgresql:{hostWithPort}.{databaseId}` `postgresql:{hostWithPort}.{databaseId}.{schemaId}` `postgresql:{hostWithPort}.{databaseId}.{schemaId}.{tableId}`
Teradata	`teradata://{host}:{port}`	`{database}.{table}`	Not supported
Redshift	`redshift://{cluster_identifier}.{region_name}:{port}`	`{database}` `{database}.{schema}` `{database}.{schema}.{table}`	`redshift:{clusterId}.{region}.{port}.{databaseId}` `redshift:{clusterId}.{region}.{port}.{databaseId}.{schemaId}` `redshift:{clusterId}.{region}.{port}.{databaseId}.{schemaId}.{tableId}`
Snowflake	`snowflake://{organization name}-{account name} or snowflake://{account-locator}(.{compliance})(.{cloud_region_id})(.{cloud})`	`{database}` `{database}.{schema}` `{database}.{schema}.{table}`	`snowflake:{accountName}.{databaseId}` `snowflake:{accountName}.{databaseId}.{schemaId}` `snowflake:{accountName}.{databaseId}.{schemaId}.{tableId}`
Spanner	`spanner://{projectId}:{instanceId}`	`{database}.{schema}.{table}`	Supported in Knowledge Catalog, but not in data lineage
Trino	`trino://{host}:{port}`	`{catalog}` `{catalog}.{schema}` `{catalog}.{schema}.{table}`	`trino:{hostWithPort}.{catalogId}` `trino:{hostWithPort}.{catalogId}.{schemaId}` `trino:{hostWithPort}.{catalogId}.{schemaId}.{tableId}`
ABFSS (Azure Data Lake Gen2)	`abfss://{container name}@{service name}.dfs.core.windows.net`	`{path}`	`abs:{serviceName}.{containerName}` `abs:{serviceName}.{containerName}.{path}`
DBFS (Databricks File System)	`dbfs://{workspace name}`	`{path}`	`dbfs:{workspace}` `dbfs:{workspace}.{path}`
Cloud Storage	`gs://{bucket name}`	`{object key}`	`gcs:{bucketName}` `gcs:{bucketName}.{virtualPath}`
HDFS	`hdfs://{namenode host}:{namenode port}`	`{path}`	`hdfs:{namenodeHostWithPort}` `hdfs:{namenodeHostWithPort}.{path}`
Kafka	`kafka://{bootstrap server host}:{port}`	`{topic}`	`kafka:{serverHostWithPort}.{topicId}`
Local file system	`file`	`{path}`	`filesystem:localhost.{path}`
Remote file system	`file://{host}`	`{path}`	`filesystem:{hostWithPort}.{path}`
S3	`s3://{bucket name}`	`{object key}`	`s3:{bucketName}` `s3:{bucketName}.{objectKey}` The namespace prefixes `s3a` and `s3n` are also accepted and converted to `s3`.
WASBS (Azure Blob Storage)	`wasbs://{container name}@{service name}.dfs.core.windows.net`	`{object key}`	`abs:{serviceName}.{containerName}` `abs:{serviceName}.{containerName}.{objectKey}`
Pub/Sub topic	`pubsub`	`topic:{projectId}:{topicId}`	`pubsub:topic:{projectId}.{topicId}`
Pub/Sub subscription	`pubsub`	`subscription:{projectId}:{subscriptionId}`	`pubsub:subscription:{projectId}.{subscriptionId}`
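
The rows above can be read as string rewrites from a namespace/name pair to an FQN. As a rough sketch covering only three of the systems (BigQuery, Pub/Sub, and S3 with its `s3a`/`s3n` aliases), the following Python function shows the idea. The function name `to_fqn` is hypothetical, and the code assumes that names and object keys are copied into the FQN unchanged; any escaping rules that the service applies to special characters are not described here and are not modeled.

```python
def to_fqn(namespace, name):
    """Convert an OpenLineage namespace/name pair to an FQN for a few systems."""
    if namespace == "bigquery":
        # {project id}.{dataset name}[.{table name}] is kept as-is.
        return f"bigquery:{name}"

    if namespace == "pubsub":
        # Name looks like topic:{projectId}:{topicId} or
        # subscription:{projectId}:{subscriptionId}.
        kind, project_id, resource_id = name.split(":", 2)
        if kind not in ("topic", "subscription"):
            raise ValueError(f"unexpected Pub/Sub resource kind: {kind}")
        return f"pubsub:{kind}:{project_id}.{resource_id}"

    scheme, separator, bucket = namespace.partition("://")
    if separator and scheme in ("s3", "s3a", "s3n"):
        # s3a and s3n are accepted and converted to s3.
        return f"s3:{bucket}.{name}" if name else f"s3:{bucket}"

    raise ValueError(f"namespace not covered by this sketch: {namespace}")
```

For example, `to_fqn("s3a://my-bucket", "data.csv")` yields `s3:my-bucket.data.csv` under these assumptions.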

Additional accepted formats

Although OpenLineage doesn't define standard namespace/name pairs for the following systems, the Data Lineage API accepts lineage events for them when they are formatted as described in the following table. Resources referenced in OpenLineage messages with the custom namespace are interpreted as custom fully qualified names.

System	OpenLineage namespace	OpenLineage name	Knowledge Catalog FQN
Custom FQN	`custom`	`{some reference}`	`custom:{someReference}`
Dataproc Metastore	`dataproc_metastore`	`dataproc_metastore:{projectId}.{location}.{instanceId}` `dataproc_metastore:{projectId}.{location}.{instanceId}.{databaseId}` `dataproc_metastore:{projectId}.{location}.{instanceId}.{databaseId}.{tableId}`	`dataproc_metastore:{projectId}.{location}.{instanceId}` `dataproc_metastore:{projectId}.{location}.{instanceId}.{databaseId}` `dataproc_metastore:{projectId}.{location}.{instanceId}.{databaseId}.{tableId}`
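
The two rows above can be sketched in Python as follows. This is an illustration under stated assumptions, not the service's implementation: the function name `additional_fqn` is hypothetical, the custom reference is assumed to be copied unchanged after the `custom:` prefix, and the Dataproc Metastore name is assumed to already be the FQN, as the table suggests.

```python
def additional_fqn(namespace, name):
    """Map the custom and dataproc_metastore namespaces to an FQN."""
    if namespace == "custom":
        # Assumption: the reference is used as-is after the prefix.
        return f"custom:{name}"
    if namespace == "dataproc_metastore":
        # The OpenLineage name is expected to carry the FQN prefix already.
        if not name.startswith("dataproc_metastore:"):
            raise ValueError("name must start with 'dataproc_metastore:'")
        return name
    raise ValueError(f"namespace not covered by this sketch: {namespace}")
```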

Next steps

Learn how to integrate with OpenLineage.
See the fully qualified names reference.
Data Lineage API
Learn how to view lineage information.
