This page describes how to configure Datastream for replication to BigLake Iceberg tables in BigQuery.
BigLake Iceberg tables offer the same fully managed experience as standard BigQuery tables, but store data in customer-owned Cloud Storage buckets in the Apache Iceberg table format and Parquet file format. You can query and analyze data using BigQuery capabilities while keeping the data in your own storage buckets.
Table metadata
Datastream appends a STRUCT column named datastream_metadata to each table that's written to the BigQuery destination.
The datastream_metadata column contains the following fields:
- UUID: This field has the STRING data type.
- SOURCE_TIMESTAMP: This field has the INTEGER data type.
- CHANGE_SEQUENCE_NUMBER: This field has the STRING data type. It's an internal sequence number used by Datastream for each change event.
- CHANGE_TYPE: This field has the STRING data type. It indicates the type of the change event. For the append-only write mode, the value is INSERT.
- SORT_KEYS: This field contains an array of STRING values. You can use the values to sort the change events.
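As an illustration of how these fields might be inspected, the following minimal Python sketch builds a query string that selects each field of the datastream_metadata column. The helper name build_metadata_query and the table name in the usage example are hypothetical; the field names are taken from the list above. The sketch only assembles the query text and doesn't run it.

```python
# Field names of the datastream_metadata STRUCT, as listed on this page.
METADATA_FIELDS = [
    "UUID",
    "SOURCE_TIMESTAMP",
    "CHANGE_SEQUENCE_NUMBER",
    "CHANGE_TYPE",
    "SORT_KEYS",
]


def build_metadata_query(table: str, limit: int = 10) -> str:
    """Return a query that selects every datastream_metadata field.

    `table` is a fully qualified table name (hypothetical in this sketch),
    for example "my-project.my_dataset.my_table".
    """
    columns = ",\n  ".join(
        f"datastream_metadata.{name} AS {name.lower()}"
        for name in METADATA_FIELDS
    )
    return f"SELECT\n  {columns}\nFROM `{table}`\nLIMIT {limit}"


if __name__ == "__main__":
    # Hypothetical table name, used only for illustration.
    print(build_metadata_query("my-project.my_dataset.my_table"))
```

You could paste the printed statement into the BigQuery query editor, replacing the placeholder table name with a table that your stream writes to.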
Configure streaming to BigLake Iceberg tables
To set up your stream to ingest data into BigLake Iceberg tables:
- Create a Cloud Storage bucket where you want to store your data.
- Create a Cloud resource connection in BigQuery. For information about how to create this type of connection, see Create and set up a Cloud resource connection.
- Get the identifier of the connection service account:

    bq show --location=LOCATION --connection --project_id=PROJECT_ID CONNECTION_NAME

- Grant your Cloud resource connection access to the Cloud Storage bucket that you created. To do this, grant the storage.admin IAM role to the connection service account:

    gcloud storage buckets add-iam-policy-binding gs://YOUR_GCS_BUCKET \
        --member=serviceAccount:YOUR_SERVICE_ACCOUNT_ID \
        --role=roles/storage.admin

- Create a BigLake Iceberg tables stream.
For information about how to create a BigLake Iceberg tables stream using the Google Cloud console, see Create a stream.
For information about how to create a request to stream data to BigLake Iceberg tables using REST, Google Cloud CLI, or Terraform, see Manage streams using the API.
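The two service account steps above can be chained in a small script. The following Python sketch is one possible approach, not an official procedure: it assumes that the JSON output of bq show for a Cloud resource connection (for example, with --format=json) exposes the service account under cloudResource.serviceAccountId, and it only assembles the gcloud command rather than running it. The helper names, the sample JSON, and the bucket name are hypothetical.

```python
import json
import shlex


def service_account_from_connection(connection_json: str) -> str:
    """Extract the connection service account from `bq show` JSON output.

    Assumption: the connection JSON contains
    {"cloudResource": {"serviceAccountId": "..."}}.
    """
    connection = json.loads(connection_json)
    return connection["cloudResource"]["serviceAccountId"]


def grant_command(bucket: str, service_account: str) -> list[str]:
    """Build the gcloud command that grants roles/storage.admin on the bucket."""
    return [
        "gcloud", "storage", "buckets", "add-iam-policy-binding",
        f"gs://{bucket}",
        f"--member=serviceAccount:{service_account}",
        "--role=roles/storage.admin",
    ]


if __name__ == "__main__":
    # Hypothetical sample output; real output contains additional fields.
    sample = json.dumps({
        "cloudResource": {
            "serviceAccountId": "example-sa@example-project.iam.gserviceaccount.com"
        }
    })
    account = service_account_from_connection(sample)
    # Print the command for review instead of executing it.
    print(shlex.join(grant_command("example-bucket", account)))
```

Printing the command instead of executing it lets you review the bucket and service account before you change any IAM policy.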
What's next
- To learn more about streams, see Stream lifecycle.
- To learn how to create a stream, see Create a stream.
- To learn how to create a connection profile that you can use with a BigLake Iceberg tables stream, see Create a connection profile for BigQuery.