BigLake tables for Apache Iceberg in BigQuery overview

BigLake tables for Apache Iceberg in BigQuery (hereafter BigLake Iceberg tables in BigQuery) provide the foundation for building open-format lakehouses on Google Cloud. BigLake Iceberg tables in BigQuery offer the same fully managed experience as standard BigQuery tables, but store data in customer-owned storage buckets. BigLake Iceberg tables in BigQuery support the open Iceberg table format for better interoperability with open-source and third-party compute engines on a single copy of data.

Feature overview

BigLake Iceberg tables in BigQuery support the following features:

Architecture

BigLake Iceberg tables in BigQuery bring the convenience of BigQuery resource management to tables that reside in your own cloud buckets. You can use BigQuery and open-source compute engines on these tables without moving the data out of the buckets that you control. You must configure a Cloud Storage bucket before you start using BigLake Iceberg tables in BigQuery.

BigLake Iceberg tables in BigQuery use BigLake metastore as the unified runtime metastore for all Iceberg data. BigLake metastore provides a single source of truth for managing metadata from multiple engines and enables engine interoperability.

The following diagram shows the managed table architecture at a high level:

BigLake Iceberg tables in BigQuery architecture diagram.

This table management has the following implications on your bucket:

  • BigQuery creates new data files in the bucket in response to write requests, such as DML statements and streaming inserts, and in response to background storage optimizations.
  • When you delete a managed table in BigQuery, BigQuery garbage-collects the associated data files in Cloud Storage after the time travel period expires.

Creating a BigLake Iceberg table in BigQuery is similar to creating a standard BigQuery table. Because these tables store data in open formats on Cloud Storage, you must also do the following:

  • Specify the Cloud resource connection with the WITH CONNECTION clause, which configures the credentials that BigLake uses to access Cloud Storage.
  • Specify the data storage file format as PARQUET with the file_format = PARQUET option.
  • Specify the open-source metadata table format as ICEBERG with the table_format = ICEBERG option.
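Taken together, a CREATE TABLE statement for a BigLake Iceberg table might look like the following sketch. The project, dataset, table, connection, and bucket names are placeholders, and the storage_uri option (which points the table at your bucket) is an assumption not spelled out in the list above:

```sql
-- Sketch of a CREATE TABLE statement for a BigLake Iceberg table in BigQuery.
-- my_dataset, my_iceberg_table, my-project.us.my-connection, and the gs://
-- URI are placeholder names, not values taken from this document.
CREATE TABLE my_dataset.my_iceberg_table (
  id INT64,
  name STRING
)
WITH CONNECTION `my-project.us.my-connection`
OPTIONS (
  file_format = 'PARQUET',    -- data files stored as Parquet
  table_format = 'ICEBERG',   -- metadata in the open Iceberg table format
  storage_uri = 'gs://my-bucket/my-table-path'  -- customer-owned bucket path
);
```

After the table exists, you can query and modify it with standard BigQuery SQL while the data files remain in the bucket that you control.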

What's next