BigLake tables for Apache Iceberg in BigQuery (hereafter BigLake Iceberg tables in BigQuery) provide the foundation for building open-format lakehouses on Google Cloud. BigLake Iceberg tables in BigQuery offer the same fully managed experience as standard BigQuery tables, but store data in customer-owned storage buckets. BigLake Iceberg tables in BigQuery support the open Iceberg table format for better interoperability with open-source and third-party compute engines on a single copy of data.
Feature overview
BigLake Iceberg tables in BigQuery support the following features:
- Table mutations using GoogleSQL data manipulation language (DML).
- Unified batch and high-throughput streaming using the BigQuery Storage Write API through BigLake connectors for Spark, Dataflow, and other engines.
- Iceberg V2 snapshot export and automatic refresh on each table mutation for direct query access with open-source and third-party query engines.
- Schema evolution, which lets you add, drop, and rename columns to suit your needs. This feature also lets you change an existing column's data type and column mode. For more information, see type conversion rules.
- Automatic storage optimization, including adaptive file sizing, automatic clustering, garbage collection, and metadata optimization.
- Time travel for historical data access in BigQuery.
- Column-level security and data masking.
- Multi-statement transactions (in Preview).
- Table partitioning (in Preview).
- Table creation in Dataform workflows.
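As a rough illustration of three of the features above (DML, schema evolution, and time travel), the following Python sketch renders one GoogleSQL statement for each. The table name mydataset.orders and its columns are hypothetical, and the statements use general GoogleSQL syntax. Check the current limitations for BigLake Iceberg tables in BigQuery before relying on any of them.

```python
# Hypothetical table name, used only for illustration.
TABLE = "mydataset.orders"


def feature_statements(table: str) -> dict[str, str]:
    """Return one example GoogleSQL statement per feature.

    The columns (order_id, status, note) are assumptions made for this sketch.
    """
    return {
        # Table mutation with DML.
        "dml": f"UPDATE `{table}` SET status = 'shipped' WHERE order_id = 42",
        # Schema evolution: add a new nullable column.
        "schema_evolution": f"ALTER TABLE `{table}` ADD COLUMN note STRING",
        # Time travel: read the table as it was one hour ago.
        "time_travel": (
            f"SELECT * FROM `{table}` "
            "FOR SYSTEM_TIME AS OF "
            "TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 HOUR)"
        ),
    }


for name, sql in feature_statements(TABLE).items():
    print(f"-- {name}\n{sql};\n")
```

The sketch only builds strings. To run a statement, you would pass it to whichever BigQuery client or console you already use.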
Architecture
BigLake Iceberg tables in BigQuery bring the convenience of BigQuery resource management to tables that reside in your own cloud buckets. You can use BigQuery and open-source compute engines on these tables without moving the data out of the buckets that you control. You must configure a Cloud Storage bucket before you start using BigLake Iceberg tables in BigQuery.
BigLake Iceberg tables in BigQuery use BigLake metastore as the unified runtime metastore for all Iceberg data. BigLake metastore provides a single source of truth for metadata managed by multiple engines, which enables engine interoperability.
The following diagram shows the managed table architecture at a high level:
This table management has the following implications on your bucket:
- BigQuery creates new data files in the bucket in response to write requests, such as DML statements and streaming, and in response to background storage optimizations.
- When you delete a managed table in BigQuery, BigQuery garbage collects the associated data files in Cloud Storage after the expiration of the time travel period.
Creating a BigLake Iceberg table in BigQuery is similar to creating a standard BigQuery table. Because the table stores data in open formats on Cloud Storage, you must also do the following:
- Specify the Cloud resource connection with the WITH CONNECTION clause to configure the connection credentials for BigLake to access Cloud Storage.
- Specify the file format of data storage as PARQUET with the file_format = PARQUET statement.
- Specify the open-source metadata table format as ICEBERG with the table_format = ICEBERG statement.
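The three requirements above can be sketched as a single CREATE TABLE statement. The following Python helper is a minimal illustration that only renders the DDL text; the helper name, table name, connection ID, and bucket path are hypothetical. It also makes two assumptions that the text above does not state: the option values are written as quoted string literals, and a storage_uri option points at the Cloud Storage location for the table.

```python
def iceberg_table_ddl(table, columns, connection, storage_uri):
    """Render a CREATE TABLE statement for a BigLake Iceberg table.

    All arguments are caller-supplied placeholders. The storage_uri option
    and the quoted option values are assumptions, not taken from the text.
    """
    column_defs = ",\n  ".join(f"{name} {dtype}" for name, dtype in columns)
    return (
        f"CREATE TABLE `{table}` (\n  {column_defs}\n)\n"
        # Connection that holds the credentials for Cloud Storage access.
        f"WITH CONNECTION `{connection}`\n"
        "OPTIONS (\n"
        "  file_format = 'PARQUET',\n"   # data file format
        "  table_format = 'ICEBERG',\n"  # open-source metadata table format
        f"  storage_uri = '{storage_uri}'\n"
        ");"
    )


# Hypothetical names, used only for illustration.
ddl = iceberg_table_ddl(
    table="mydataset.orders",
    columns=[("order_id", "INT64"), ("status", "STRING")],
    connection="myproject.us.my-connection",
    storage_uri="gs://my-bucket/orders",
)
print(ddl)
```

Keeping the statement in a helper like this makes it easy to review the connection and format options before you submit the DDL to BigQuery.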