Lakehouse table format overview

This document describes the different table formats available when building a lakehouse on Google Cloud and helps you choose the right one for your needs.

With Google Cloud Lakehouse, you can choose from several table formats that offer different levels of management, performance, and interoperability. The right choice depends on where your data originates, which engines write and transform it, and how much control you need over storage and metadata.

Table formats

Google Cloud Lakehouse offers the following table formats, grouped by the catalog that manages them:

Lakehouse runtime catalog tables (recommended)

The Lakehouse runtime catalog manages Apache Iceberg tables and keeps them compatible with open source engines.

  • Lakehouse Iceberg REST catalog tables: These are Apache Iceberg tables that you create from open source engines and store in Cloud Storage. They offer open compatibility and read/write interoperability between BigQuery and Iceberg-compatible engines. This option is best if you want open source engines to manage your ETL workflows.

BigQuery catalog tables

The BigQuery catalog manages native tables, Apache Iceberg tables, and external tables.

  • Apache Iceberg tables: These are Apache Iceberg tables that you create and manage from BigQuery and store in Cloud Storage. While they can be read by open source engines, BigQuery is the engine that manages the metadata and writes to them. This option is best if you want your workflow to be fully managed by BigQuery.

  • Native tables: These are standard BigQuery tables, with data stored in BigQuery storage. They are fully managed and offer the most advanced analytics and management features. This option is best for non-Iceberg workloads.

  • External tables: These tables are BigQuery-specific constructs for data stored in Cloud Storage, Amazon S3, or Azure Blob Storage. You manage the data and metadata yourself, and BigQuery has read-only access. Choose this option for data that you want to manage in a third-party catalog or directly in storage.
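The "best if" guidance in the lists above can be sketched as a small decision helper. This is only an illustration of the selection logic, not part of any Google Cloud API: the `Workload` class, its field names, and the `recommend_table_format` function are hypothetical, and the order of the checks is an assumption about how the criteria combine.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    """Hypothetical description of a workload; field names are illustrative."""
    uses_iceberg: bool          # data should be stored as Apache Iceberg
    writer: str                 # "open_source" or "bigquery": who writes the data
    self_managed: bool = False  # data and metadata are managed outside these catalogs


def recommend_table_format(w: Workload) -> str:
    """Map a workload to one of the four table formats described above."""
    if w.self_managed:
        # Data managed in a third-party catalog or directly in storage.
        return "External tables"
    if not w.uses_iceberg:
        # Non-Iceberg workloads.
        return "Native tables"
    if w.writer == "open_source":
        # ETL managed by open source engines.
        return "Lakehouse Iceberg REST catalog tables"
    if w.writer == "bigquery":
        # Workflow fully managed by BigQuery.
        return "Apache Iceberg tables"
    raise ValueError(f"unknown writer: {w.writer!r}")


print(recommend_table_format(Workload(uses_iceberg=True, writer="open_source")))
```

Real workloads often mix these criteria, so treat the result as a starting point for the comparison that follows rather than a final answer.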

Use the following table to compare your table format options:

| | External tables | Lakehouse Iceberg REST catalog tables | Apache Iceberg tables | Standard BigQuery tables |
|---|---|---|---|---|
| Metastore | External or self-hosted metastore | Lakehouse runtime catalog | BigQuery catalog | BigQuery catalog |
| Storage | Cloud Storage / Amazon S3 / Azure | Cloud Storage | Cloud Storage | BigQuery |
| Storage optimization | Customer or third-party managed | Customer or third-party managed | Google managed | Google managed |
| Read / Write | Open source engines (read/write); BigQuery (read only) | Open source engines (read/write); BigQuery (read only) | Open source engines (read only with Iceberg libraries, read/write interoperability with BigQuery Storage API); BigQuery (read/write) | Open source engines (read/write interoperability with BigQuery Storage API); BigQuery (read/write) |
| Use cases | Staging tables for BigQuery loads, legacy query-only tables | Open lakehouse | Open lakehouse with high-performance, enterprise-grade storage for advanced analytics, streaming, and AI | Enterprise-grade storage for advanced analytics, streaming, and AI |
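
As a rough illustration, the Read / Write row of the comparison above can be captured as data and queried. The dictionary below is transcribed from that row; the `TABLE_ACCESS` name, the access-path labels, and the `formats_writable_by` helper are hypothetical and exist only for this sketch.

```python
READ_WRITE = "read/write"
READ_ONLY = "read only"

# Access level per table format and access path, transcribed from the
# "Read / Write" row of the comparison. The path labels are illustrative.
TABLE_ACCESS = {
    "External tables": {
        "open source engines": READ_WRITE,
        "BigQuery": READ_ONLY,
    },
    "Lakehouse Iceberg REST catalog tables": {
        "open source engines": READ_WRITE,
        "BigQuery": READ_ONLY,
    },
    "Apache Iceberg tables": {
        "open source engines (Iceberg libraries)": READ_ONLY,
        "open source engines (BigQuery Storage API)": READ_WRITE,
        "BigQuery": READ_WRITE,
    },
    "Standard BigQuery tables": {
        "open source engines (BigQuery Storage API)": READ_WRITE,
        "BigQuery": READ_WRITE,
    },
}


def formats_writable_by(path: str) -> list[str]:
    """Return the table formats that the given access path can write to."""
    return [
        name
        for name, access in TABLE_ACCESS.items()
        if access.get(path) == READ_WRITE
    ]


print(formats_writable_by("BigQuery"))
print(formats_writable_by("open source engines"))
```

Keeping the matrix as plain data makes it easy to check a planned pipeline against it, for example by confirming that the engine you intend to write with has read/write access to the format you picked.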