Understand table types and capabilities

Lakehouse for Apache Iceberg supports several table types that differ in how they are managed, how they perform, and which engines can read and write them. Depending on where your data originates, which engine writes it, and how much control you need, you can choose a table format supported by either the Lakehouse runtime catalog or BigQuery.

Table formats by catalog and engine

The following sections describe the table formats that each catalog or engine supports, along with its metastore configuration, storage optimization capabilities, and engine interoperability.

Iceberg REST catalog

The Lakehouse runtime catalog manages Apache Iceberg tables through the Iceberg REST catalog endpoint, which provides a standard REST interface that is compatible with a wide range of open-source engines, such as Apache Spark, Apache Flink, and Trino. You create these tables from open-source engines and store them in Cloud Storage. This option is best if you want open-source engines to manage your ETL workflow and you need only read access from BigQuery.

Key features include:

  • Metastore: Lakehouse runtime catalog.
  • Storage: Cloud Storage.
  • Storage optimization: Managed by you or a third party.
  • Read and write access:
    • Open-source engines: Read and write.
    • BigQuery: Read only.
  • Use cases: Open lakehouse with high-performance, enterprise-grade storage for advanced analytics, streaming, and AI.
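As a minimal sketch of how an open-source engine might be pointed at an Iceberg REST catalog endpoint, the following Python builds Apache Spark properties using Apache Iceberg's standard Spark catalog settings (`SparkCatalog`, `type=rest`, `uri`, `warehouse`). The catalog name, the endpoint URI, and the bucket are hypothetical placeholders. The actual Lakehouse runtime catalog endpoint and its authentication settings are not described on this page, so they are intentionally left out.

```python
"""Sketch: Spark properties for an Apache Iceberg REST catalog.

Assumptions (not from this page): the catalog name, the REST endpoint URI,
and the gs:// warehouse path are hypothetical placeholders. Authentication
properties are intentionally omitted.
"""

ICEBERG_EXTENSIONS = "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions"


def iceberg_rest_spark_conf(catalog_name: str, rest_uri: str, warehouse: str) -> dict[str, str]:
    """Return Spark properties that register an Iceberg REST catalog."""
    if not warehouse.startswith("gs://"):
        # Tables managed through this option are stored in Cloud Storage.
        raise ValueError("warehouse must be a Cloud Storage (gs://) location")
    prefix = f"spark.sql.catalog.{catalog_name}"
    return {
        # Enables Iceberg's SQL extensions in the Spark session.
        "spark.sql.extensions": ICEBERG_EXTENSIONS,
        # Registers an Iceberg catalog under the given name.
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        # Tells Iceberg to talk to the catalog over the REST protocol.
        f"{prefix}.type": "rest",
        f"{prefix}.uri": rest_uri,
        f"{prefix}.warehouse": warehouse,
        # Reads and writes data and metadata files through Iceberg's GCS FileIO.
        f"{prefix}.io-impl": "org.apache.iceberg.gcp.gcs.GCSFileIO",
    }


def to_spark_submit_args(conf: dict[str, str]) -> list[str]:
    """Flatten the properties into `spark-submit --conf key=value` arguments."""
    args: list[str] = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    conf = iceberg_rest_spark_conf(
        "lakehouse",                            # hypothetical catalog name
        "https://example.invalid/iceberg/rest",  # hypothetical placeholder endpoint
        "gs://my-bucket/warehouse",             # hypothetical bucket
    )
    print(" ".join(to_spark_submit_args(conf)))
```

With these properties set, Spark would address tables as `lakehouse.<namespace>.<table>`. The Iceberg Spark runtime and Iceberg GCP modules must also be on Spark's classpath for these class names to resolve.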

Hive metastore

The Lakehouse runtime catalog manages Apache Hive tables through an Apache Hive metastore (HMS) endpoint that is optimized for compatibility with the Apache Spark ExternalCatalog interface, so you can share data across Apache Spark, Apache Hive, and BigQuery. You create these tables from open-source engines and store them in Cloud Storage. This option is best if you want open-source engines to manage your ETL workflow without running a separate self-hosted Hive metastore, and you need only read access from BigQuery.

Key features include:

  • Metastore: Lakehouse runtime catalog (through a custom IMetastoreClient implementation).
  • Storage: Cloud Storage (supporting formats like Parquet, ORC, and Avro).
  • Storage optimization: Managed by you or a third party.
  • Read and write access:
    • Open-source engines (Spark and Hive): Read and write.
    • BigQuery: Read only.
  • Use cases: Migrating existing Spark and Hive workloads to a fully managed, serverless metastore on Google Cloud.
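To illustrate what creating one of these tables from an open-source engine can look like, the following sketch generates a HiveQL `CREATE TABLE ... STORED AS ... LOCATION` statement for data in Cloud Storage, restricted to the Parquet, ORC, and Avro formats listed above. The database, table, columns, and bucket are hypothetical, and the sketch does not cover how Spark or Hive is configured to reach the Lakehouse runtime catalog's HMS endpoint.

```python
"""Sketch: HiveQL DDL for a Hive table whose files live in Cloud Storage.

Assumptions: the database, table, column names, and gs:// path are
hypothetical. Connecting the engine to the metastore is not shown.
"""

# File formats listed for Hive tables stored in Cloud Storage.
SUPPORTED_FORMATS = {"PARQUET", "ORC", "AVRO"}


def hive_table_ddl(
    database: str,
    table: str,
    columns: dict[str, str],
    file_format: str,
    location: str,
) -> str:
    """Build a CREATE TABLE statement for a Hive-format table."""
    fmt = file_format.upper()
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(
            f"unsupported format {file_format!r}; expected one of {sorted(SUPPORTED_FORMATS)}"
        )
    if not location.startswith("gs://"):
        # The table data is stored in Cloud Storage.
        raise ValueError("location must be a Cloud Storage (gs://) path")
    # One "name TYPE" pair per line, in the order the columns were given.
    cols = ",\n  ".join(f"{name} {sql_type}" for name, sql_type in columns.items())
    return (
        f"CREATE TABLE IF NOT EXISTS {database}.{table} (\n  {cols}\n)\n"
        f"STORED AS {fmt}\n"
        f"LOCATION '{location}'"
    )
```

From PySpark, such a statement could be run with `spark.sql(ddl)` in a session created with `SparkSession.builder.enableHiveSupport()`, assuming that session is already connected to the metastore.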

BigQuery

BigQuery supports Apache Iceberg tables, native tables, and external tables.

  • Apache Iceberg tables: These are Apache Iceberg tables that you create and manage from BigQuery and store in Cloud Storage. While they can be read by open-source engines, BigQuery is the engine that manages the metadata and writes to them. This option is best if you want your workflow to be fully managed by BigQuery.

  • Native tables: These are standard BigQuery tables whose data is kept in BigQuery storage. They are fully managed and offer the most advanced analytics and management features. This option is best for workloads that don't require the Iceberg format.

  • External tables: These tables are BigQuery-specific constructs for data stored in Cloud Storage, Amazon S3, or Azure Blob Storage. You manage the data and metadata yourself, and BigQuery has read-only access. Choose this option for data that you want to keep managing directly in a third-party catalog or storage system.
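The three BigQuery table types differ mainly in where data and metadata live, which shows up directly in their GoogleSQL DDL. The following sketch builds one `CREATE` statement per type: a standard table, a BigQuery-managed Iceberg table (`WITH CONNECTION` plus the `file_format`, `table_format`, and `storage_uri` options), and an external table over Cloud Storage files. All project, dataset, table, connection, and bucket names are hypothetical, and external tables over Amazon S3 or Azure Blob Storage need additional setup that isn't shown.

```python
"""Sketch: GoogleSQL DDL for the three BigQuery table types.

Assumptions: project, dataset, table, connection ID, and gs:// paths are
hypothetical placeholders. External tables here read from Cloud Storage only.
"""


def _columns(columns: dict[str, str]) -> str:
    # One "name TYPE" pair per line, in the order given.
    return ",\n  ".join(f"{name} {sql_type}" for name, sql_type in columns.items())


def native_table_ddl(table: str, columns: dict[str, str]) -> str:
    # Standard table: BigQuery manages both the metadata and the storage.
    return f"CREATE TABLE `{table}` (\n  {_columns(columns)}\n)"


def iceberg_table_ddl(table: str, columns: dict[str, str], connection: str, storage_uri: str) -> str:
    # BigQuery-managed Iceberg table: BigQuery writes Iceberg data and
    # metadata to the Cloud Storage location, using the given connection.
    return (
        f"CREATE TABLE `{table}` (\n  {_columns(columns)}\n)\n"
        f"WITH CONNECTION `{connection}`\n"
        "OPTIONS (\n"
        "  file_format = 'PARQUET',\n"
        "  table_format = 'ICEBERG',\n"
        f"  storage_uri = '{storage_uri}'\n"
        ")"
    )


def external_table_ddl(table: str, file_format: str, uris: list[str]) -> str:
    # External table: BigQuery only reads files that you manage elsewhere.
    uri_list = ", ".join(f"'{uri}'" for uri in uris)
    return (
        f"CREATE EXTERNAL TABLE `{table}`\n"
        "OPTIONS (\n"
        f"  format = '{file_format.upper()}',\n"
        f"  uris = [{uri_list}]\n"
        ")"
    )
```

Any of these strings could be submitted with the BigQuery Python client, for example `google.cloud.bigquery.Client().query(ddl).result()`, provided the connection referenced by the Iceberg table already exists and has access to the bucket.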

Compare table types

The following tables compare the table types available in the Lakehouse runtime catalog and in BigQuery.

Lakehouse

| Feature | Apache Iceberg (GA) | Apache Hive (Preview) |
|---|---|---|
| Metastore | Lakehouse runtime catalog | Lakehouse runtime catalog |
| Storage | Cloud Storage | Cloud Storage |
| Storage optimization | Customer or third-party managed | Customer or third-party managed |
| Read / write | Open-source engines (read/write); BigQuery (read only) | Open-source engines (read/write); BigQuery (read only) |
| Advanced operations | None | None |
| Use cases | Open lakehouse | Migrating existing Spark and Hive workloads to a fully managed, serverless metastore |

BigQuery

| Feature | BigQuery-managed Iceberg | External tables | Standard tables |
|---|---|---|---|
| Metastore | BigQuery | External or self-hosted metastore | BigQuery |
| Storage | Cloud Storage | Cloud Storage, Amazon S3, or Azure Blob Storage | BigQuery |
| Storage optimization | Google managed | Customer or third-party managed | Google managed |
| Read / write | Open-source engines (read only with Iceberg libraries; read/write interoperability with the BigQuery Storage API); BigQuery (read/write) | Open-source engines (read/write); BigQuery (read only) | Open-source engines (read/write interoperability with the BigQuery Storage API); BigQuery (read/write) |
| Advanced operations | High-throughput streaming with the BigQuery Storage Write API, Change Data Capture (CDC), and multi-statement transactions | None | High-throughput streaming with the BigQuery Storage Write API, Change Data Capture (CDC), and multi-statement transactions |
| Use cases | Open lakehouse with high-performance, enterprise-grade storage for advanced analytics, streaming, and AI | Staging tables for BigQuery loads, legacy query-only tables | Enterprise-grade storage for advanced analytics, streaming, and AI |
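The "best if" guidance and the comparison tables can be condensed into a small decision helper. The following Python is a simplified sketch of one reading of that guidance, not an official selection rule: the parameter names are hypothetical, and treating non-Iceberg data written by open-source engines as a Hive table is an assumption. It also reflects the tables' point that only the BigQuery-managed Iceberg and standard tables list advanced operations.

```python
"""Sketch: map workload requirements to a table type.

Assumptions: the parameter names and the fallback to Hive tables for
non-Iceberg data written by open-source engines are illustrative choices.
"""

from enum import Enum


class TableType(Enum):
    LAKEHOUSE_ICEBERG = "Apache Iceberg table (Lakehouse runtime catalog)"
    LAKEHOUSE_HIVE = "Apache Hive table (Lakehouse runtime catalog)"
    BIGQUERY_ICEBERG = "BigQuery-managed Iceberg table"
    BIGQUERY_STANDARD = "BigQuery standard table"
    BIGQUERY_EXTERNAL = "BigQuery external table"


# Only these types list streaming, CDC, and multi-statement transactions.
ADVANCED_OPS_TYPES = {TableType.BIGQUERY_ICEBERG, TableType.BIGQUERY_STANDARD}


def recommend_table_type(
    writer: str,                       # "open_source" or "bigquery": which engine writes
    needs_iceberg: bool,               # must the data be in Apache Iceberg format?
    needs_advanced_ops: bool = False,  # Storage Write API streaming, CDC, transactions?
    third_party_managed: bool = False, # data and metadata stay under your own management?
) -> TableType:
    if writer not in ("open_source", "bigquery"):
        raise ValueError(f"writer must be 'open_source' or 'bigquery', got {writer!r}")
    if third_party_managed:
        # BigQuery only reads the data; you manage it elsewhere.
        choice = TableType.BIGQUERY_EXTERNAL
    elif writer == "open_source":
        choice = TableType.LAKEHOUSE_ICEBERG if needs_iceberg else TableType.LAKEHOUSE_HIVE
    else:
        choice = TableType.BIGQUERY_ICEBERG if needs_iceberg else TableType.BIGQUERY_STANDARD
    if needs_advanced_ops and choice not in ADVANCED_OPS_TYPES:
        raise ValueError(f"{choice.value} lists no advanced operations")
    return choice
```

For example, `recommend_table_type("bigquery", needs_iceberg=True, needs_advanced_ops=True)` returns `TableType.BIGQUERY_ICEBERG`, matching the tables' recommendation of BigQuery-managed Iceberg for an open lakehouse that also needs streaming and transactions.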

What's next