This document defines the key terms and concepts for BigLake.
This page is not an exhaustive list of features. It is a general reference for the terms and concepts used throughout the BigLake documentation.
Core Concepts
The following concepts form the foundation of the BigLake architecture.
Data Lakehouse
A data lakehouse is a data architecture that combines the cost-efficiency and flexibility of a data lake with the data management and performance structures of a data warehouse. BigLake enables a lakehouse architecture by letting you keep data in open formats on Cloud Storage while using BigQuery features such as fine-grained security and high-performance querying.
Open Interoperability
Open interoperability is the ability of multiple analytical and transactional systems, such as BigQuery, Spark, and Flink, to operate on a single copy of data in open formats such as Apache Iceberg. This eliminates the need to duplicate data and gives every tool a consistent view of the same data.
BigLake Metastore
BigLake metastore is a centralized, serverless metadata service that acts as the single source of truth for your lakehouse. It lets multiple engines, such as Spark, Flink, and BigQuery, discover and query the same tables simultaneously.
Catalog Types
The BigLake metastore offers two types of catalogs for managing your metadata. Your choice of catalog is a fundamental decision that affects how you interact with your data.
Iceberg REST catalog
This is a catalog based on the Apache Iceberg REST catalog specification. It provides interoperability between open source engines and BigQuery, and supports features such as credential vending and disaster recovery.
Custom Iceberg catalog for BigQuery
This is an integration that uses BigQuery directly as the backing metastore.
Table Formats
BigLake supports several table formats, depending on the engine used to manage the data.
BigLake Iceberg tables in BigQuery
These are Iceberg tables that you create from BigQuery and store in Cloud Storage. BigQuery handles all data layout and optimization. While these tables can be read by multiple engines, BigQuery is the only engine that can directly write to them.
BigLake Iceberg tables
These are Iceberg tables created from open source engines and stored in Cloud Storage. The BigLake metastore serves as the central catalog. The open source engine that created the table is the only engine that can write to it.
Standard BigQuery tables
These tables are managed by BigQuery and store data in BigQuery storage. You can connect these tables to BigLake metastore.
External tables
External tables reside outside of BigLake metastore. The data and metadata are self-managed in a third-party catalog. BigQuery can only read from these tables.
Table Features
BigLake provides several features that simplify data management and improve query performance for Iceberg tables.
Table evolution
BigLake supports Iceberg table evolution, which lets you change a table's schema or partition spec over time without rewriting the table data or recreating the table.
Time travel
Time travel lets you query a table's data as it existed at a specific point in time or snapshot ID. This is useful for auditing, reproducing experiments, or restoring data after an accidental deletion.
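Conceptually, resolving a point in time means picking the most recent snapshot committed at or before that time. The following Python sketch illustrates that idea only. The `Snapshot` class and `snapshot_as_of` function are hypothetical names, not part of BigLake, BigQuery, or any Iceberg library, and the sketch assumes the snapshot history is already sorted by commit time.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical record of one table snapshot."""
    snapshot_id: int
    committed_at_ms: int  # commit time in milliseconds since the epoch


def snapshot_as_of(snapshots: list[Snapshot], timestamp_ms: int) -> Optional[Snapshot]:
    """Return the latest snapshot committed at or before timestamp_ms.

    Assumes snapshots are sorted by commit time, oldest first.
    Returns None if the table had no snapshot yet at that time.
    """
    commit_times = [s.committed_at_ms for s in snapshots]
    index = bisect_right(commit_times, timestamp_ms)
    return snapshots[index - 1] if index > 0 else None


history = [Snapshot(101, 1_000), Snapshot(102, 2_000), Snapshot(103, 3_000)]
print(snapshot_as_of(history, 2_500))  # Snapshot(snapshot_id=102, committed_at_ms=2000)
```

Querying by snapshot ID skips this lookup, because the ID already names the exact table state to read.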
Metadata caching
Metadata caching is a feature that accelerates query performance for BigLake external tables. It stores a copy of the table's metadata in BigQuery storage, reducing the need to read metadata files from Cloud Storage during query execution.
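The general idea of serving metadata from a cached copy can be sketched in Python as follows. This is an illustration under stated assumptions, not BigLake's implementation: the `MetadataCache` class, the `load_from_storage` function, and the `max_staleness_s` parameter are hypothetical, and the staleness check is one possible policy for deciding when to reread the metadata files.

```python
from typing import Callable


class MetadataCache:
    """Hypothetical cache that reuses table metadata until it is too stale."""

    def __init__(self, load: Callable[[str], dict], max_staleness_s: float) -> None:
        self._load = load  # slow path, for example reading metadata files
        self._max_staleness_s = max_staleness_s
        self._entries: dict[str, tuple[float, dict]] = {}

    def get(self, table: str, now: float) -> dict:
        cached = self._entries.get(table)
        if cached is not None and now - cached[0] <= self._max_staleness_s:
            return cached[1]  # fresh enough: skip the slow read
        metadata = self._load(table)
        self._entries[table] = (now, metadata)
        return metadata


loads = []


def load_from_storage(table: str) -> dict:
    loads.append(table)  # record each slow read
    return {"table": table, "files": ["f1.parquet"]}


cache = MetadataCache(load_from_storage, max_staleness_s=60.0)
cache.get("orders", now=0.0)    # miss: reads from storage
cache.get("orders", now=30.0)   # hit: served from the cache
cache.get("orders", now=120.0)  # stale: reads from storage again
print(len(loads))  # 2
```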
Automatic table maintenance
Automatic table maintenance simplifies lakehouse management by automating tasks such as compaction and garbage collection for managed tables. This helps maintain query performance and storage efficiency without manual intervention.
Interoperability Concepts
Interoperability provides data access across Google Cloud and open source systems.
Catalog Federation
Catalog federation is a feature of the Iceberg REST catalog that lets it manage and query tables that are visible to BigQuery, including tables created with the custom Iceberg catalog.
P.C.N.T Naming Structure
The P.C.N.T naming structure is the four-part convention used to uniquely identify and query tables in BigLake metastore from BigQuery. It stands for Project.Catalog.Namespace.Table:
- Project: The Google Cloud project ID
- Catalog: The name of the BigLake metastore catalog
- Namespace: The logical grouping for tables (similar to a dataset)
- Table: The name of the data table
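As a rough illustration of the four-part structure, the following Python sketch splits a P.C.N.T identifier into its components. The `TableName` class and `parse_pcnt` function are hypothetical helpers, not part of any BigLake or BigQuery client library, and the example names are made up. The sketch assumes that no part contains a dot or needs quoting.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableName:
    """Hypothetical container for the four parts of a P.C.N.T identifier."""
    project: str
    catalog: str
    namespace: str
    table: str

    def __str__(self) -> str:
        return f"{self.project}.{self.catalog}.{self.namespace}.{self.table}"


def parse_pcnt(identifier: str) -> TableName:
    """Split 'project.catalog.namespace.table' into its parts.

    Assumes no part contains a dot or needs quoting.
    """
    parts = identifier.split(".")
    if len(parts) != 4 or not all(parts):
        raise ValueError(f"expected 4 non-empty parts, got: {identifier!r}")
    return TableName(*parts)


name = parse_pcnt("my-project.my_catalog.sales.orders")
print(name.namespace)  # sales
```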
Security Concepts
Security features provide mechanisms for access management and data protection.
Connections
A connection is a BigQuery resource that stores credentials for accessing external data. In BigLake, connections delegate access to Cloud Storage by letting the connection's service account access the storage bucket on your behalf.
Credential Vending
Credential vending is a security mechanism that helps tighten access control when using the Iceberg REST catalog. When enabled, BigLake generates short-lived, scoped-down credentials designed to grant access only to the specific file paths required for a query, rather than passing generic bucket access to Compute Engine. This helps prevent users from bypassing table-level security policies to read raw files directly.
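The effect of a short-lived, scoped-down credential can be sketched in Python as a check on both expiry and path. This is a conceptual model only: the `ScopedCredential` class, its fields, and the bucket paths are hypothetical, and real vended credentials are enforced by the storage service rather than by client code like this.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScopedCredential:
    """Hypothetical stand-in for a vended, short-lived credential."""
    allowed_prefixes: tuple[str, ...]
    expires_at: float  # seconds since the epoch

    def permits(self, object_path: str, now: float) -> bool:
        """Allow access only before expiry and only under an allowed prefix."""
        if now >= self.expires_at:
            return False
        return any(object_path.startswith(p) for p in self.allowed_prefixes)


cred = ScopedCredential(
    allowed_prefixes=("gs://example-bucket/warehouse/sales/orders/",),
    expires_at=1_000.0,
)
orders_file = "gs://example-bucket/warehouse/sales/orders/data/f1.parquet"
other_file = "gs://example-bucket/warehouse/hr/salaries/data/f1.parquet"

print(cred.permits(orders_file, now=500.0))    # True: in scope and not expired
print(cred.permits(other_file, now=500.0))     # False: outside the allowed prefix
print(cred.permits(orders_file, now=2_000.0))  # False: credential has expired
```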
Unified governance
Unified governance lets you define and enforce security and data management policies centrally through integration with Dataplex Universal Catalog.
Reliability Concepts
Reliability features provide data resilience and catalog availability.
Cross-region replication
Cross-region replication copies catalog metadata to multiple regions so that the catalog remains available during a regional outage.
Failover
Failover is the process of switching between primary and secondary regions during a regional outage to maintain catalog operations.
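The role switch can be pictured with a small Python model. The `ReplicatedCatalog` class and its `fail_over` method are hypothetical and only illustrate the swap of roles. They do not represent a BigLake API, and the region names are examples.

```python
from dataclasses import dataclass


@dataclass
class ReplicatedCatalog:
    """Hypothetical model of a catalog with a primary and a secondary region."""
    primary_region: str
    secondary_region: str

    def fail_over(self) -> None:
        """Promote the secondary region and demote the former primary."""
        self.primary_region, self.secondary_region = (
            self.secondary_region,
            self.primary_region,
        )


catalog = ReplicatedCatalog(primary_region="us-central1", secondary_region="us-east1")
catalog.fail_over()
print(catalog.primary_region)  # us-east1
```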