BigLake is a storage engine that unites Google Cloud and open source services to create a unified interface for advanced analytics and AI. It provides the foundation to build an open, managed, and high-performance lakehouse with automated data management and built-in governance using Apache Iceberg.
By decoupling storage from compute, BigLake provides interoperability across all Iceberg-compatible engines, such as Apache Spark, Apache Flink, Apache Hive, Trino, or BigQuery, which ensures a consistent view of your data.
Key benefits
- Serverless architecture: BigLake eliminates the need for server or cluster management, which reduces operational overhead, and scales automatically based on demand.
- Unified data management and governance: Integration with Dataplex Universal Catalog ensures central definition and enforcement of governance policies across multiple engines, and enables semantic search, data lineage, and quality checks.
- Storage extensions: BigLake extends Cloud Storage management capabilities to include features such as Autoclass tiering and customer-managed encryption keys (CMEK).
- Fully managed experience: When integrated with BigQuery, BigLake uses high-throughput streaming and real-time metadata management to provide a fully managed streaming, analytics, and AI experience.
- High availability and disaster recovery: BigLake offers options for cross-region replication and disaster recovery (Preview) to support high availability of your data.
Use cases
- Open lakehouse: Use Cloud Storage as the storage layer, with BigLake providing the management and governance interface for Iceberg data.
- Analytical and transactional integration: Access analytical BigLake Iceberg tables directly within AlloyDB for PostgreSQL (Preview) to combine analytical data with transactional workloads.
- Unified access: Let different engines (Spark, Flink, BigQuery) interact with the same Iceberg tables with consistent metadata.
Catalog interfaces
The BigLake metastore provides two primary catalog interfaces to connect your data across Cloud Storage and BigQuery. For more information, see How BigLake works.
- Apache Iceberg REST Catalog: Provides a standard REST interface for wider compatibility with open source engines and tools. This is the recommended interface for new workloads. To get started, see the quickstart "Use BigLake metastore with Spark and BigQuery using the Iceberg REST catalog".
- Custom Apache Iceberg Catalog for BigQuery: Enables engines such as Spark to interoperate with BigQuery. This interface is supported for existing workloads.
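As a rough illustration of what connecting an engine through the REST interface can look like, the following sketch builds the generic Apache Iceberg catalog properties that Spark uses for any REST catalog. The catalog name, endpoint URI, and warehouse path are placeholders chosen for this example, not BigLake values, and authentication settings are left out; take the actual values from the quickstart.

```python
from typing import Dict


def iceberg_rest_catalog_conf(catalog_name: str, uri: str, warehouse: str) -> Dict[str, str]:
    """Return generic Iceberg Spark properties for a REST catalog.

    All three arguments are caller-supplied placeholders; this helper is
    hypothetical and only assembles standard Iceberg property names.
    """
    prefix = f"spark.sql.catalog.{catalog_name}"
    return {
        # Enables Iceberg SQL extensions in the Spark session.
        "spark.sql.extensions": "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
        # Registers the catalog name with Iceberg's Spark catalog implementation.
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        # Selects the REST catalog client.
        f"{prefix}.type": "rest",
        # REST endpoint and warehouse location (placeholders in this sketch).
        f"{prefix}.uri": uri,
        f"{prefix}.warehouse": warehouse,
    }


conf = iceberg_rest_catalog_conf(
    "my_catalog",                            # assumed catalog name
    "https://example.invalid/iceberg/rest",  # placeholder endpoint
    "gs://my-bucket/warehouse",              # placeholder Cloud Storage path
)

# Print the properties in the form accepted by spark-submit or spark-sql.
for key, value in conf.items():
    print(f"--conf {key}={value}")
```

Keeping the properties in one dictionary makes it easy to pass the same settings to a `SparkSession` builder or to a command line, so every engine that points at the same catalog sees the same tables.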
Interfaces and tools
You can interact with BigLake resources using the following tools:
- The Google Cloud console: Use the console to create catalogs, view catalog properties, view audit logs, and configure permissions.
- BigQuery SQL: Use standard SQL data definition language (DDL) statements to create and manage BigLake Iceberg tables.
- Open source engines: Use engines such as Apache Spark, Apache Flink, and Apache Hive with BigLake metastore to read and write data.
- BigLake metastore API: Use a REST API that is compatible with the Iceberg REST catalog specification.
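Because the API follows the Iceberg REST catalog specification, clients address resources through the route layout that the specification defines. The sketch below only assembles those routes as strings and makes no network calls. The base URL, prefix, namespace, and table name are made-up values for illustration, and `rest_paths` is a hypothetical helper rather than part of any client library.

```python
from typing import Dict, List
from urllib.parse import quote

# The specification joins multi-level namespaces with the unit separator
# character (0x1F), which is percent-encoded in the URL path.
NAMESPACE_SEPARATOR = "\x1f"


def encode_namespace(levels: List[str]) -> str:
    """Encode namespace levels as a single URL path segment."""
    return quote(NAMESPACE_SEPARATOR.join(levels), safe="")


def rest_paths(base_url: str, prefix: str, namespace: List[str], table: str) -> Dict[str, str]:
    """Build a few routes defined by the Iceberg REST catalog specification."""
    root = f"{base_url.rstrip('/')}/v1"
    ns = encode_namespace(namespace)
    return {
        # Catalog configuration; a client normally calls this first.
        "config": f"{root}/config",
        "list_namespaces": f"{root}/{prefix}/namespaces",
        "list_tables": f"{root}/{prefix}/namespaces/{ns}/tables",
        "load_table": f"{root}/{prefix}/namespaces/{ns}/tables/{quote(table, safe='')}",
    }


paths = rest_paths(
    "https://example.invalid/catalog",  # placeholder base URL
    "my-prefix",                        # placeholder prefix, normally returned by the config route
    ["sales", "eu"],                    # assumed two-level namespace
    "orders",                           # assumed table name
)

for name, url in paths.items():
    print(f"{name}: {url}")
```

In practice an Iceberg client library or engine builds these requests for you; the sketch is only meant to show why any tool that implements the specification can talk to the same catalog.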
What's next
- Understand the architecture: Read How BigLake works.