About Cross-cloud Lakehouse

Cross-cloud Lakehouse extends Google Cloud Lakehouse so that you can query data stored with other cloud providers directly from Google Cloud using BigQuery, Dataproc, and Apache Spark. Because the data stays where it is, you don't need to migrate it or build complex ETL pipelines, and you can run unified analytics and AI across all your distributed datasets.

Use cases

Cross-cloud Lakehouse supports several key use cases for accessing data across multiple cloud providers:

  • Reduced data movement: Query data stored in other cloud environments directly, which simplifies data access and processing.
  • Unified analytics: Perform advanced analytics with consistent features and hardware optimization across all your data, regardless of where it resides.
  • Cross-cloud AI and ML: Apply AI models, autonomous agents, and machine learning directly to your remote data without migrating it.

How Cross-cloud Lakehouse works

Cross-cloud Lakehouse queries remote data using the following process:

  1. Metadata discovery: Google Cloud Lakehouse connects to remote Apache Iceberg REST catalogs, such as Databricks Unity Catalog, and discovers the data without copying any files. Lakehouse authenticates securely through Secret Manager.
  2. Secure transport: You can route traffic over a private Cross-Cloud Interconnect (CCI) connection, which significantly reduces data transfer costs compared with the public internet and makes latency more predictable.
  3. Optimized execution: As queries read data from remote clouds, Lakehouse temporarily caches those data segments locally within Google Cloud on specialized storage. Subsequent queries use the local cache, which avoids a significant portion of cross-cloud egress charges.
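
The caching behavior in step 3 can be pictured as a read-through cache. The following is a minimal sketch of that general pattern, not the Lakehouse implementation: the class `SegmentCache`, the function `fake_remote_read`, and the 1 KiB segment size are all hypothetical, and cache expiry is left out for brevity.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class SegmentCache:
    """Toy read-through cache keyed by data segment ID (hypothetical)."""

    fetch_remote: Callable[[str], bytes]  # stand-in for a cross-cloud read
    store: Dict[str, bytes] = field(default_factory=dict)
    remote_bytes_read: int = 0  # bytes that crossed the cloud boundary

    def read(self, segment_id: str) -> bytes:
        # Serve from the local cache when the segment is already present.
        if segment_id in self.store:
            return self.store[segment_id]
        # Otherwise fetch once from the remote cloud and keep a local copy.
        data = self.fetch_remote(segment_id)
        self.remote_bytes_read += len(data)
        self.store[segment_id] = data
        return data


def fake_remote_read(segment_id: str) -> bytes:
    # Placeholder for a remote object read; returns 1 KiB per segment.
    return segment_id.encode().ljust(1024, b"\0")


cache = SegmentCache(fetch_remote=fake_remote_read)

# First query: both segments are fetched from the remote cloud.
first = [cache.read(s) for s in ("seg-a", "seg-b")]
after_first = cache.remote_bytes_read

# Second query: both segments are served locally, so no new remote bytes.
second = [cache.read(s) for s in ("seg-a", "seg-b")]
after_second = cache.remote_bytes_read

print(after_first, after_second)
```

In this toy model the remote byte counter stops growing after the first pass, which is the effect the cache has on cross-cloud egress for repeated reads of the same segments.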

Core concepts

This section describes the key components essential to using Cross-cloud Lakehouse.

Apache Iceberg REST catalog federation

This is the metadata layer. You connect to remote Apache Iceberg REST catalogs, such as Databricks Unity Catalog, and Lakehouse discovers the data without copying any files. Lakehouse authenticates securely through Workload Identity Federation (OIDC) or OAuth credentials, so no long-lived access keys are required.
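
For orientation, the sketch below builds the generic Apache Iceberg settings that a Spark client uses to reach any Iceberg REST catalog. It illustrates what "REST catalog" means at the client level and is not the Cross-cloud Lakehouse federation setup itself. The helper `rest_catalog_spark_conf`, the catalog name `remote_cat`, the URI, and the warehouse value are placeholders, and authentication properties are deliberately omitted.

```python
from typing import Dict


def rest_catalog_spark_conf(name: str, uri: str, warehouse: str) -> Dict[str, str]:
    """Return Spark properties for an Iceberg REST catalog (hypothetical helper)."""
    prefix = f"spark.sql.catalog.{name}"
    return {
        # Register the catalog with Iceberg's Spark catalog implementation.
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        # Select the REST catalog type and point it at the remote endpoint.
        f"{prefix}.type": "rest",
        f"{prefix}.uri": uri,
        # Warehouse identifier as understood by the remote catalog.
        f"{prefix}.warehouse": warehouse,
    }


# Placeholder values; replace with the endpoint of your own catalog.
conf = rest_catalog_spark_conf(
    name="remote_cat",
    uri="https://example.com/api/catalog",
    warehouse="example_warehouse",
)

for key, value in conf.items():
    print(f"{key}={value}")
```

Only table metadata is exchanged through these settings; the data files stay in the remote cloud until a query reads them.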

Cross-Cloud Interconnect (CCI)

This is the transport layer. You can route traffic over a private Cross-Cloud Interconnect (CCI) connection, which significantly reduces data transfer costs compared with the public internet and makes latency more predictable.

What's next