Lakehouse for Apache Iceberg manages metadata through the Lakehouse runtime catalog. When you use the Apache Iceberg REST catalog endpoint, the system organizes data into a strict resource hierarchy. The catalog configuration determines the supported storage types, regional routing behaviors, and query federation options.
Resource hierarchy
The Apache Iceberg REST catalog endpoint uses a hierarchy of resources to organize your data. The following table provides a high-level look at these resources:
| Resource | Description |
|---|---|
| Catalog | The top-level container. Catalogs let you organize namespaces and tables into separate logical groups. Each catalog is backed by a designated warehouse storage location (such as a Cloud Storage bucket or BigQuery federation proxy) that stores its underlying metadata and data files. |
| Namespace | A logical grouping that organizes tables within a catalog. A namespace functions like a database, schema, or directory. |
| Table | Tables contain definitions of rows and columns that can be queried. |
Warehouse storage locations
When you configure your client to connect to the Apache Iceberg REST catalog endpoint, you specify a warehouse storage location. The type of warehouse location you choose determines how your catalog operates and integrates with other Google Cloud services. In the underlying API, a Cloud Storage bucket warehouse corresponds to the CATALOG_TYPE_GCS_BUCKET configuration.
Cloud Storage
Recommended
This is the standard approach where all metadata and data files for the catalog
are stored within a single designated Cloud Storage bucket (gs://).
If your workloads require sharing data across multiple separate buckets, you must
create multiple distinct catalogs.
Using a Cloud Storage bucket gives you direct control over your data layout and lets you create and manage Apache Iceberg tables directly through the Lakehouse runtime catalog. It also supports storage access delegation (credential vending), in which the catalog vends short-lived access tokens directly to client engines. Engines can then read and write data files securely without broad, direct IAM permissions on the underlying bucket.
For example, if you created a bucket named iceberg-bucket to store your catalog,
both your catalog name and your bucket name are iceberg-bucket. You use this
name later when you query your catalog in BigQuery with the P.C.N.T
(project, catalog, namespace, table) syntax. For example:
my-project.lakehouse-catalog-id.quickstart_namespace.quickstart_table,
where lakehouse-catalog-id is a placeholder for your catalog name.
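The client configuration described above can be sketched as a small helper that assembles Iceberg REST catalog properties for a Cloud Storage warehouse. This is a minimal sketch under assumptions: the function name and the endpoint URI argument are hypothetical placeholders (this page does not state the endpoint URL), and the property keys follow common Iceberg REST client conventions (`uri`, `warehouse`, and the `X-Iceberg-Access-Delegation` header from the Iceberg REST specification). Your engine might need additional authentication properties that are not shown here.

```python
def rest_catalog_properties(endpoint_uri: str, bucket: str,
                            vend_credentials: bool = True) -> dict:
    """Build client properties for a Cloud Storage bucket warehouse.

    endpoint_uri is a placeholder supplied by the caller; it is not
    taken from this page.
    """
    # A warehouse is a single bucket, so reject empty names and paths.
    if not bucket or "/" in bucket:
        raise ValueError(f"expected a bare bucket name, got {bucket!r}")

    props = {
        "type": "rest",
        "uri": endpoint_uri,
        # The gs:// prefix selects a Cloud Storage bucket warehouse.
        "warehouse": f"gs://{bucket}",
    }
    if vend_credentials:
        # Ask the catalog for short-lived storage credentials
        # (storage access delegation, also called credential vending).
        props["header.X-Iceberg-Access-Delegation"] = "vended-credentials"
    return props


if __name__ == "__main__":
    # "https://REST_CATALOG_ENDPOINT" is a stand-in, not a real URL.
    print(rest_catalog_properties("https://REST_CATALOG_ENDPOINT", "iceberg-bucket"))
```

A dictionary like this can typically be passed to an Iceberg client that accepts REST catalog properties as key-value pairs. Setting `vend_credentials=False` omits the delegation header for engines that authenticate to the bucket directly.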
BigQuery
Alternative approach
This is an alternative approach, known as BigQuery catalog federation,
where the Apache Iceberg REST catalog endpoint acts as a proxy gateway instead
of creating a dedicated Lakehouse catalog container. In the underlying
API, this corresponds to the CATALOG_TYPE_BIGQUERY configuration. When external
engines connect using the bq:// warehouse prefix, the gateway routes their catalog
requests directly to BigQuery's internal catalog.
Using BigQuery catalog federation lets you create and manage tables directly within BigQuery using standard BigQuery DDL or APIs, while giving external OSS engines read-only access to query those tables through the REST catalog endpoint. Because access is managed directly by BigQuery, external engines rely entirely on BigQuery's internal IAM permissions and access control lists (ACLs). Note that storage access delegation (credential vending) is not supported for federated BigQuery catalogs.
For example, if your tables are in project my-project, your warehouse path is
bq://projects/my-project. Unlike tables in Cloud Storage bucket catalogs, federated
tables are visible directly in BigQuery without a four-part P.C.N.T name.
For example: SELECT * FROM my_namespace.my_table.
BigQuery catalog federation is distinct from cross-cloud Lakehouse, which connects Google Cloud to remote external catalogs, such as Databricks Unity Catalog, to query data across different cloud providers.
For more information about configuring and using this workflow, see Use catalog federation with BigQuery.
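To make the difference between the two warehouse types concrete, the following sketch maps a warehouse path to the catalog type named on this page and reports whether credential vending applies. The function names are hypothetical helpers for illustration, not part of any Lakehouse client library; only the `gs://` and `bq://` prefixes, the `bq://projects/PROJECT_ID` path shape, and the two `CATALOG_TYPE_*` names come from the text above.

```python
def bigquery_warehouse(project_id: str) -> str:
    """Return the federation warehouse path for a project."""
    if not project_id:
        raise ValueError("project_id must not be empty")
    return f"bq://projects/{project_id}"


def catalog_type_for_warehouse(warehouse: str) -> str:
    """Map a warehouse path to the catalog type it corresponds to."""
    if warehouse.startswith("gs://"):
        return "CATALOG_TYPE_GCS_BUCKET"
    if warehouse.startswith("bq://"):
        return "CATALOG_TYPE_BIGQUERY"
    raise ValueError(f"unsupported warehouse prefix: {warehouse!r}")


def supports_credential_vending(warehouse: str) -> bool:
    """Credential vending applies only to Cloud Storage bucket warehouses."""
    return catalog_type_for_warehouse(warehouse) == "CATALOG_TYPE_GCS_BUCKET"


if __name__ == "__main__":
    for wh in ("gs://iceberg-bucket", bigquery_warehouse("my-project")):
        print(wh, catalog_type_for_warehouse(wh), supports_credential_vending(wh))
```

A check like `supports_credential_vending` can help a client decide up front whether to request vended credentials or rely on BigQuery-managed access.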
Bucket and catalog regions
For Cloud Storage bucket warehouses in Lakehouse runtime catalog, the system selects the catalog region to match the underlying bucket's region:
- Single-region buckets: The catalog region matches the bucket region exactly.
- Dual-region buckets: Includes predefined and user-defined dual regions, such as ASIA1 and NAM4. The catalog region matches the dual regions.
- Multi-region buckets: The system selects regional locations for the catalog within the multi-region's geographic domain. By default, these locations might not match common BigQuery locations like US and EU. Instead, they are regional locations within the geographic domain (for example, us-central1 and us-east4 for a US multi-region bucket).
When BigQuery runs a query over tables in these catalogs,
BigQuery routes the query to the catalog's primary region.
If you run a query in a specific virtual region (like US or EU) and
the catalog metadata isn't present in that location, the query might fail.
Specify primary regions for US and EU multi-regions
For catalogs that use a US or EU multi-region bucket, you can specify the
primary region when you create the catalog to ensure that
BigQuery can access it from the corresponding regions.
- Cloud Storage EU multi-region: Specify EU or europe-west4.
- Cloud Storage US multi-region: Specify US or us-central1.
The system selects a catalog's primary replica when you create it, but you can
dynamically update it by calling FailoverCatalog. For more information about
defining primary locations, see Create a catalog.
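The primary-region choices listed above for US and EU multi-region buckets can be checked before you create a catalog. The following is a minimal sketch: the helper name and the lookup table are hypothetical, and the allowed values are copied from the list on this page rather than queried from the service.

```python
# Allowed primary regions per multi-region bucket location,
# as listed on this page.
ALLOWED_PRIMARY_REGIONS = {
    "EU": {"EU", "europe-west4"},
    "US": {"US", "us-central1"},
}


def validate_primary_region(bucket_location: str, primary_region: str) -> str:
    """Return the normalized primary region, or raise ValueError."""
    allowed = ALLOWED_PRIMARY_REGIONS.get(bucket_location.upper())
    if allowed is None:
        raise ValueError(
            f"{bucket_location!r} is not a US or EU multi-region location")

    # Multi-region names are upper case; regional names are lower case.
    upper = primary_region.upper()
    normalized = upper if upper in ALLOWED_PRIMARY_REGIONS else primary_region.lower()

    if normalized not in allowed:
        raise ValueError(
            f"{primary_region!r} is not valid for a {bucket_location.upper()} "
            f"multi-region bucket; choose one of {sorted(allowed)}")
    return normalized


if __name__ == "__main__":
    print(validate_primary_region("US", "us-central1"))
    print(validate_primary_region("eu", "EU"))
```

Validating the value client-side only catches typos early. The catalog service remains the authority on which primary regions it accepts.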
Querying catalogs from BigQuery
When querying Lakehouse runtime catalog tables from BigQuery, you use a four-part naming structure, often referred to as P.C.N.T:
- Project: The Google Cloud project ID that owns the catalog.
- Catalog: The name of the Lakehouse runtime catalog.
- Namespace: The Apache Iceberg namespace (equivalent to a BigQuery dataset).
- Table: The name of the table.
For example, my-project.lakehouse-catalog-id.my-namespace.my-table.
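The four-part name can be assembled programmatically. The sketch below builds a backtick-quoted P.C.N.T path and a simple query string. The helper names are hypothetical, and quoting the whole path in one pair of backticks is an assumption based on how BigQuery quotes identifiers that contain hyphens. The helper also rejects dots inside a part for simplicity, so it does not handle nested namespaces.

```python
def pcnt_table_path(project: str, catalog: str, namespace: str, table: str) -> str:
    """Join project, catalog, namespace, and table into a quoted path."""
    parts = [project, catalog, namespace, table]
    for part in parts:
        # Reject empty parts and characters that would break the quoting.
        if not part or "`" in part or "." in part:
            raise ValueError(f"invalid name component: {part!r}")
    # Backticks let hyphenated names such as my-project parse as one path.
    return "`" + ".".join(parts) + "`"


def select_all(project: str, catalog: str, namespace: str, table: str,
               limit: int = 10) -> str:
    """Build a SELECT statement over a Lakehouse runtime catalog table."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    path = pcnt_table_path(project, catalog, namespace, table)
    return f"SELECT * FROM {path} LIMIT {limit}"


if __name__ == "__main__":
    print(select_all("my-project", "lakehouse-catalog-id",
                     "my-namespace", "my-table"))
```

The resulting string can be submitted through any BigQuery client or the console. For federated BigQuery catalogs, the shorter namespace.table form described earlier applies instead.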