Lakehouse for Apache Iceberg manages metadata through the Lakehouse runtime catalog. When you use the Apache Iceberg REST catalog endpoint, the system organizes data into a strict resource hierarchy. The catalog configuration determines the supported storage types, regional routing behaviors, and query federation options.
Capabilities and compliance
The Lakehouse runtime catalog is built to integrate with open-source processing engines by supporting standard table formats and complying with open APIs.
Supported table formats
Apache Iceberg V2 tables (GA) and V3 tables (Preview) are supported. Iceberg V1 tables aren't supported. Before you use existing V1 tables with the Apache Iceberg REST catalog endpoint, you must upgrade them to a supported version. For more information, see Upgrade Iceberg V1 tables to V2.
API compliance and REST operations
The Lakehouse runtime catalog implements the open-source Apache Iceberg REST Catalog API. Client query engines interact with the metastore using standard REST catalog APIs. For more information, see Iceberg REST Catalog API compliance.
Resource hierarchy
The Apache Iceberg REST catalog endpoint uses a hierarchy of resources to organize your data. The following table provides a high-level look at these resources:
| Resource | Description |
|---|---|
| Catalog | The top-level container. A catalog lets you organize namespaces and tables into logical groups by splitting them across separate catalogs. Each catalog is backed by a designated warehouse storage location (such as a Cloud Storage bucket or BigQuery federation proxy) that stores its underlying metadata and data files. |
| Namespace | A logical grouping used to organize tables within a catalog. A namespace functions like a database, schema, or directory. |
| Table | Tables contain definitions of rows and columns that can be queried. |
Catalogs and storage locations
When you configure your client, you specify a catalog. The catalog you choose determines its storage location. This choice also determines how your catalog operates and integrates with other Google Cloud services. You can choose either a BigLake catalog or a Cloud Storage catalog.
Both options give you data flexibility and support credential vending for fine-grained access control.
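As a rough illustration of specifying a catalog when you configure a client, the following sketch builds a property map for an Iceberg REST catalog client. The `type`, `uri`, and `warehouse` keys are the property names commonly used by Iceberg REST clients such as PyIceberg. Passing a `bl://` or `gs://` identifier as the warehouse value is an assumption based on the catalog types in this section, and the endpoint URL and catalog name are hypothetical placeholders.

```python
def rest_catalog_properties(uri: str, warehouse: str) -> dict[str, str]:
    """Build client properties for an Iceberg REST catalog.

    Assumption: the warehouse is identified by a bl:// (BigLake catalog)
    or gs:// (Cloud Storage catalog) value.
    """
    if not warehouse.startswith(("bl://", "gs://")):
        raise ValueError(f"expected a bl:// or gs:// warehouse, got {warehouse!r}")
    return {"type": "rest", "uri": uri, "warehouse": warehouse}


# Hypothetical endpoint URL and catalog name, for illustration only.
props = rest_catalog_properties(
    uri="https://example.invalid/iceberg/rest",
    warehouse="bl://my-catalog",
)
print(props)
```

A map like this could be handed to a client's catalog loader. Any authentication settings the client needs are outside the scope of this sketch.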
BigLake catalog (bl://)
This approach lets you name your catalog independently of any bucket name, and
lets you configure multiple buckets for a single catalog. In the underlying API,
this corresponds to the CATALOG_TYPE_BIGLAKE configuration.
Considerations:
- Default location: You provide a path to a bucket (default_location), which acts as the default storage location. For namespaces created without a specified location, the default_location is used.
- Restricted locations: You can also provide an optional restricted_locations configuration for additional buckets or paths where namespaces and tables can be created.
- Geographic region group requirements: Although buckets can be cross-project, cross-region, and have different configurations (such as single-region, dual-region, or multi-region), all Cloud Storage locations across the default location and restricted locations must be in the same geographic region group (such as the US, Europe, Canada, or Asia). For example, you can't configure a US multi-region bucket with a bucket in Europe or Canada.
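The region group requirement can be checked before you create a catalog. The following is a minimal sketch, not part of any Lakehouse API: the function name is hypothetical, and the caller is assumed to already know the geographic region group of each bucket.

```python
def validate_region_groups(locations: dict[str, str]) -> str:
    """Return the shared region group, or raise if locations span several.

    `locations` maps each storage path (the default location plus any
    restricted locations) to the region group of its bucket.
    """
    if not locations:
        raise ValueError("at least the default location is required")
    groups = set(locations.values())
    if len(groups) != 1:
        raise ValueError(f"locations span multiple region groups: {sorted(groups)}")
    return groups.pop()


# Hypothetical bucket paths; both sit in the same region group, so this passes.
group = validate_region_groups({
    "gs://default-bucket/warehouse": "US",
    "gs://extra-bucket/tables": "US",
})
print(group)
```

Mixing a US bucket with a bucket in Europe or Canada would raise a ValueError in this sketch, mirroring the restriction described above.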
Cloud Storage catalog (gs://)
This is the standard legacy approach where the catalog directly manages Apache
Iceberg metadata and data files in a single Cloud Storage bucket that you
specify. In the underlying API, this corresponds to the
CATALOG_TYPE_GCS_BUCKET configuration.
For Cloud Storage bucket catalogs, the catalog name inherits the name of your bucket.
For example, if you created a bucket named iceberg-bucket to store your catalog,
both your catalog name and bucket name are iceberg-bucket. You use this name
later when you query your catalog in BigQuery with the P.C.N.T syntax, for example,
my-project.lakehouse-catalog-id.quickstart_namespace.quickstart_table.
Considerations:
- Upgrade to BigLake catalogs: You can upgrade an existing Cloud Storage bucket catalog (gs://catalog) to a BigLake catalog (bl://catalog). The upgraded catalog retains the original bucket's name. After the upgrade, you can associate multiple buckets with the catalog and configure restricted locations.
Bucket and catalog regions
The region of a catalog endpoint in the Lakehouse runtime catalog is determined by the region of its underlying Cloud Storage bucket:
- BigLake catalogs: The catalog region is derived from the bucket configured in default_location.
- Cloud Storage bucket catalogs: The catalog region is derived from the bucket associated with the catalog.
The mapped catalog region varies depending on the bucket's region type:
- Single-region: The catalog region matches the bucket's region exactly.
- Dual-region: The catalog region matches the bucket's dual-region (such as ASIA1 or NAM4).
- Multi-region: The catalog region is set to a specific regional location within the multi-region's geographic domain. By default, this might not align with common BigQuery multi-regions like US and EU (for example, a US multi-region bucket maps to us-central1 or us-east4).
When BigQuery runs a query over tables in these catalogs, it
routes the query to the catalog's primary region. If you query tables in a
specific virtual region (such as US or EU) and the catalog metadata is not
present in that location, the query fails.
Primary regions for multi-regions
To allow BigQuery to query your catalog tables from the US or
EU multi-region, specify US or EU as the primary region when you create
the catalog.
You can specify a multi-region (US or EU) as the primary region in the
following configurations:
- BigLake catalogs: If the default_location bucket is:
  - A US or EU multi-region bucket.
  - A single-region bucket within those multi-regions (such as us-central1 or europe-west4).
  - A dual-region or custom dual-region bucket within those areas (such as NAM4 or EUR4).
- Cloud Storage bucket catalogs: If the catalog's bucket is:
  - A US or EU multi-region bucket.
  - A single-region bucket within those multi-regions (such as us-central1 or europe-west4).
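The configurations listed above amount to a small decision table, which the following sketch encodes. The function and the string labels are hypothetical names chosen for illustration. Dual-region buckets are treated as ineligible for Cloud Storage bucket catalogs only because the list does not mention them, so treat that as an assumption.

```python
# Bucket location types accepted per catalog type, taken from the list above.
# Assumption: dual-region is excluded for Cloud Storage bucket catalogs
# because the list does not include it.
ELIGIBLE_LOCATION_TYPES = {
    "biglake": {"multi-region", "single-region", "dual-region"},
    "cloud_storage": {"multi-region", "single-region"},
}


def can_use_multi_region_primary(catalog_type: str, location_type: str, area: str) -> bool:
    """Return True if US or EU may be set as the catalog's primary region.

    `area` is the multi-region the bucket is, or sits within ("US", "EU", ...).
    """
    if area not in {"US", "EU"}:
        return False
    return location_type in ELIGIBLE_LOCATION_TYPES.get(catalog_type, set())


print(can_use_multi_region_primary("biglake", "dual-region", "US"))
print(can_use_multi_region_primary("cloud_storage", "dual-region", "EU"))
```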
The primary replica is defined when you create the catalog, but you can
dynamically perform a failover by calling FailoverCatalog. For more
information, see Create a catalog.
Querying catalogs from BigQuery
When querying Lakehouse runtime catalog tables from BigQuery, you use a four-part naming structure, often referred to as P.C.N.T:
- Project: The Google Cloud project ID that owns the catalog.
- Catalog: The name of the Lakehouse runtime catalog.
- Namespace: The Apache Iceberg namespace (equivalent to a BigQuery dataset).
- Table: The name of the table.
For example, my-project.lakehouse-catalog-id.my-namespace.my-table.
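Because names such as my-project or my-namespace contain hyphens, a query usually needs the path quoted. The following sketch assembles a P.C.N.T path and wraps it in backticks, which is how BigQuery quotes table paths in GoogleSQL. The helper name is hypothetical, and rejecting dots inside a part is a simplification made for this example.

```python
def pcnt_table_path(project: str, catalog: str, namespace: str, table: str) -> str:
    """Join the four P.C.N.T parts into one backtick-quoted table path."""
    parts = [project, catalog, namespace, table]
    for part in parts:
        # Simplification: reject empty parts and characters that would
        # break the quoting or change the number of path components.
        if not part or "`" in part or "." in part:
            raise ValueError(f"invalid path component: {part!r}")
    return "`" + ".".join(parts) + "`"


path = pcnt_table_path("my-project", "lakehouse-catalog-id", "my-namespace", "my-table")
query = f"SELECT * FROM {path} LIMIT 10"
print(query)
```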