Apache Iceberg REST catalog concepts

This document provides an overview of the BigLake metastore Apache Iceberg REST catalog, including its resource hierarchy and supported catalog types.

The Apache Iceberg REST catalog in BigLake metastore uses a hierarchy of resources to organize your data. The following table provides a high-level look at these resources:

Resource hierarchy

Resource | Description
Catalog | The top-level container. You organize namespaces and tables into logical groups by splitting them across different catalogs.
Namespace | A logical grouping that organizes tables within a catalog. A namespace functions like a database, schema, or directory.
Table | Contains the row and column definitions that you can query.
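To make the containment relationships concrete, here is a minimal in-memory sketch of the catalog, namespace, and table hierarchy. It is not the BigLake metastore API. The classes Catalog, Namespace, and Table, the helper methods, and the names sales, orders, and customers are all hypothetical and exist only to illustrate how each level contains the next.

```python
from dataclasses import dataclass, field


@dataclass
class Table:
    """A queryable set of rows and columns (schema details omitted in this sketch)."""
    name: str


@dataclass
class Namespace:
    """A logical grouping of tables, similar to a database, schema, or directory."""
    name: str
    tables: dict[str, Table] = field(default_factory=dict)

    def add_table(self, name: str) -> Table:
        table = Table(name)
        self.tables[name] = table
        return table


@dataclass
class Catalog:
    """The top-level container that groups namespaces and, through them, tables."""
    name: str
    namespaces: dict[str, Namespace] = field(default_factory=dict)

    def add_namespace(self, name: str) -> Namespace:
        namespace = Namespace(name)
        self.namespaces[name] = namespace
        return namespace

    def table_paths(self) -> list[str]:
        """List every table as catalog.namespace.table, in insertion order."""
        return [
            f"{self.name}.{ns.name}.{table.name}"
            for ns in self.namespaces.values()
            for table in ns.tables.values()
        ]


# Hypothetical example: one catalog, one namespace, two tables.
catalog = Catalog("iceberg-bucket")
sales = catalog.add_namespace("sales")
sales.add_table("orders")
sales.add_table("customers")
print(catalog.table_paths())
```

Tables are reachable only through their namespace, and namespaces only through their catalog, which mirrors the top-down hierarchy in the table.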

Supported catalog types

When you configure your client, you specify a warehouse location. This choice determines how your catalog operates and integrates with other Google Cloud services. The following table describes supported catalog types:

Catalog type | Description
Cloud Storage bucket | All data in a catalog is stored in a single Cloud Storage bucket. If your data spans multiple buckets, you need multiple catalogs.
BigQuery federation | Lets you use the Iceberg REST catalog to manage and query tables that are visible to BigQuery. For more information, see Catalog federation with BigQuery.

Warehouse details

Recommended

  • Cloud Storage bucket warehouse (gs://): This is the standard approach where the catalog directly manages Iceberg metadata and data files in a Cloud Storage bucket that you specify. This option gives you direct control over your data layout and supports credential vending for fine-grained access control. This lets you create and manage BigLake tables for Apache Iceberg.

    For example, if you create a bucket named iceberg-bucket to store your catalog, both the catalog name and the bucket name are iceberg-bucket. You use this catalog name later when you query the catalog from BigQuery with P.C.N.T syntax, for example my-project.iceberg-bucket.quickstart_namespace.quickstart_table.

Alternative

  • BigQuery federation (bq://): This approach lets you use the Iceberg REST catalog to manage and query tables that are visible to BigQuery, without needing to create a catalog resource. For more information, see Catalog federation with BigQuery.
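The warehouse prefix is what selects the catalog type. The following sketch classifies a warehouse string by its gs:// or bq:// prefix and, for a Cloud Storage warehouse, derives the catalog name from the bucket name. The function classify_warehouse and the WarehouseInfo type are hypothetical helpers, not part of any client library, and the sketch assumes that anything after the bucket name in a gs:// path does not affect the catalog name.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class WarehouseInfo:
    catalog_type: str            # "cloud_storage" or "bigquery_federation"
    catalog_name: Optional[str]  # bucket name for gs:// warehouses, else None


def classify_warehouse(warehouse: str) -> WarehouseInfo:
    """Map a warehouse string to the catalog type it selects (illustrative only)."""
    if warehouse.startswith("gs://"):
        # Assumption: the first path segment after gs:// is the bucket name.
        bucket = warehouse[len("gs://"):].split("/", 1)[0]
        if not bucket:
            raise ValueError(f"missing bucket name in {warehouse!r}")
        # For a Cloud Storage bucket warehouse, the catalog name matches the bucket name.
        return WarehouseInfo("cloud_storage", bucket)
    if warehouse.startswith("bq://"):
        # Federation catalogs don't need a separate catalog resource.
        return WarehouseInfo("bigquery_federation", None)
    raise ValueError(f"unsupported warehouse scheme: {warehouse!r}")


print(classify_warehouse("gs://iceberg-bucket"))
print(classify_warehouse("bq://projects/my-project"))
```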

Bucket and catalog regions

For Cloud Storage bucket warehouses in BigLake metastore, the system selects the catalog region to match the underlying bucket's region:

  • Single-region buckets: The catalog region matches the bucket region exactly.
  • Dual-region buckets: The catalog region matches the bucket's dual region. This applies to both predefined dual regions, such as ASIA1 and NAM4, and user-defined dual regions.
  • Multi-region buckets: The system selects regional locations for the catalog within the multi-region's geographic domain. By default, these locations might not match common BigQuery locations like US and EU. Instead, they are regional locations within the geographic domain (for example, us-central1 and us-east4 for a US multi-region bucket).

When BigQuery runs a query over tables in these catalogs, it routes the query to the catalog's primary region. If you run a query in a multi-region location (such as US or EU) and the catalog metadata isn't present in that location, the query might fail.

Specify primary regions for US and EU multi-regions

For catalogs that use a US or EU multi-region bucket, you can specify the primary region when you create the catalog to ensure that BigQuery can access it from the corresponding regions.

  • Cloud Storage EU multi-region: Specify EU or europe-west4.
  • Cloud Storage US multi-region: Specify US or us-central1.
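The two allowed values per multi-region can be captured in a small lookup table. The following sketch checks a requested primary region against the values listed above before you create a catalog. The function validate_primary_region and the PRIMARY_REGION_OPTIONS table are hypothetical; the sketch does not call any BigLake API, and it only covers the US and EU cases described here.

```python
# Primary-region values listed for catalogs on US and EU multi-region buckets.
PRIMARY_REGION_OPTIONS = {
    "EU": ("EU", "europe-west4"),
    "US": ("US", "us-central1"),
}


def validate_primary_region(bucket_multi_region: str, primary_region: str) -> str:
    """Return primary_region if it is one of the listed choices, otherwise raise."""
    key = bucket_multi_region.upper()
    options = PRIMARY_REGION_OPTIONS.get(key)
    if options is None:
        raise ValueError(f"no primary-region options listed for {bucket_multi_region!r}")
    if primary_region not in options:
        raise ValueError(f"{primary_region!r} is not one of {options} for {key} buckets")
    return primary_region


print(validate_primary_region("EU", "europe-west4"))
```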

A catalog's primary replica is selected when you create the catalog, but you can change it later by calling FailoverCatalog. For more information about defining primary locations, see Use the BigLake metastore Iceberg REST catalog.

Querying catalogs

When querying BigLake metastore tables from BigQuery, you use a four-part naming structure, often referred to as P.C.N.T:

  • Project: The Google Cloud project ID that owns the catalog.
  • Catalog: The name of the BigLake metastore catalog.
  • Namespace: The Iceberg namespace (equivalent to a BigQuery dataset).
  • Table: The name of the table.

For example, my-project.biglake-catalog-id.my-namespace.my-table.
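Because the name is simply four identifiers joined by dots, it is easy to build and validate programmatically. The following sketch assembles and parses a P.C.N.T name. The PCNT type and parse_pcnt function are hypothetical helpers, and the sketch assumes that no individual part contains a dot.

```python
from typing import NamedTuple


class PCNT(NamedTuple):
    project: str    # Google Cloud project ID that owns the catalog
    catalog: str    # BigLake metastore catalog name
    namespace: str  # Iceberg namespace (equivalent to a BigQuery dataset)
    table: str      # table name

    def __str__(self) -> str:
        # Join the four parts in project.catalog.namespace.table order.
        return ".".join(self)


def parse_pcnt(name: str) -> PCNT:
    """Split a dotted four-part name; assumes no part contains a dot."""
    parts = name.split(".")
    if len(parts) != 4 or not all(parts):
        raise ValueError(f"expected project.catalog.namespace.table, got {name!r}")
    return PCNT(*parts)


name = PCNT("my-project", "biglake-catalog-id", "my-namespace", "my-table")
print(str(name))
print(parse_pcnt(str(name)).namespace)
```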

Catalog federation with BigQuery

You can use the Iceberg REST catalog interface to manage and query tables that are visible to BigQuery. BigQuery federation catalogs don't require you to create a catalog resource; they can be used in any project that has the BigQuery API enabled. This lets you:

Because these resources are managed by BigQuery, you must have the required BigQuery permissions to access them. Credential vending isn't supported for federated catalogs.

To enable federation, set the WAREHOUSE_PATH field in the client configuration examples in Use the Iceberg REST catalog to a warehouse path of the form bq://projects/PROJECT_ID. To restrict future requests to a single BigQuery location, include the location in the path: bq://projects/PROJECT_ID/locations/LOCATION.
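The two warehouse formats differ only in the optional location suffix. The following sketch builds either form of the federation warehouse path so that it can be passed as WAREHOUSE_PATH. The function bq_warehouse is a hypothetical helper, and "my-project" and "US" are placeholder values; check the client configuration examples for the remaining client settings.

```python
from typing import Optional


def bq_warehouse(project_id: str, location: Optional[str] = None) -> str:
    """Build a bq:// warehouse path for BigQuery catalog federation."""
    if not project_id:
        raise ValueError("project_id is required")
    path = f"bq://projects/{project_id}"
    if location:
        # Optional: restrict future requests to a single BigQuery location.
        path += f"/locations/{location}"
    return path


print(bq_warehouse("my-project"))
print(bq_warehouse("my-project", "US"))
```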