Lakehouse for Apache Iceberg provides interoperability with BigQuery through a feature known as BigQuery catalog federation. This feature lets you expose tables managed by BigQuery, for example Iceberg managed tables, to external open source (OSS) engines such as Apache Spark and Trino.
Instead of creating a dedicated Lakehouse catalog container to
store metadata, the Apache Iceberg REST catalog endpoint acts purely as a proxy
gateway. When external engines connect using the bq:// warehouse prefix, the
gateway routes their catalog requests directly to BigQuery's internal catalog.
This lets you create and manage tables directly within BigQuery
using standard BigQuery DDL or APIs, while giving external OSS engines read-only
access to query those tables through the REST catalog endpoint.
How BigQuery catalog federation works
BigQuery catalog federation lets you expose BigQuery tables—such as Iceberg managed tables and BigQuery metastore tables—through the Lakehouse runtime catalog Apache Iceberg REST catalog endpoint.
The BigQuery catalog federation flow works as follows:
- Create a table in the BigQuery catalog: You create an Iceberg managed table in BigQuery using DDL statements. This table exists in the BigQuery catalog, is governed by BigQuery access control lists (ACLs), and functions as a BigQuery REST resource.
- Federate into the BigQuery warehouse from the Lakehouse runtime catalog: Using the Lakehouse runtime catalog Apache Iceberg REST API, you federate into a warehouse specified by the bq://projects/PROJECT_ID warehouse path format (or the regional version, bq://projects/PROJECT_ID/locations/LOCATION). This lets you access the BigQuery table from compute engines like Apache Spark through the Lakehouse for Apache Iceberg API. In this configuration, you get a read-only experience from Spark, but a read-write experience from BigQuery.
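The two warehouse path formats described above can be sketched as a small helper. This is only an illustration: the function name bq_warehouse_path and the sample project and location values are hypothetical and are not part of any Google Cloud client library.

```python
from typing import Optional


def bq_warehouse_path(project_id: str, location: Optional[str] = None) -> str:
    """Build a bq:// warehouse path for BigQuery catalog federation.

    Hypothetical helper: it only assembles the two path formats named in
    the text and does not contact any service.
    """
    if not project_id:
        raise ValueError("project_id must be a non-empty string")
    # Project-level format: bq://projects/PROJECT_ID
    path = f"bq://projects/{project_id}"
    if location:
        # Regional format: bq://projects/PROJECT_ID/locations/LOCATION
        path += f"/locations/{location}"
    return path


if __name__ == "__main__":
    print(bq_warehouse_path("my-project"))
    print(bq_warehouse_path("my-project", "us-central1"))
```

Keeping the location optional mirrors the text: the project-level path is the base format, and the location segment is added only when requests should target a single location.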
Comparison with tables managed by the Apache Iceberg REST catalog endpoint
BigQuery catalog federation differs from workflows where the Lakehouse runtime catalog uses the Apache Iceberg REST catalog endpoint in the following ways:
- Resource management and catalog storage: Federated tables reside in the BigQuery catalog as BigQuery REST resources, and the Lakehouse runtime catalog acts as a proxy gateway. When the Lakehouse runtime catalog uses the Apache Iceberg REST catalog endpoint, tables are stored directly within the catalog as Lakehouse for Apache Iceberg REST resources.
- Access control: Federated tables use BigQuery IAM permissions and access control lists (ACLs). When the Lakehouse runtime catalog uses the Apache Iceberg REST catalog endpoint, tables use Lakehouse for Apache Iceberg ACLs.
- Engine read and write capabilities: Federated tables provide read-write access through BigQuery, but read-only access from external engines such as Spark. When the Lakehouse runtime catalog uses the Apache Iceberg REST catalog endpoint, tables support read-write operations from both BigQuery APIs and external engines such as Spark.
Before you begin
- Verify that billing is enabled for your Google Cloud project.
- Enable the BigLake API.
Roles required to enable APIs
To enable APIs, you need the Service Usage Admin IAM role (roles/serviceusage.serviceUsageAdmin), which contains the serviceusage.services.enable permission. Learn how to grant roles.
Required roles
To get the permissions that you need to use catalog federation in BigQuery, ask your administrator to grant you the following IAM roles:
- Read catalog resources and query table data:
  - BigLake Viewer (roles/biglake.viewer) on the project
  - Storage Object Viewer (roles/storage.objectViewer) on the Cloud Storage bucket
- Perform data manipulation language (DML) operations with BigQuery catalog federation:
  - BigQuery Data Editor (roles/bigquery.dataEditor) on the project
  - Storage Admin (roles/storage.admin) on the Cloud Storage bucket
If you use query engines such as Managed Service for Apache Spark to perform DML operations, grant these roles to the service account that you use to run jobs in that engine.
For more information about granting roles, see Manage access to projects, folders, and organizations.
You might also be able to get the required permissions through custom roles or other predefined roles.
Set up BigQuery catalog federation
To enable BigQuery catalog federation, configure your client (such as Apache Spark or Trino) to use the bq://projects/PROJECT_ID warehouse format. Set this value in the WAREHOUSE_PATH field of the client configuration examples in Configure client application.
To restrict future requests to a single BigQuery location, use the bq://projects/PROJECT_ID/locations/LOCATION format instead.
Because these resources are managed by BigQuery, you must have the applicable required permissions.
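As a rough sketch of where the warehouse path ends up in a Spark client, the following helper assembles standard Apache Iceberg catalog properties for a REST catalog. The function name federation_spark_conf and the catalog name are hypothetical. The REST endpoint URI is left as a caller-supplied value, and authentication properties are omitted, because both come from the client configuration examples referenced above rather than from this page.

```python
from typing import Dict, Optional


def federation_spark_conf(
    catalog_name: str,
    project_id: str,
    rest_uri: str,
    location: Optional[str] = None,
) -> Dict[str, str]:
    """Return Spark properties for an Iceberg REST catalog that uses a
    bq:// warehouse path (hypothetical helper, no network access)."""
    # WAREHOUSE_PATH in the project-level or single-location format.
    warehouse = f"bq://projects/{project_id}"
    if location:
        warehouse += f"/locations/{location}"

    prefix = f"spark.sql.catalog.{catalog_name}"
    return {
        # Standard Iceberg Spark catalog implementation and REST catalog type.
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        f"{prefix}.type": "rest",
        # Assumption: the caller supplies the REST catalog endpoint URI.
        f"{prefix}.uri": rest_uri,
        f"{prefix}.warehouse": warehouse,
    }


if __name__ == "__main__":
    conf = federation_spark_conf("bq_fed", "my-project", "REST_CATALOG_URI")
    for key, value in conf.items():
        print(f"{key}={value}")
```

Each key-value pair could then be passed to SparkSession.builder.config(key, value) when you build the session, alongside whatever authentication settings your environment requires.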
Create namespaces for federated tables
After you configure your client for BigQuery catalog federation, you can create a namespace for your federated tables.
Spark
To use BigQuery catalog federation, include the LOCATION and DBPROPERTIES clauses:
spark.sql("CREATE NAMESPACE IF NOT EXISTS NAMESPACE_NAME LOCATION 'gs://BUCKET_NAME/NAMESPACE_NAME' WITH DBPROPERTIES ('gcp-region' = 'LOCATION');")
spark.sql("USE NAMESPACE_NAME;")
Replace the following:
- NAMESPACE_NAME: a name for your namespace.
- BUCKET_NAME: the Cloud Storage bucket that you're using with your catalog.
- LOCATION: a BigQuery location. The default value is the US multi-region.
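If you generate the namespace statement programmatically, a helper along the following lines can fill in the placeholders before the string is passed to spark.sql(). This is a minimal sketch: the function name create_namespace_sql and the validation patterns are hypothetical and deliberately simplified, and they do not reproduce the full naming rules for namespaces, buckets, or locations.

```python
import re

# Simplified, hypothetical validation patterns (not the full naming rules).
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")
_BUCKET = re.compile(r"^[a-z0-9][a-z0-9._-]*$")
_LOCATION = re.compile(r"^[A-Za-z0-9-]+$")


def create_namespace_sql(namespace: str, bucket: str, location: str = "US") -> str:
    """Render the CREATE NAMESPACE statement shown above.

    The location defaults to the US multi-region, as stated in the text.
    """
    if not _IDENTIFIER.match(namespace):
        raise ValueError(f"unexpected namespace name: {namespace!r}")
    if not _BUCKET.match(bucket):
        raise ValueError(f"unexpected bucket name: {bucket!r}")
    if not _LOCATION.match(location):
        raise ValueError(f"unexpected location: {location!r}")
    return (
        f"CREATE NAMESPACE IF NOT EXISTS {namespace} "
        f"LOCATION 'gs://{bucket}/{namespace}' "
        f"WITH DBPROPERTIES ('gcp-region' = '{location}')"
    )


if __name__ == "__main__":
    print(create_namespace_sql("sales", "my-bucket", "us-central1"))
```

Validating the inputs before interpolation avoids building a malformed or unintended statement from untrusted values.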
Trino
To use BigQuery catalog federation, include the LOCATION and gcp-region properties:
CREATE SCHEMA IF NOT EXISTS CATALOG_NAME.SCHEMA_NAME
WITH (
  LOCATION = 'gs://BUCKET_NAME/SCHEMA_NAME',
  "gcp-region" = 'LOCATION');
USE CATALOG_NAME.SCHEMA_NAME;
Replace the following:
- CATALOG_NAME: the name of your Trino catalog using the Apache Iceberg REST catalog endpoint.
- SCHEMA_NAME: a name for your schema.
- BUCKET_NAME: the Cloud Storage bucket that you're using with your catalog.
- LOCATION: a BigQuery location. The default value is the US multi-region.
Query federated tables in BigQuery
Tables that you create under a federated catalog are visible in BigQuery. You can query them directly with standard BigQuery SQL using a two-part NAMESPACE_NAME.TABLE_NAME reference, without needing a four-part project.catalog.namespace.table (P.C.N.T) name:
SELECT * FROM `NAMESPACE_NAME.TABLE_NAME`;
Replace the following:
- NAMESPACE_NAME: the name of your namespace.
- TABLE_NAME: the name of your table.
What's next
- Learn how to manage catalogs in the Google Cloud console.
- Learn about Apache Iceberg tables supported by the Lakehouse runtime catalog.