Use catalog federation with BigQuery

Lakehouse for Apache Iceberg supports catalog federation through the Apache Iceberg REST catalog endpoint. Catalog federation lets you use the Apache Iceberg REST catalog interface to manage and query external tables through BigQuery. You don't create a dedicated catalog resource. Instead, BigQuery acts as a federation gateway: queries from your client are routed through BigQuery, which interacts with your external catalogs.

Before you begin

  1. Verify that billing is enabled for your Google Cloud project.

  2. Enable the BigLake API.

    Roles required to enable APIs

    To enable APIs, you need the Service Usage Admin IAM role (roles/serviceusage.serviceUsageAdmin), which contains the serviceusage.services.enable permission. Learn how to grant roles.

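If you work from the command line, you can also enable the API in step 2 with the Google Cloud CLI. The following minimal sketch only builds the command. It assumes that `gcloud` is installed and authenticated. The service name `biglake.googleapis.com` and the helper `enable_biglake_command` are assumptions for illustration and are not taken from this page.

```python
import shlex

# Assumed service name of the BigLake API.
BIGLAKE_SERVICE = "biglake.googleapis.com"


def enable_biglake_command(project_id: str) -> list[str]:
    """Return argv for `gcloud services enable` on one project (hypothetical helper)."""
    if not project_id:
        raise ValueError("project_id must not be empty")
    return ["gcloud", "services", "enable", BIGLAKE_SERVICE, f"--project={project_id}"]


if __name__ == "__main__":
    # Print the command; pass the list to subprocess.run(..., check=True) to execute it.
    print(shlex.join(enable_biglake_command("my-project")))  # hypothetical project ID
```

Running the command still requires the Service Usage Admin role described above.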

Required roles

To get the permissions that you need to use catalog federation in BigQuery, ask your administrator to grant you the following IAM roles:

  • Read catalog resources and query table data:
  • Perform data manipulation language (DML) operations with BigQuery catalog federation:
    • BigQuery Data Editor (roles/bigquery.dataEditor) on the project
    • Storage Admin (roles/storage.admin) on the Cloud Storage bucket

    If you use a query engine such as Managed Service for Apache Spark to perform DML operations, grant these roles to the service account that you use to run jobs in that engine.
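The DML requirements above reduce to a set comparison for each resource. The following sketch assumes that you already have the roles a principal holds on the project and on the bucket; fetching them is out of scope. It reports which of the listed DML roles are missing. The function and constant names are hypothetical.

```python
# Roles that this page lists for DML with BigQuery catalog federation.
DML_PROJECT_ROLES = frozenset({"roles/bigquery.dataEditor"})
DML_BUCKET_ROLES = frozenset({"roles/storage.admin"})


def missing_dml_roles(project_roles: set[str], bucket_roles: set[str]) -> dict[str, set[str]]:
    """Return the listed DML roles that are missing on each resource (hypothetical helper).

    Only exact predefined-role names are compared. Custom roles or other
    predefined roles that grant equivalent permissions are not recognized.
    """
    return {
        "project": set(DML_PROJECT_ROLES - project_roles),
        "bucket": set(DML_BUCKET_ROLES - bucket_roles),
    }
```

If you run DML from an engine such as Managed Service for Apache Spark, pass in the roles held by that engine's service account rather than your own.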

For more information about granting roles, see Manage access to projects, folders, and organizations.

You might also be able to get the required permissions through custom roles or other predefined roles.

Set up catalog federation

To enable federation, configure your client (such as Apache Spark or Trino) with a warehouse path in the bq://projects/PROJECT_ID format. Use this value for WAREHOUSE_PATH in the client configuration examples in Configure client application, and replace PROJECT_ID with your Google Cloud project ID.
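To show where the warehouse value goes on the Spark side, the following sketch assembles Apache Iceberg Spark catalog properties for a REST catalog. The property keys (`type`, `uri`, and `warehouse` under `spark.sql.catalog.<name>`) are standard Apache Iceberg Spark catalog settings. The catalog name, the REST endpoint URI, and any authentication properties are placeholders; take their actual values from the examples in Configure client application.

```python
def iceberg_rest_spark_conf(catalog_name: str, rest_uri: str, project_id: str) -> dict[str, str]:
    """Build Spark conf entries for an Iceberg REST catalog that uses BigQuery federation.

    catalog_name and rest_uri are placeholders; authentication settings are omitted.
    """
    prefix = f"spark.sql.catalog.{catalog_name}"
    return {
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        f"{prefix}.type": "rest",
        f"{prefix}.uri": rest_uri,
        # Federation warehouse in the bq://projects/PROJECT_ID format.
        f"{prefix}.warehouse": f"bq://projects/{project_id}",
    }
```

You could apply each entry with `SparkSession.builder.config(key, value)` before calling `getOrCreate()`.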

Optionally, to restrict future requests to a single BigQuery location, include that location in the warehouse path by using the bq://projects/PROJECT_ID/locations/LOCATION format.
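The two warehouse formats differ only in the optional /locations/LOCATION suffix. The following minimal sketch builds either form. The helper name is hypothetical. The helper only rejects empty segments and segments that contain a slash; it doesn't verify that the project or location exists.

```python
from typing import Optional


def federation_warehouse(project_id: str, location: Optional[str] = None) -> str:
    """Return a bq:// warehouse path, optionally scoped to one BigQuery location."""
    for part in (project_id, location):
        if part is not None and (not part or "/" in part):
            raise ValueError(f"invalid path segment: {part!r}")
    path = f"bq://projects/{project_id}"
    if location is not None:
        # Restricts future requests to this single location.
        path += f"/locations/{location}"
    return path
```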

Because BigQuery manages these resources, you must have the permissions described in Required roles.

Create namespaces for federated tables

After you configure your client for federation, you can create a namespace for your federated tables.

Spark

To use BigQuery catalog federation, include the LOCATION clause, and set the gcp-region property in the DBPROPERTIES clause:

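# Create the namespace in Cloud Storage and assign it to a BigQuery location.
spark.sql("CREATE NAMESPACE IF NOT EXISTS NAMESPACE_NAME LOCATION 'gs://BUCKET_NAME/NAMESPACE_NAME' WITH DBPROPERTIES ('gcp-region' = 'LOCATION')")
# Make the new namespace the default for subsequent statements.
spark.sql("USE NAMESPACE_NAME")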

Replace the following:

  • NAMESPACE_NAME: a name for your namespace.
  • BUCKET_NAME: the Cloud Storage bucket that you're using with your catalog.
  • LOCATION: a BigQuery location. The default value is the US multi-region.
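If you create several namespaces, you might generate the statement rather than edit it by hand. The following sketch renders the same CREATE NAMESPACE statement shown above from Python values. The helper name is hypothetical. Instead of escaping, it conservatively rejects names that would need quoting or escaping.

```python
import re

# Conservative patterns: anything outside them is rejected rather than escaped.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
_SAFE_LITERAL = re.compile(r"[A-Za-z0-9_.\-]+")


def create_namespace_sql(namespace: str, bucket: str, location: str) -> str:
    """Render the CREATE NAMESPACE statement for BigQuery catalog federation (sketch)."""
    if not _IDENTIFIER.fullmatch(namespace):
        raise ValueError(f"unsupported namespace name: {namespace!r}")
    for value in (bucket, location):
        if not _SAFE_LITERAL.fullmatch(value):
            raise ValueError(f"unsupported literal: {value!r}")
    return (
        f"CREATE NAMESPACE IF NOT EXISTS {namespace} "
        f"LOCATION 'gs://{bucket}/{namespace}' "
        f"WITH DBPROPERTIES ('gcp-region' = '{location}')"
    )
```

In a PySpark session you would pass the result to `spark.sql(...)` and then run `spark.sql("USE <namespace>")`.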

Trino

To use BigQuery catalog federation, include the LOCATION and gcp-region properties:

-- Create the schema in Cloud Storage and assign it to a BigQuery location.
CREATE SCHEMA IF NOT EXISTS CATALOG_NAME.SCHEMA_NAME WITH (LOCATION = 'gs://BUCKET_NAME/SCHEMA_NAME', "gcp-region" = 'LOCATION');
USE CATALOG_NAME.SCHEMA_NAME;

Replace the following:

  • CATALOG_NAME: the name of your Trino catalog that connects to the Apache Iceberg REST catalog endpoint.
  • SCHEMA_NAME: a name for your schema.
  • BUCKET_NAME: the Cloud Storage bucket that you're using with your catalog.
  • LOCATION: a BigQuery location. The default value is the US multi-region.
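In Trino, CATALOG_NAME is the name of a catalog properties file, for example etc/catalog/CATALOG_NAME.properties. The following sketch renders such a file for the Iceberg connector's REST catalog. `connector.name`, `iceberg.catalog.type`, `iceberg.rest-catalog.uri`, and `iceberg.rest-catalog.warehouse` are Trino Iceberg connector properties. The endpoint URI is a placeholder, and the authentication properties that your setup needs are deliberately omitted; take both from Configure client application. The helper name is hypothetical.

```python
def trino_iceberg_rest_catalog(rest_uri: str, project_id: str) -> str:
    """Render a Trino catalog properties file for an Iceberg REST catalog (sketch).

    rest_uri is a placeholder; add the authentication properties from your
    client configuration before use.
    """
    props = {
        "connector.name": "iceberg",
        "iceberg.catalog.type": "rest",
        "iceberg.rest-catalog.uri": rest_uri,
        # Warehouse in the BigQuery federation format.
        "iceberg.rest-catalog.warehouse": f"bq://projects/{project_id}",
    }
    return "".join(f"{key}={value}\n" for key, value in props.items())
```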

Query federated tables in BigQuery

Tables that you create in a federated catalog are visible in BigQuery. You can query them directly with standard BigQuery SQL, without a four-part project.catalog.namespace.table (P.C.N.T) name:

SELECT * FROM `NAMESPACE_NAME.TABLE_NAME`;

Replace the following:

  • NAMESPACE_NAME: the name of your namespace.
  • TABLE_NAME: the name of your table.
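To query from code, you use the same two-part name. The following sketch builds the query string with the name quoted in backticks. As an assumption for cases where the table isn't in your default project, it can also prefix a project ID. It only builds the string and doesn't run anything. The helper name is hypothetical.

```python
import re
from typing import Optional

# Conservative check: letters, digits, underscores, and hyphens (for project IDs).
_PART = re.compile(r"[A-Za-z0-9_\-]+")


def federated_select(namespace: str, table: str, project_id: Optional[str] = None) -> str:
    """Build a query over a federated table; nothing is executed (sketch)."""
    parts = [p for p in (project_id, namespace, table) if p is not None]
    for part in parts:
        if not _PART.fullmatch(part):
            raise ValueError(f"unsupported name part: {part!r}")
    return f"SELECT * FROM `{'.'.join(parts)}`"
```

With the google-cloud-bigquery library installed, you could run the resulting string with `bigquery.Client().query(sql).result()`.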

What's next