Lakehouse for Apache Iceberg supports catalog federation and direct table queries through the Apache Iceberg REST catalog endpoint. Configuring federation within the Lakehouse runtime catalog exposes Apache Iceberg tables directly to BigQuery.
Before you begin
- Verify that billing is enabled for your Google Cloud project.
- Enable the BigLake API.
  Roles required to enable APIs
  To enable APIs, you need the Service Usage Admin IAM role (roles/serviceusage.serviceUsageAdmin), which contains the serviceusage.services.enable permission. Learn how to grant roles.
Required roles
To get the permissions that you need to query tables and use catalog federation in BigQuery, ask your administrator to grant you the following IAM roles:
- Read catalog resources and query table data:
  - BigLake Viewer (roles/biglake.viewer) on the project
  - Storage Object Viewer (roles/storage.objectViewer) on the Cloud Storage bucket
- Perform data manipulation language (DML) operations with BigQuery catalog federation:
  - BigQuery Data Editor (roles/bigquery.dataEditor) on the project
  - Storage Admin (roles/storage.admin) on the Cloud Storage bucket
  If you use query engines such as Managed Service for Apache Spark to perform DML operations, grant these roles to the service account that you use to run jobs in that engine.
For more information about granting roles, see Manage access to projects, folders, and organizations.
You might also be able to get the required permissions through custom roles or other predefined roles.
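If you script access reviews, the role requirements above can be kept as data. The following is a minimal sketch, assuming you already have the granted roles as a set of (role, resource) pairs; `REQUIRED_ROLES` and `missing_roles` are hypothetical names, and the check only recognizes the predefined roles on this page, not custom roles that carry equivalent permissions.

```python
# Hypothetical representation of the predefined roles listed above.
# Each entry pairs a role with the resource it must be granted on.
REQUIRED_ROLES = {
    "read": [
        ("roles/biglake.viewer", "project"),
        ("roles/storage.objectViewer", "bucket"),
    ],
    "dml": [
        ("roles/bigquery.dataEditor", "project"),
        ("roles/storage.admin", "bucket"),
    ],
}


def missing_roles(task, granted):
    """Return the (role, resource) pairs required for `task` that are absent from `granted`.

    `granted` is a set of (role, resource) tuples. Only the predefined roles
    above are recognized; custom roles with equivalent permissions are not.
    """
    return [pair for pair in REQUIRED_ROLES[task] if pair not in granted]


if __name__ == "__main__":
    granted = {("roles/biglake.viewer", "project")}
    print(missing_roles("read", granted))
    # [('roles/storage.objectViewer', 'bucket')]
```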
Query tables in BigQuery
How you query tables in BigQuery that you created through the Apache Iceberg REST catalog endpoint depends on whether your client uses a Cloud Storage bucket warehouse or BigQuery federation.
- Cloud Storage bucket warehouse: If you configured your client with a gs:// warehouse path, query tables from BigQuery using the four-part name (P.C.N.T) project.catalog.namespace.table. For more information about the P.C.N.T structure, see Iceberg REST catalog concepts. The catalog component is the name of your Lakehouse runtime catalog resource. For more information about querying tables, see Query a table.
- BigQuery federation: If you configured your client with a bq:// warehouse path, tables that you create are visible in BigQuery and can be queried directly using standard BigQuery SQL:
  SELECT * FROM `NAMESPACE_NAME.TABLE_NAME`;
  Replace the following:
  - NAMESPACE_NAME: the name of your namespace.
  - TABLE_NAME: the name of your table.
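The choice between the two naming forms depends only on the warehouse prefix. The following minimal sketch captures that rule; `select_all` and the sample project, catalog, namespace, and table names are hypothetical. It quotes the whole dotted path in one pair of backticks, as in the federation example above; check Query a table for the exact quoting that four-part names need in your environment.

```python
def select_all(warehouse, project, catalog, namespace, table):
    """Build a BigQuery SELECT for a table created through the REST catalog endpoint.

    Hypothetical helper: the warehouse prefix decides the name form.
    - gs:// warehouse: four-part project.catalog.namespace.table (P.C.N.T) name
    - bq:// warehouse (BigQuery federation): namespace.table
    """
    if warehouse.startswith("gs://"):
        name = f"{project}.{catalog}.{namespace}.{table}"
    elif warehouse.startswith("bq://"):
        name = f"{namespace}.{table}"
    else:
        raise ValueError(f"unsupported warehouse path: {warehouse!r}")
    # Quote the whole path with backticks, as in the federation example.
    return f"SELECT * FROM `{name}`;"


print(select_all("bq://projects/my-project", "my-project", "my_catalog", "sales", "orders"))
print(select_all("gs://my-bucket", "my-project", "my_catalog", "sales", "orders"))
```

With a bq:// warehouse, the project and catalog arguments are ignored because federated tables are addressed by namespace and table only.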
Use catalog federation with BigQuery
To learn about catalog federation, see Iceberg REST catalog concepts.
To enable federation, use the bq://projects/PROJECT_ID warehouse format, where PROJECT_ID is your Google Cloud project ID, for the WAREHOUSE_PATH field in the client configuration examples in Configure client application.
To restrict future requests to a single BigQuery location, you can also include a location by using the bq://projects/PROJECT_ID/locations/LOCATION format.
Because BigQuery manages these resources, you must have the applicable permissions described in Required roles.
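If you generate client configuration programmatically, the two warehouse formats above can be built and checked with a few lines of code. This is a minimal sketch; `federation_warehouse`, `parse_federation_warehouse`, and the sample project and location values are hypothetical, and the parser accepts only the two bq:// forms described on this page.

```python
import re

# Matches bq://projects/PROJECT_ID with an optional /locations/LOCATION suffix.
_FEDERATION_WAREHOUSE = re.compile(r"^bq://projects/([^/]+)(?:/locations/([^/]+))?$")


def federation_warehouse(project_id, location=None):
    """Build a WAREHOUSE_PATH value that enables BigQuery catalog federation."""
    path = f"bq://projects/{project_id}"
    if location:
        # Optional: pin future requests to a single BigQuery location.
        path += f"/locations/{location}"
    return path


def parse_federation_warehouse(path):
    """Split a federation warehouse path into (project_id, location or None)."""
    match = _FEDERATION_WAREHOUSE.match(path)
    if match is None:
        raise ValueError(f"not a BigQuery federation warehouse path: {path!r}")
    return match.group(1), match.group(2)


print(federation_warehouse("my-project"))
print(federation_warehouse("my-project", "us-central1"))
print(parse_federation_warehouse("bq://projects/my-project/locations/US"))
```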
After you configure your client for federation, you can create a namespace for your federated tables.
Spark
To use BigQuery catalog federation, include the LOCATION and DBPROPERTIES clauses:
# Assumes `spark` is a SparkSession configured as described in Configure client application.
spark.sql("CREATE NAMESPACE IF NOT EXISTS NAMESPACE_NAME LOCATION 'gs://BUCKET_NAME/NAMESPACE_NAME' WITH DBPROPERTIES ('gcp-region' = 'LOCATION');")
spark.sql("USE NAMESPACE_NAME;")
Replace the following:
- NAMESPACE_NAME: a name for your namespace.
- BUCKET_NAME: the Cloud Storage bucket that you're using with your catalog.
- LOCATION: a BigQuery location. The default value is the US multi-region.
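When you create several namespaces, you can derive the statements from the three placeholders instead of editing them by hand. The following minimal sketch does that; `spark_namespace_statements` and the sample values are hypothetical. It sets `gcp-region` to "US" when no location is given, matching the default above, and it interpolates names without escaping, so pass only plain, trusted identifiers.

```python
def spark_namespace_statements(namespace, bucket, location="US"):
    """Return Spark SQL statements that create and select a federated namespace.

    Hypothetical helper. The namespace data lives under gs://BUCKET/NAMESPACE,
    and `location` becomes the gcp-region database property.
    """
    create = (
        f"CREATE NAMESPACE IF NOT EXISTS {namespace} "
        f"LOCATION 'gs://{bucket}/{namespace}' "
        f"WITH DBPROPERTIES ('gcp-region' = '{location}')"
    )
    return [create, f"USE {namespace}"]


for statement in spark_namespace_statements("sales", "my-bucket", "us-central1"):
    print(statement)
    # With a configured SparkSession, run each one instead: spark.sql(statement)
```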
Trino
To use BigQuery catalog federation, include the LOCATION and gcp-region properties:
CREATE SCHEMA IF NOT EXISTS CATALOG_NAME.SCHEMA_NAME
WITH (
  LOCATION = 'gs://BUCKET_NAME/SCHEMA_NAME',
  "gcp-region" = 'LOCATION');
USE CATALOG_NAME.SCHEMA_NAME;
Replace the following:
- CATALOG_NAME: the name of your Trino catalog using the Apache Iceberg REST catalog endpoint.
- SCHEMA_NAME: a name for your schema.
- BUCKET_NAME: the Cloud Storage bucket that you're using with your catalog.
- LOCATION: a BigQuery location. The default value is the US multi-region.
What's next
Learn how to manage catalogs in the Google Cloud console.
Learn about Lakehouse REST catalog tables for Apache Iceberg.