Manage Apache Iceberg tables

Lakehouse for Apache Iceberg supports creating and managing Apache Iceberg tables in Cloud Storage using the Lakehouse runtime catalog.

Managed through the Lakehouse runtime catalog, these tables are readable by BigQuery and by connected open source engines. Write operations are restricted to the engine that originally created the table.

Before you begin

See the table overview to understand the different types of tables and the implications of using them.

  1. Verify that billing is enabled for your Google Cloud project.

  2. Enable the BigLake API.

    Roles required to enable APIs

    To enable APIs, you need the Service Usage Admin IAM role (roles/serviceusage.serviceUsageAdmin), which contains the serviceusage.services.enable permission. Learn how to grant roles.

    Enable the API
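You can also enable the API from the command line. A sketch using the gcloud CLI, assuming `biglake.googleapis.com` is the service name for the BigLake API in your environment:

```shell
# Enable the BigLake API for the currently configured project
# (requires the serviceusage.services.enable permission).
gcloud services enable biglake.googleapis.com

# Verify that the API is now enabled.
gcloud services list --enabled --filter="name:biglake.googleapis.com"
```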

Required roles

To get the permissions that you need to manage Apache Iceberg tables supported by the Lakehouse runtime catalog, ask your administrator to grant you the following IAM roles:

For more information about granting roles, see Manage access to projects, folders, and organizations.

You might also be able to get the required permissions through custom roles or other predefined roles.

Create a table

Spark

spark.sql("CREATE TABLE NAMESPACE_NAME.TABLE_NAME (id int, data string) USING ICEBERG;")

spark.sql("DESCRIBE NAMESPACE_NAME.TABLE_NAME").show()

Replace the following values:

  • NAMESPACE_NAME: the name of your namespace
  • TABLE_NAME: a name for your table
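The `spark.sql` calls in this section assume a Spark session that is already configured with an Iceberg catalog. A minimal configuration sketch using Iceberg's REST catalog support; the catalog name (`lakehouse`), endpoint URI, and bucket below are illustrative placeholders, not fixed values:

```python
from pyspark.sql import SparkSession

# Illustrative only: "lakehouse", the endpoint URI, and the bucket
# are placeholders; substitute the values for your own catalog.
spark = (
    SparkSession.builder
    .appName("iceberg-example")
    .config("spark.sql.catalog.lakehouse",
            "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.lakehouse.type", "rest")
    .config("spark.sql.catalog.lakehouse.uri",
            "https://ENDPOINT/iceberg/v1/restcatalog")
    .config("spark.sql.catalog.lakehouse.warehouse", "gs://BUCKET_NAME")
    .getOrCreate()
)

# Make the configured catalog the default for subsequent spark.sql calls.
spark.sql("USE lakehouse")
```

With the catalog set as the default, the `CREATE TABLE` statement above resolves `NAMESPACE_NAME.TABLE_NAME` inside that catalog.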

Trino

CREATE TABLE SCHEMA_NAME.TABLE_NAME (id int, data varchar);

DESCRIBE SCHEMA_NAME.TABLE_NAME;

Replace the following values:

  • SCHEMA_NAME: the name of your schema
  • TABLE_NAME: a name for your table

If you use BigQuery catalog federation with Trino, you can't specify a location for the table. The schema's default location is always used.

List tables

Spark

spark.sql("SHOW TABLES").show()

Trino

SHOW TABLES;

Insert data into the table

Insert sample data into the table:

Spark

spark.sql("INSERT INTO TABLE_NAME VALUES (1, 'first row'), (2, 'second row'), (3, 'third row');")

Trino

INSERT INTO TABLE_NAME VALUES (1, 'first row'), (2, 'second row'), (3, 'third row');

Query a table

Select all data from the table:

Spark

spark.sql("SELECT * FROM TABLE_NAME;").show()

Trino

SELECT * FROM TABLE_NAME;

BigQuery

To query Apache Iceberg tables supported by the Lakehouse runtime catalog from BigQuery, use the four-part table name in your query with the following format: PROJECT_NAME.CATALOG_ID.NAMESPACE_OR_SCHEMA_NAME.TABLE_NAME.

SELECT * FROM `PROJECT_NAME.CATALOG_ID.NAMESPACE_OR_SCHEMA_NAME.TABLE_NAME`;

Replace the following values:

  • PROJECT_NAME: the Google Cloud project that owns the catalog in the Lakehouse runtime catalog. The selected Google Cloud project in the Google Cloud console is billed for the query.

  • CATALOG_ID: the ID of the Lakehouse runtime catalog specified when the catalog was created. This identifier is used as the catalog name in BigQuery queries.

    This identifier is also the name of your Cloud Storage bucket.

    For example, if you created a bucket named iceberg-bucket to store your catalog, both the catalog name and the bucket name are iceberg-bucket. You use this identifier as the second part of the four-part name when you query the catalog in BigQuery, for example my-project.iceberg-bucket.quickstart_namespace.quickstart_table.

  • NAMESPACE_OR_SCHEMA_NAME: the table's namespace if you use Spark, or its schema if you use Trino.

  • TABLE_NAME: the name of your table.
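Assembling the four-part name by hand is easy to get wrong. A small helper (hypothetical, not part of any Google Cloud SDK) that builds the backtick-quoted reference and rejects missing parts:

```python
def iceberg_table_ref(project: str, catalog: str,
                      namespace: str, table: str) -> str:
    """Build a backtick-quoted four-part BigQuery table reference.

    Illustrative helper only; all four parts are required.
    """
    parts = [project, catalog, namespace, table]
    if not all(parts):
        raise ValueError("all four name parts are required")
    return "`{}.{}.{}.{}`".format(*parts)

print(iceberg_table_ref("my-project", "iceberg-bucket",
                        "quickstart_namespace", "quickstart_table"))
# → `my-project.iceberg-bucket.quickstart_namespace.quickstart_table`
```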

Alter a table

Add a column to the table:

Spark

spark.sql("ALTER TABLE NAMESPACE_NAME.TABLE_NAME ADD COLUMNS (desc string);")
spark.sql("DESCRIBE NAMESPACE_NAME.TABLE_NAME").show()

Trino

ALTER TABLE SCHEMA_NAME.TABLE_NAME ADD COLUMN desc varchar;
DESCRIBE SCHEMA_NAME.TABLE_NAME;

Delete a table

Spark

spark.sql("DROP TABLE TABLE_NAME;")

Trino

DROP TABLE TABLE_NAME;

What's next