Create BigLake tables for Apache Iceberg in BigQuery

The following sections describe how to create managed tables using BigLake tables for Apache Iceberg in BigQuery (hereafter BigLake Iceberg tables in BigQuery).

Before you begin

  • Understand the different types of BigLake tables and the implications of using them by reading the BigLake table overview.

  • Before creating and using BigLake Iceberg tables in BigQuery, ensure that you have set up a Cloud resource connection to a storage bucket. Your connection needs write permissions on the storage bucket, as specified in the following Required roles section. For more information about required roles and permissions for connections, see Manage connections.

Required roles

To get the permissions that you need to let BigQuery manage tables in your project, ask your administrator to grant you the following IAM roles:

For more information about granting roles, see Manage access to projects, folders, and organizations.

These predefined roles contain the permissions required to let BigQuery manage tables in your project. To see the exact permissions that are required, see the following Required permissions section:

Required permissions

The following permissions are required to let BigQuery manage tables in your project:

  • bigquery.connections.delegate on your project
  • bigquery.jobs.create on your project
  • bigquery.readsessions.create on your project
  • bigquery.tables.create on your project
  • bigquery.tables.get on your project
  • bigquery.tables.getData on your project
  • storage.buckets.get on your bucket
  • storage.objects.create on your bucket
  • storage.objects.delete on your bucket
  • storage.objects.get on your bucket
  • storage.objects.list on your bucket

You might also be able to get these permissions with custom roles or other predefined roles.

Create BigLake Iceberg tables in BigQuery

To create a BigLake Iceberg table in BigQuery, select one of the following methods:

SQL

CREATE TABLE [PROJECT_ID.]DATASET_ID.TABLE_NAME (
  COLUMN DATA_TYPE[, ...]
)
CLUSTER BY CLUSTER_COLUMN_LIST
WITH CONNECTION {CONNECTION_NAME | DEFAULT}
OPTIONS (
  file_format = 'PARQUET',
  table_format = 'ICEBERG',
  storage_uri = 'STORAGE_URI');

Replace the following:

  • PROJECT_ID: the project containing the dataset. If not specified, the query assumes the default project.
  • DATASET_ID: an existing dataset.
  • TABLE_NAME: the name of the table you're creating.
  • DATA_TYPE: the data type of the information that is contained in the column.
  • CLUSTER_COLUMN_LIST (optional): a comma-separated list containing up to four columns. They must be top-level, non-repeated columns.
  • CONNECTION_NAME: the name of the connection. For example, myproject.us.myconnection.

    To use a default connection, specify DEFAULT instead of the connection string containing PROJECT_ID.REGION.CONNECTION_ID.

  • STORAGE_URI: a fully qualified Cloud Storage URI. For example, gs://mybucket/table.
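For example, the following statement creates a clustered BigLake Iceberg table in BigQuery. The project, dataset, connection, and bucket names (myproject, mydataset, myconnection, mybucket) and the table schema are placeholder values for illustration; substitute your own resources.

```sql
-- Hypothetical example: create a BigLake Iceberg table named "orders"
-- in the dataset "mydataset", clustered by customer and order date.
CREATE TABLE mydataset.orders (
  order_id INT64,
  customer_id INT64,
  order_date DATE,
  total NUMERIC
)
CLUSTER BY customer_id, order_date
WITH CONNECTION `myproject.us.myconnection`
OPTIONS (
  file_format = 'PARQUET',
  table_format = 'ICEBERG',
  storage_uri = 'gs://mybucket/orders');
```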

bq

bq --project_id=PROJECT_ID mk \
    --table \
    --file_format=PARQUET \
    --table_format=ICEBERG \
    --connection_id=CONNECTION_NAME \
    --storage_uri=STORAGE_URI \
    --schema=COLUMN_NAME:DATA_TYPE[, ...] \
    --clustering_fields=CLUSTER_COLUMN_LIST \
    DATASET_ID.MANAGED_TABLE_NAME

Replace the following:

  • PROJECT_ID: the project containing the dataset. If undefined, the command assumes the default project.
  • CONNECTION_NAME: the name of the connection. For example, myproject.us.myconnection.
  • STORAGE_URI: a fully qualified Cloud Storage URI. For example, gs://mybucket/table.
  • COLUMN_NAME: the column name.
  • DATA_TYPE: the data type of the information contained in the column.
  • CLUSTER_COLUMN_LIST (optional): a comma-separated list containing up to four columns. They must be top-level, non-repeated columns.
  • DATASET_ID: the ID of an existing dataset.
  • MANAGED_TABLE_NAME: the name of the table you're creating.
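As a concrete illustration, the following command creates the same kind of table with the bq tool. The resource names (myproject, mydataset, myconnection, mybucket) and the schema are placeholders; replace them with your own values.

```shell
# Hypothetical example: create a clustered BigLake Iceberg table
# named "orders" using the bq command-line tool.
bq --project_id=myproject mk \
    --table \
    --file_format=PARQUET \
    --table_format=ICEBERG \
    --connection_id=myproject.us.myconnection \
    --storage_uri=gs://mybucket/orders \
    --schema=order_id:INT64,customer_id:INT64,order_date:DATE,total:NUMERIC \
    --clustering_fields=customer_id,order_date \
    mydataset.orders
```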

API

Call the tables.insert method with a defined table resource, similar to the following:

{
  "tableReference": {
    "tableId": "TABLE_NAME"
  },
  "biglakeConfiguration": {
    "connectionId": "CONNECTION_NAME",
    "fileFormat": "PARQUET",
    "tableFormat": "ICEBERG",
    "storageUri": "STORAGE_URI"
  },
  "schema": {
    "fields": [
      {
        "name": "COLUMN_NAME",
        "type": "DATA_TYPE"
      }
      [, ...]
    ]
  }
}

Replace the following:

  • TABLE_NAME: the name of the table that you're creating.
  • CONNECTION_NAME: the name of the connection. For example, myproject.us.myconnection.
  • STORAGE_URI: a fully qualified Cloud Storage URI. Wildcards are also supported. For example, gs://mybucket/table.
  • COLUMN_NAME: the column name.
  • DATA_TYPE: the data type of the information contained in the column.
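One way to send this request is with curl against the BigQuery REST endpoint, as sketched below. The project, dataset, connection, bucket, and column names are hypothetical placeholders, and the command assumes you are authenticated with the gcloud CLI.

```shell
# Hypothetical example: call tables.insert with curl to create a
# BigLake Iceberg table named "orders" in the dataset "mydataset".
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  "https://bigquery.googleapis.com/bigquery/v2/projects/myproject/datasets/mydataset/tables" \
  -d '{
    "tableReference": {"tableId": "orders"},
    "biglakeConfiguration": {
      "connectionId": "myproject.us.myconnection",
      "fileFormat": "PARQUET",
      "tableFormat": "ICEBERG",
      "storageUri": "gs://mybucket/orders"
    },
    "schema": {
      "fields": [
        {"name": "order_id", "type": "INT64"},
        {"name": "customer_id", "type": "INT64"},
        {"name": "order_date", "type": "DATE"}
      ]
    }
  }'
```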

What's next