Create BigLake tables for Apache Iceberg in BigQuery

The following sections describe how to create managed tables using BigLake tables for Apache Iceberg in BigQuery (hereafter BigLake Iceberg tables in BigQuery).

Before you begin

  • Understand the different types of BigLake tables and the implications of using them by reading the BigLake table overview.

  • Before creating and using BigLake Iceberg tables in BigQuery, ensure that you have set up a Cloud resource connection to a storage bucket. Your connection needs write permissions on the storage bucket, as specified in the following Required roles section. For more information about required roles and permissions for connections, see Manage connections.

Required roles

To get the permissions that you need to let BigQuery manage tables in your project, ask your administrator to grant you the following IAM roles:

For more information about granting roles, see Manage access to projects, folders, and organizations.

These predefined roles contain the permissions required to let BigQuery manage tables in your project. To see the exact permissions that are required, see the following Required permissions section:

Required permissions

The following permissions are required to let BigQuery manage tables in your project:

  • bigquery.connections.delegate on your project
  • bigquery.jobs.create on your project
  • bigquery.readsessions.create on your project
  • bigquery.tables.create on your project
  • bigquery.tables.get on your project
  • bigquery.tables.getData on your project
  • storage.buckets.get on your bucket
  • storage.objects.create on your bucket
  • storage.objects.delete on your bucket
  • storage.objects.get on your bucket
  • storage.objects.list on your bucket

You might also be able to get these permissions with custom roles or other predefined roles.

Create BigLake Iceberg tables in BigQuery

To create a BigLake Iceberg table in BigQuery, select one of the following methods:

SQL

CREATE TABLE [PROJECT_ID.]DATASET_ID.TABLE_NAME (
  COLUMN DATA_TYPE[, ...]
)
CLUSTER BY CLUSTER_COLUMN_LIST
WITH CONNECTION {CONNECTION_NAME | DEFAULT}
OPTIONS (
  file_format = 'PARQUET',
  table_format = 'ICEBERG',
  storage_uri = 'STORAGE_URI');

Replace the following:

  • PROJECT_ID: the project containing the dataset. If not specified, the query assumes the default project.
  • DATASET_ID: an existing dataset.
  • TABLE_NAME: the name of the table you're creating.
  • DATA_TYPE: the data type of the information that is contained in the column.
  • CLUSTER_COLUMN_LIST (optional): a comma-separated list containing up to four columns. They must be top-level, non-repeated columns.
  • CONNECTION_NAME: the name of the connection. For example, myproject.us.myconnection.

    To use a default connection, specify DEFAULT instead of the connection string containing PROJECT_ID.REGION.CONNECTION_ID.

  • STORAGE_URI: a fully qualified Cloud Storage URI. For example, gs://mybucket/table.
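For example, the following statement creates a clustered BigLake Iceberg table in BigQuery. The project, dataset, connection, and bucket names (myproject, mydataset, myconnection, mybucket) and the table schema are placeholder values for illustration; substitute your own resources.

```sql
-- Hypothetical example: create a BigLake Iceberg table named "orders"
-- in the dataset "mydataset", clustered by customer and order date.
CREATE TABLE mydataset.orders (
  order_id INT64,
  customer_id INT64,
  order_date DATE,
  total NUMERIC
)
CLUSTER BY customer_id, order_date
WITH CONNECTION `myproject.us.myconnection`
OPTIONS (
  file_format = 'PARQUET',
  table_format = 'ICEBERG',
  storage_uri = 'gs://mybucket/orders');
```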

bq

bq --project_id=PROJECT_ID mk \
    --table \
    --file_format=PARQUET \
    --table_format=ICEBERG \
    --connection_id=CONNECTION_NAME \
    --storage_uri=STORAGE_URI \
    --schema=COLUMN_NAME:DATA_TYPE[, ...] \
    --clustering_fields=CLUSTER_COLUMN_LIST \
    DATASET_ID.MANAGED_TABLE_NAME

Replace the following:

  • PROJECT_ID: the project containing the dataset. If undefined, the command assumes the default project.
  • CONNECTION_NAME: the name of the connection. For example, myproject.us.myconnection.
  • STORAGE_URI: a fully qualified Cloud Storage URI. For example, gs://mybucket/table.
  • COLUMN_NAME: the column name.
  • DATA_TYPE: the data type of the information contained in the column.
  • CLUSTER_COLUMN_LIST (optional): a comma-separated list containing up to four columns. They must be top-level, non-repeated columns.
  • DATASET_ID: the ID of an existing dataset.
  • MANAGED_TABLE_NAME: the name of the table you're creating.
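As a concrete illustration, the following command creates the same kind of table with the bq tool. The resource names (myproject, mydataset, myconnection, mybucket) and the schema are placeholders; replace them with your own values.

```shell
# Hypothetical example: create a clustered BigLake Iceberg table
# named "orders" using the bq command-line tool.
bq --project_id=myproject mk \
    --table \
    --file_format=PARQUET \
    --table_format=ICEBERG \
    --connection_id=myproject.us.myconnection \
    --storage_uri=gs://mybucket/orders \
    --schema=order_id:INT64,customer_id:INT64,order_date:DATE,total:NUMERIC \
    --clustering_fields=customer_id,order_date \
    mydataset.orders
```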

API

Call the tables.insert method with a defined table resource, similar to the following:

{
  "tableReference": {
    "tableId": "TABLE_NAME"
  },
  "biglakeConfiguration": {
    "connectionId": "CONNECTION_NAME",
    "fileFormat": "PARQUET",
    "tableFormat": "ICEBERG",
    "storageUri": "STORAGE_URI"
  },
  "schema": {
    "fields": [
      {
        "name": "COLUMN_NAME",
        "type": "DATA_TYPE"
      }
      [, ...]
    ]
  }
}

Replace the following:

  • TABLE_NAME: the name of the table that you're creating.
  • CONNECTION_NAME: the name of the connection. For example, myproject.us.myconnection.
  • STORAGE_URI: a fully qualified Cloud Storage URI. Wildcards are also supported. For example, gs://mybucket/table.
  • COLUMN_NAME: the column name.
  • DATA_TYPE: the data type of the information contained in the column.
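One way to send this request is with curl against the BigQuery REST endpoint, as sketched below. The project, dataset, connection, bucket, and column names are hypothetical placeholders, and the command assumes you are authenticated with the gcloud CLI.

```shell
# Hypothetical example: call tables.insert with curl to create a
# BigLake Iceberg table named "orders" in the dataset "mydataset".
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  "https://bigquery.googleapis.com/bigquery/v2/projects/myproject/datasets/mydataset/tables" \
  -d '{
    "tableReference": {"tableId": "orders"},
    "biglakeConfiguration": {
      "connectionId": "myproject.us.myconnection",
      "fileFormat": "PARQUET",
      "tableFormat": "ICEBERG",
      "storageUri": "gs://mybucket/orders"
    },
    "schema": {
      "fields": [
        {"name": "order_id", "type": "INT64"},
        {"name": "customer_id", "type": "INT64"},
        {"name": "order_date", "type": "DATE"}
      ]
    }
  }'
```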

What's next