Create BigLake tables for Apache Iceberg in BigQuery
The following sections describe how to create managed tables using BigLake tables for Apache Iceberg in BigQuery (hereafter BigLake Iceberg tables in BigQuery).
Before you begin
Understand the different types of BigLake tables and the implications of using them in the BigLake table overview.
Before creating and using BigLake Iceberg tables in BigQuery, ensure that you have set up a Cloud resource connection to a storage bucket. Your connection needs write permissions on the storage bucket, as specified in the following Required roles section. For more information about required roles and permissions for connections, see Manage connections.
Required roles
To get the permissions that you need to let BigQuery manage tables in your project, ask your administrator to grant you the following IAM roles:
- To create BigLake Iceberg tables in BigQuery:
  - BigQuery Data Owner (roles/bigquery.dataOwner) on your project
  - BigQuery Connection Admin (roles/bigquery.connectionAdmin) on your project
- Grant the connection service account the following roles so it can read and write data in Cloud Storage:
  - Storage Object User (roles/storage.objectUser) on the bucket
  - Storage Legacy Bucket Reader (roles/storage.legacyBucketReader) on the bucket
For more information about granting roles, see Manage access to projects, folders, and organizations.
These predefined roles contain the permissions required to let BigQuery manage tables in your project. To see the exact permissions that are required, expand the Required permissions section:
Required permissions
The following permissions are required to let BigQuery manage tables in your project:
- bigquery.connections.delegate on your project
- bigquery.jobs.create on your project
- bigquery.readsessions.create on your project
- bigquery.tables.create on your project
- bigquery.tables.get on your project
- bigquery.tables.getData on your project
- storage.buckets.get on your bucket
- storage.objects.create on your bucket
- storage.objects.delete on your bucket
- storage.objects.get on your bucket
- storage.objects.list on your bucket
You might also be able to get these permissions with custom roles or other predefined roles.
Create BigLake Iceberg tables in BigQuery
To create a BigLake Iceberg table in BigQuery, select one of the following methods:
SQL
CREATE TABLE [PROJECT_ID.]DATASET_ID.TABLE_NAME (
  COLUMN DATA_TYPE[, ...]
)
CLUSTER BY CLUSTER_COLUMN_LIST
WITH CONNECTION {CONNECTION_NAME | DEFAULT}
OPTIONS (
  file_format = 'PARQUET',
  table_format = 'ICEBERG',
  storage_uri = 'STORAGE_URI');
Replace the following:
- PROJECT_ID: the project containing the dataset. If undefined, the command assumes the default project.
- DATASET_ID: an existing dataset.
- TABLE_NAME: the name of the table you're creating.
- DATA_TYPE: the data type of the information that is contained in the column.
- CLUSTER_COLUMN_LIST (optional): a comma-separated list containing up to four columns. They must be top-level, non-repeated columns.
- CONNECTION_NAME: the name of the connection. For example, myproject.us.myconnection. To use a default connection, specify DEFAULT instead of the connection string containing PROJECT_ID.REGION.CONNECTION_ID.
- STORAGE_URI: a fully qualified Cloud Storage URI. For example, gs://mybucket/table.
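As a concrete sketch, the following statement fills in the template with hypothetical names (my_project, my_dataset, a us.my-connection connection, and a gs://my-bucket bucket, all placeholders for your own resources):

```sql
-- Hypothetical project, dataset, connection, and bucket names.
CREATE TABLE my_project.my_dataset.orders (
  order_id INT64,
  customer_id INT64,
  order_date DATE,
  total NUMERIC
)
CLUSTER BY customer_id, order_date
WITH CONNECTION `my_project.us.my-connection`
OPTIONS (
  file_format = 'PARQUET',
  table_format = 'ICEBERG',
  storage_uri = 'gs://my-bucket/orders');
```

The connection's service account must already have the Cloud Storage roles listed in the Required roles section, or the statement fails when BigQuery tries to write table data to the bucket.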
bq
bq --project_id=PROJECT_ID mk \
    --table \
    --file_format=PARQUET \
    --table_format=ICEBERG \
    --connection_id=CONNECTION_NAME \
    --storage_uri=STORAGE_URI \
    --schema=COLUMN_NAME:DATA_TYPE[, ...] \
    --clustering_fields=CLUSTER_COLUMN_LIST \
    DATASET_ID.MANAGED_TABLE_NAME
Replace the following:
- PROJECT_ID: the project containing the dataset. If undefined, the command assumes the default project.
- CONNECTION_NAME: the name of the connection. For example, myproject.us.myconnection.
- STORAGE_URI: a fully qualified Cloud Storage URI. For example, gs://mybucket/table.
- COLUMN_NAME: the column name.
- DATA_TYPE: the data type of the information contained in the column.
- CLUSTER_COLUMN_LIST (optional): a comma-separated list containing up to four columns. They must be top-level, non-repeated columns.
- DATASET_ID: the ID of an existing dataset.
- MANAGED_TABLE_NAME: the name of the table you're creating.
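For example, the equivalent bq invocation with hypothetical placeholder values (my_project, my_dataset, a us.my-connection connection, and a gs://my-bucket bucket) might look like the following; it assumes the Google Cloud SDK is installed and the dataset and connection already exist:

```shell
# Hypothetical names: replace with your own project, connection,
# bucket, and dataset before running.
bq --project_id=my_project mk \
    --table \
    --file_format=PARQUET \
    --table_format=ICEBERG \
    --connection_id=my_project.us.my-connection \
    --storage_uri=gs://my-bucket/orders \
    --schema=order_id:INT64,customer_id:INT64,order_date:DATE \
    --clustering_fields=customer_id,order_date \
    my_dataset.orders
```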
API
Call the tables.insert method with a defined table resource, similar to the following:
{
  "tableReference": {
    "tableId": "TABLE_NAME"
  },
  "biglakeConfiguration": {
    "connectionId": "CONNECTION_NAME",
    "fileFormat": "PARQUET",
    "tableFormat": "ICEBERG",
    "storageUri": "STORAGE_URI"
  },
  "schema": {
    "fields": [
      {
        "name": "COLUMN_NAME",
        "type": "DATA_TYPE"
      }
      [, ...]
    ]
  }
}
Replace the following:
- TABLE_NAME: the name of the table that you're creating.
- CONNECTION_NAME: the name of the connection. For example, myproject.us.myconnection.
- STORAGE_URI: a fully qualified Cloud Storage URI. Wildcards are also supported. For example, gs://mybucket/table.
- COLUMN_NAME: the column name.
- DATA_TYPE: the data type of the information contained in the column.
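Put together, a request body with hypothetical placeholder values filled in might look like this (the table, connection, bucket, and column names are illustrative, not fixed values):

```json
{
  "tableReference": {
    "tableId": "orders"
  },
  "biglakeConfiguration": {
    "connectionId": "my_project.us.my-connection",
    "fileFormat": "PARQUET",
    "tableFormat": "ICEBERG",
    "storageUri": "gs://my-bucket/orders"
  },
  "schema": {
    "fields": [
      { "name": "order_id", "type": "INT64" },
      { "name": "customer_id", "type": "INT64" },
      { "name": "order_date", "type": "DATE" }
    ]
  }
}
```

The request is sent to the tables.insert endpoint of the project and dataset that should contain the new table.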