You can use a Dataflow job builder blueprint to add existing Apache Parquet files from cloud-based storage (Cloud Storage or Amazon S3) to an Apache Iceberg table in Lakehouse.
This process uses the IcebergAddFiles transform. If your Parquet files are in Cloud Storage, the transform registers the files with Lakehouse without moving or rewriting the underlying data. If your files are in an external storage system such as Amazon S3, they are first copied to Cloud Storage for faster querying through Lakehouse, and then registered.
Use the following information to add Parquet files from cloud-based storage to an Apache Iceberg table in Lakehouse.
Before you begin
Enable the Dataflow, BigQuery, and Lakehouse APIs.
To get the permissions that you need to create the resources, ask your administrator to grant you the required Identity and Access Management (IAM) roles on your project.
Create a Google Cloud Lakehouse catalog, namespace, and table to import data into.
Create a cloud-based storage bucket (Cloud Storage or Amazon S3) and upload your Parquet files to the bucket.
If the cloud-based storage bucket you're using isn't a Cloud Storage bucket, also create a Cloud Storage bucket to store your job error logs.
Support and limitations
Importing Parquet files in cloud-based storage to Google Cloud Lakehouse using Dataflow has the following limitations:
- The source data must be in Apache Parquet format and stored in Cloud Storage or Amazon S3.
- This feature supports only batch pipelines.
Import Parquet files to Lakehouse
Use the following steps to import Parquet files from cloud-based storage to an Iceberg table in Lakehouse using the Dataflow job builder UI.
In the Google Cloud console, go to the Google Cloud Lakehouse page.
Select the catalog, namespace, and table that you want to import data into.
On the Table details page, click Import table.
In the Import configuration dialog, select Import Apache Parquet files into Lakehouse (Batch).
The Dataflow Job builder page opens.
In the Sources section:
Open the CreateGlobalInput source entry that is already created.
In the YAML source configuration editor, enter one or more paths to your Parquet files in the `elements` sequence. To improve import efficiency, specify multiple sets of files (globs) when you're registering a large number of files. For example:

```yaml
reshuffle: true
elements:
  - gs://BUCKET_NAME/restaurant-data/2023/*.parquet
  - gs://BUCKET_NAME/restaurant-data/2024/*.parquet
```

Click Done.
In the Transforms section:
Click the IcebergAddFiles transform section to open it.
In the Iceberg table field, enter the namespace and table name. For example: NAMESPACE.TABLE_NAME.
Under Catalog properties, configure the following items:
- warehouse: The Cloud Storage location of your catalog. For example, gs://CATALOG_PATH.
- header.x-goog-user-project: Your Google Cloud project ID. For example, PROJECT_ID.
Click Done.
In the Sinks section:
Click the Write results sink to open it.
In the JSON location field, specify the Cloud Storage location and filename to write error results to. For example: gs://BUCKET_NAME/errors/errors.json

Click Done.
In the Dataflow Options section, click Run job.
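If you switch to the YAML editor, the assembled pipeline resembles the following. This is a hedged sketch only: the IcebergAddFiles and WriteToJson field names shown here are assumptions pieced together from the steps above, not a verified schema.

```yaml
# Hypothetical sketch of the pipeline that the job builder assembles from
# the steps above. Transform and field names are assumptions.
pipeline:
  transforms:
    - type: Create
      name: CreateGlobalInput
      config:
        reshuffle: true
        elements:
          - gs://BUCKET_NAME/restaurant-data/2023/*.parquet
          - gs://BUCKET_NAME/restaurant-data/2024/*.parquet
    - type: IcebergAddFiles
      name: AddFiles
      input: CreateGlobalInput
      config:
        table: NAMESPACE.TABLE_NAME
        catalog_properties:
          warehouse: gs://CATALOG_PATH
          "header.x-goog-user-project": PROJECT_ID
    - type: WriteToJson
      name: WriteResults
      input: AddFiles
      config:
        path: gs://BUCKET_NAME/errors/errors.json
```

Substitute your own bucket, catalog, namespace, and project values before running.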
If you need to further customize the Dataflow pipeline used to register Parquet files, you can do that using the job builder form or the YAML editor.
Examine the job output
After the job completes, you can verify that the data was registered with the Iceberg table by querying it in BigQuery.
In the Dataflow job list, check that the job status is Succeeded.
If the job fails or has errors, check the JSON error log file in Cloud Storage for details.
In the Google Cloud console, go to the BigQuery Studio page.
In the query editor, enter a SQL query to inspect the table. You can use the PROJECT_ID.CATALOG>NAMESPACE.TABLE_NAME convention to query. For example:

```sql
SELECT * FROM `PROJECT_ID.CATALOG>NAMESPACE.TABLE_NAME` LIMIT 10
```

Click Run.
Review the Query results to ensure the data was processed correctly.
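As an additional sanity check, a row count can confirm that the expected number of records is visible, using the same table-naming convention as above:

```sql
SELECT COUNT(*) AS row_count
FROM `PROJECT_ID.CATALOG>NAMESPACE.TABLE_NAME`;
```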
What's next
- Learn more in About Lakehouse runtime catalog.
- Learn more in the Dataflow Job builder UI overview.