Your use case might require you to import data from an external Iceberg REST Catalog (IRC) table into an existing Google Cloud Lakehouse table. Dataflow's job builder UI lets you build a pipeline that migrates your external open source Iceberg catalog tables into Lakehouse in a low-code or no-code way. This process lets you consolidate data into a unified Lakehouse-managed Iceberg format for cross-engine analytics.
Use the following connection details to import data from external Iceberg catalogs.
Before you begin
To import data, you need the following:
- Connection information for the external Iceberg REST Catalog. For example: catalog name, namespace, table name, account URI, and role to access the catalog.
- A Lakehouse Iceberg catalog, namespace, and table to import the data into.
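Before you start, it can help to gather these connection details in one place. The following sketch shows the details as a checklist, with a minimal sanity check; all values are placeholders for illustration, not real endpoints or credentials.

```python
# Hypothetical connection details for an external Iceberg REST Catalog (IRC).
# Every value below is a placeholder -- substitute your provider's actual
# catalog name, namespace, table, account URI, and role.
external_catalog = {
    "catalog_name": "my_irc_catalog",     # catalog name
    "namespace": "analytics",             # namespace that contains the table
    "table": "events",                    # table to import
    "uri": "https://irc.example.com/v1",  # account URI (REST endpoint)
    "role": "IMPORT_READER",              # role used to access the catalog
}

# Minimal sanity check before you start the import.
required = {"catalog_name", "namespace", "table", "uri", "role"}
missing = required - external_catalog.keys()
assert not missing, f"Missing connection details: {missing}"
```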
Support and limitations
Importing data from external Iceberg catalogs to Google Cloud Lakehouse using Dataflow has the following limitations:
- This feature supports reading into Lakehouse from external Iceberg providers that expose an Iceberg REST Catalog (IRC). Other Iceberg catalog types aren't supported.
- This feature supports batch and streaming pipelines.
Import an external Iceberg catalog table
To import an external Iceberg catalog table into Google Cloud Lakehouse, complete the following steps:
1. In the Google Cloud console, go to the Google Cloud Lakehouse Metastore page.
2. Select the catalog, namespace, and table that you want to import data into.
3. On the Table details page, click Import table.
4. In the Import configuration dialog, select Import a table from an Apache Iceberg REST Catalog into Lakehouse (Batch).

   The Dataflow Job builder page opens.
5. In the Sources section, do the following:
   1. To expand the Iceberg table source panel, click the expander arrow.
   2. In the Iceberg table field, enter the identifier of the Apache Iceberg table.
   3. In the Catalog name field, enter the name of the catalog.
   4. In the Filter field, enter the Iceberg filter to use. For example, `id > 5`.
   5. Optional: To specify source table column changes, use the Keep columns or Drop columns sections.
   6. In the Catalog properties section, in the Catalog type list, select the type of catalog.
   7. In the Catalog URI field, enter the URI of the catalog. For example, `http://localhost:8181`.
   8. In the Warehouse name field, enter the catalog name. For some external Iceberg REST Catalog providers, the warehouse is abstracted, and the catalog name is provided as the warehouse name.
   9. In the Authentication type list, select the authentication type. For example, `OAUTH2`.
6. Optional: In the Transforms section, add any transforms to the source data.
7. Optional: In the Sink section, review the Lakehouse table sink panel. The information in this panel, such as the Lakehouse table, catalog name, and warehouse location, is typically prepopulated.
8. In the Dataflow options section, click Run job.
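To see how the source fields in the steps above fit together, the following sketch groups them the way an Iceberg REST catalog source configuration is typically structured. The key names, endpoints, and values are illustrative assumptions for this example, not the exact specification that the job builder emits.

```python
# Illustrative grouping of the job builder's source fields (assumed key
# names; placeholder values throughout).
source_config = {
    "table": "analytics.events",       # Iceberg table identifier
    "catalog_name": "my_irc_catalog",  # Catalog name field
    "filter": "id > 5",                # Filter field
    "catalog_properties": {
        "type": "rest",                  # Catalog type: Iceberg REST
        "uri": "http://localhost:8181",  # Catalog URI field
        "warehouse": "my_irc_catalog",   # Warehouse name (often the catalog name)
        "auth_type": "OAUTH2",           # Authentication type field
    },
}
```

The catalog-level settings (type, URI, warehouse, authentication) are nested under the catalog properties, while the table identifier and filter apply to the individual source table.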
What's next
- Learn more about how to Create a custom job with the job builder UI.
- Learn more in the Introduction to Google Cloud Lakehouse tables for Apache Iceberg in BigQuery.
- Read the blog post Lakehouse evolved: Build open, high-performance, enterprise Iceberg-native lakehouses.