To write from Dataflow to Apache Iceberg using the Lakehouse for Apache Iceberg REST Catalog, use the managed I/O connector.
Managed I/O supports the following capabilities for Apache Iceberg:
| Catalogs | Hadoop, Hive, REST-based catalogs, BigQuery metastore |
|---|---|
| Read capabilities | Batch read |
| Write capabilities | Batch write, Streaming write, Dynamic destinations, Dynamic table creation |
For BigQuery tables for Apache Iceberg, use the BigQueryIO connector with the BigQuery Storage API. The table must already exist; dynamic table creation is not supported.
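The BigQueryIO path described above can be sketched as follows. This is a minimal illustration, not the page's own sample: the project, dataset, table, and field names are placeholders, and the `CREATE_NEVER` disposition reflects the requirement that the table already exist.

```java
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.io.gcp.bigquery.TableRowJsonCoder;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Create;
import com.google.api.services.bigquery.model.TableRow;

public class WriteToBigQueryIcebergTable {
  public static void main(String[] args) {
    Pipeline pipeline = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());
    pipeline
        // Placeholder rows; a real pipeline would read from a source.
        .apply(Create.of(new TableRow().set("id", "1").set("name", "Alice"))
            .withCoder(TableRowJsonCoder.of()))
        .apply(BigQueryIO.writeTableRows()
            // Placeholder table reference; replace with your own.
            .to("MY_PROJECT:MY_DATASET.MY_TABLE")
            // Write through the BigQuery Storage Write API.
            .withMethod(BigQueryIO.Write.Method.STORAGE_WRITE_API)
            // The table must already exist; dynamic creation is not supported.
            .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_NEVER));
    pipeline.run().waitUntilFinish();
  }
}
```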
Prerequisites
Set up Lakehouse for Apache Iceberg, and configure your Google Cloud project with the required permissions by following Use the Lakehouse runtime catalog with the Iceberg REST catalog. Make sure that you understand the limitations of the Lakehouse for Apache Iceberg REST Catalog described on that page.
Dependencies
Add the following dependencies to your project:
Java
```xml
<dependency>
  <groupId>org.apache.beam</groupId>
  <artifactId>beam-sdks-java-managed</artifactId>
  <version>${beam.version}</version>
</dependency>
<dependency>
  <groupId>org.apache.beam</groupId>
  <artifactId>beam-sdks-java-io-iceberg</artifactId>
  <version>${beam.version}</version>
</dependency>
<dependency>
  <groupId>org.apache.iceberg</groupId>
  <artifactId>iceberg-gcp</artifactId>
  <version>${iceberg.version}</version>
</dependency>
```
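The `${beam.version}` and `${iceberg.version}` properties are not defined by the dependency snippet itself. One way to supply them in your `pom.xml` is shown below; the version numbers are placeholders, so replace them with the current compatible releases.

```xml
<properties>
  <!-- Placeholder versions; use the latest compatible releases. -->
  <beam.version>2.60.0</beam.version>
  <iceberg.version>1.6.1</iceberg.version>
</properties>
```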
Example
The following example demonstrates a streaming pipeline that writes data to an Apache Iceberg table using the REST catalog, backed by the Lakehouse runtime catalog.
Java
To authenticate to Dataflow, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
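The write path can be sketched with a minimal bounded pipeline, assuming the following placeholders: the table, catalog name, REST endpoint URI, and warehouse path are not real values, and the `catalog_properties` map would also need the authentication settings described on the Lakehouse setup page. A streaming source such as Pub/Sub would replace `Create` in the streaming scenario this example describes.

```java
import java.util.Map;

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.managed.Managed;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.schemas.Schema;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.values.Row;

public class WriteToIcebergRestCatalog {
  public static void main(String[] args) {
    // Schema of the rows written to the Iceberg table.
    Schema schema = Schema.builder()
        .addStringField("id")
        .addStringField("name")
        .build();

    // Managed I/O configuration. All values below are placeholders;
    // supply your own table, catalog name, endpoint, warehouse, and the
    // authentication properties required by the Lakehouse REST catalog.
    Map<String, Object> config = Map.of(
        "table", "my_namespace.my_table",
        "catalog_name", "my_catalog",
        "catalog_properties", Map.of(
            "type", "rest",
            "uri", "https://example.com/iceberg/rest",
            "warehouse", "gs://MY_BUCKET/warehouse"));

    Pipeline pipeline = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());
    pipeline
        // Placeholder rows; a streaming pipeline would read from a
        // source such as Pub/Sub instead.
        .apply(Create.of(
            Row.withSchema(schema).addValues("1", "Alice").build(),
            Row.withSchema(schema).addValues("2", "Bob").build()))
        .setRowSchema(schema)
        // Managed I/O routes the write to the Iceberg sink.
        .apply(Managed.write(Managed.ICEBERG).withConfig(config));
    pipeline.run().waitUntilFinish();
  }
}
```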
What's next
- CDC Read from Apache Iceberg with Lakehouse for Apache Iceberg REST Catalog.
- Learn more about Managed I/O.
- Learn more about Lakehouse for Apache Iceberg REST Catalog.