Dataflow managed I/O for Apache Iceberg

Managed I/O supports the following capabilities for Apache Iceberg:

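Catalogs	Hadoop, Hive, REST-based catalogs, BigQuery metastore (requires Apache Beam SDK 2.62.0 or later if not using Runner v2)
Read capabilities	Batch read
Write capabilities	Batch write, streaming write, dynamic destinations, dynamic table creation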

For BigQuery tables for Apache Iceberg, use the BigQueryIO connector with the BigQuery Storage API. The table must already exist; dynamic table creation is not supported.

Requirements

The following SDKs support managed I/O for Apache Iceberg:

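Managed I/O supports the following configuration parameters for Apache Iceberg. The first table lists the parameters for reading, and the second lists the parameters for writing.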

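Configuration	Type	Description
table	`str`	Identifier of the Iceberg table.
catalog_name	`str`	Name of the catalog containing the table.
catalog_properties	`map[str, str]`	Properties used to set up the Iceberg catalog.
config_properties	`map[str, str]`	Properties passed to the Hadoop configuration.
drop	`list[str]`	A subset of column names to exclude from reading. If null or empty, all columns are read.
filter	`str`	A SQL-like predicate used to filter data at scan time. Example: "id > 5 AND status = 'ACTIVE'". Uses Apache Calcite syntax: https://calcite.apache.org/docs/reference.html
keep	`list[str]`	A subset of column names to read exclusively. If null or empty, all columns are read.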
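
As an illustration of how the read parameters in the table above fit together, the following Python sketch assembles a read configuration as a plain dictionary. The helper name `build_iceberg_read_config`, the table ID `db.orders`, the catalog name, and the `gs://example-bucket/warehouse` path are hypothetical placeholders, and the `type` and `warehouse` catalog properties assume a Hadoop catalog. The commented lines at the end show how such a dictionary would typically be passed to the Apache Beam Python SDK's managed transform, assuming the SDK is installed.

```python
from typing import Optional


def build_iceberg_read_config(
    table: str,
    catalog_name: str,
    catalog_properties: dict[str, str],
    keep: Optional[list[str]] = None,
    drop: Optional[list[str]] = None,
    row_filter: Optional[str] = None,
) -> dict:
    """Assemble a read configuration using the documented parameter names."""
    config = {
        "table": table,
        "catalog_name": catalog_name,
        "catalog_properties": catalog_properties,
    }
    # Optional keys are added only when set. Leaving keep and drop unset
    # corresponds to the documented behavior of reading all columns.
    if keep:
        config["keep"] = keep
    if drop:
        config["drop"] = drop
    if row_filter:
        config["filter"] = row_filter
    return config


read_config = build_iceberg_read_config(
    table="db.orders",  # hypothetical table ID
    catalog_name="my_catalog",  # hypothetical catalog name
    catalog_properties={
        "type": "hadoop",  # assumes a Hadoop catalog
        "warehouse": "gs://example-bucket/warehouse",  # hypothetical path
    },
    keep=["id", "status"],
    row_filter="id > 5 AND status = 'ACTIVE'",
)

# With the Apache Beam Python SDK installed, the dictionary would be used as:
#   from apache_beam.transforms import managed
#   rows = pipeline | managed.Read(managed.ICEBERG, config=read_config)
```

Building the configuration separately from the pipeline keeps it easy to inspect and test before a job is launched.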

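Configuration	Type	Description
table	`str`	A fully qualified table identifier. You can also provide a template to write to multiple dynamic destinations, for example: `dataset.my_{col1}_{col2.nested}_table`.
catalog_name	`str`	Name of the catalog containing the table.
catalog_properties	`map[str, str]`	Properties used to set up the Iceberg catalog.
config_properties	`map[str, str]`	Properties passed to the Hadoop configuration.
direct_write_byte_limit	`int32`	For streaming pipelines, sets the byte limit for moving bundles onto the direct write path.
drop	`list[str]`	A list of field names to drop from the input record before writing. Mutually exclusive with `keep` and `only`.
keep	`list[str]`	A list of field names to keep in the input record. All other fields are dropped before writing. Mutually exclusive with `drop` and `only`.
only	`str`	The name of a single record field to write. Mutually exclusive with `keep` and `drop`.
partition_fields	`list[str]`	Fields used to create a partition spec that is applied when tables are created. For a field `foo`, the available partition transforms are: `foo`, `truncate(foo, N)`, `bucket(foo, N)`, `hour(foo)`, `day(foo)`, `month(foo)`, `year(foo)`, `void(foo)`. For more information on partition transforms, see https://iceberg.apache.org/spec/#partition-transforms.
table_properties	`map[str, str]`	Iceberg table properties to set when the table is created. For more information on table properties, see https://iceberg.apache.org/docs/latest/configuration/#table-properties.
triggering_frequency_seconds	`int32`	For streaming pipelines, sets the frequency at which snapshots are produced.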
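
The write parameters above can be sketched the same way. The following Python example builds a write configuration and checks the rule from the table that `keep`, `drop`, and `only` are mutually exclusive. The helper name `build_iceberg_write_config`, the field names `debug_info`, `event_ts`, and `id`, the catalog values, and the 60-second trigger are hypothetical and chosen only for illustration; the table template is the one shown in the parameter table. The check runs locally and is not a description of how the connector itself validates its input.

```python
from typing import Optional


def build_iceberg_write_config(
    table: str,
    catalog_name: str,
    catalog_properties: dict[str, str],
    keep: Optional[list[str]] = None,
    drop: Optional[list[str]] = None,
    only: Optional[str] = None,
    partition_fields: Optional[list[str]] = None,
    triggering_frequency_seconds: Optional[int] = None,
) -> dict:
    """Assemble a write configuration and check the keep/drop/only rule."""
    field_options = {"keep": keep, "drop": drop, "only": only}
    chosen = [name for name, value in field_options.items() if value]
    # The parameter table describes keep, drop, and only as mutually exclusive.
    if len(chosen) > 1:
        raise ValueError(f"Mutually exclusive options set together: {chosen}")

    config = {
        "table": table,
        "catalog_name": catalog_name,
        "catalog_properties": catalog_properties,
    }
    optional = {
        **field_options,
        "partition_fields": partition_fields,
        "triggering_frequency_seconds": triggering_frequency_seconds,
    }
    # Copy only the options that were provided with a non-empty value.
    config.update({key: value for key, value in optional.items() if value})
    return config


write_config = build_iceberg_write_config(
    table="dataset.my_{col1}_{col2.nested}_table",  # template from the table above
    catalog_name="my_catalog",  # hypothetical catalog name
    catalog_properties={
        "type": "hadoop",  # assumes a Hadoop catalog
        "warehouse": "gs://example-bucket/warehouse",  # hypothetical path
    },
    drop=["debug_info"],  # hypothetical field name
    partition_fields=["day(event_ts)", "bucket(id, 16)"],  # hypothetical fields
    triggering_frequency_seconds=60,  # only relevant for streaming pipelines
)

# With the Apache Beam Python SDK installed, the dictionary would be used as:
#   from apache_beam.transforms import managed
#   rows | managed.Write(managed.ICEBERG, config=write_config)
```

Checking the mutually exclusive options before submitting a job surfaces a misconfiguration early, rather than after the pipeline has started.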

For more information and code examples, see the following topics: