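Create a data foundation module
To process additional raw tables and include them in the data foundation dataset, you need to create a data foundation module.
When you create a custom data foundation module, we recommend packaging it in a dedicated custom namespace. Also make sure that the source tables you plan to process exist in the raw layer dataset.
For SAP data foundation modules, make sure that the DD03L table is replicated in the raw dataset and configured as the source for the foundation module in the `config/config.yaml` file. The replicated DD03L table must also contain field metadata records for every table you plan to inject, such as custom tables or supplementary tables like `sflight`. The Cortex Framework build scripts and dependency resolver read these metadata rows to identify each table's column list, its data types, and the primary key relationships between tables.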
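The following Python sketch illustrates the kind of information the build scripts get from DD03L. It is not Cortex Framework's actual resolver. It assumes each row carries a few real DD03L columns (TABNAME, FIELDNAME, POSITION, KEYFLAG, DATATYPE), simplified to plain Python types. The `Dd03lRow` class and the `table_schema` function are hypothetical. In DD03L, a KEYFLAG of "X" marks a primary key field, and rows whose FIELDNAME starts with "." (for example ".INCLUDE") mark included structures rather than columns.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dd03lRow:
    """Hypothetical, simplified view of one DD03L row (real POSITION is NUMC text)."""
    tabname: str
    fieldname: str
    position: int
    keyflag: str   # "X" marks a primary key field
    datatype: str  # ABAP Dictionary type such as CLNT, CHAR, NUMC, DATS, CURR


def table_schema(rows, table):
    """Return ([(column, datatype), ...] in field order, [primary key columns])."""
    fields = sorted(
        (r for r in rows
         # Rows such as ".INCLUDE" mark included structures, not real columns.
         if r.tabname == table and not r.fieldname.startswith(".")),
        key=lambda r: r.position,
    )
    columns = [(r.fieldname.lower(), r.datatype) for r in fields]
    primary_key = [r.fieldname.lower() for r in fields if r.keyflag == "X"]
    return columns, primary_key


rows = [  # A few of SFLIGHT's dictionary entries, deliberately out of order.
    Dd03lRow("SFLIGHT", "PRICE", 5, "", "CURR"),
    Dd03lRow("SFLIGHT", "MANDT", 1, "X", "CLNT"),
    Dd03lRow("SFLIGHT", "CARRID", 2, "X", "CHAR"),
    Dd03lRow("SFLIGHT", "CONNID", 3, "X", "NUMC"),
    Dd03lRow("SFLIGHT", "FLDATE", 4, "X", "DATS"),
]
columns, primary_key = table_schema(rows, "SFLIGHT")
print(columns)
print(primary_key)  # ['mandt', 'carrid', 'connid', 'fldate']
```

In this sketch, a table with no DD03L rows yields an empty schema. That is why the replicated DD03L table must include records for every table you inject.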
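Data foundation module definition
To define a data foundation module, follow these steps:
- In the `config/config.yaml` file, add the target dataset configuration to `data.targets`. This prepares a dedicated BigQuery dataset that the foundation tables are deployed to: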
```yaml
[...]
data:
  [...]
  targets:
    - id: data_foundation_sap_custom_namespace
      # Google Cloud Project ID for the target dataset.
      projectId: target_project_id
      # BigQuery dataset ID for the target.
      datasetId: data_foundation_sap_custom_namespace
```
- To define the data foundation module, add the following to the `modules.foundation` section of the `config/config.yaml` file:
```yaml
[...]
data:
  [...]
modules:
  foundation:
    [...]
    - moduleId: foundation_module_id
      type: cortex.sap
      dataSourceId: sap_raw_s4
      dataTargetId: data_foundation_sap_custom_namespace
      moduleSettings:
        sapVersion: s4
        mandt: "100"
        tableSettings: "table_settings.yaml"
        # Optional. Path to custom table settings configuration relative to this config file, e.g., `config/custom_namespace_path/data_foundation/sap/table_settings.yaml`
        # If omitted, defaults to src/data_modules/cortex/data_foundation/sap/table_settings.default.yaml.
```
- Alternative: If you use externally processed CDC tables as the data foundation, set `external: true` in the `modules.foundation` section of the `config/config.yaml` file and remove `dataTargetId`:
```yaml
[...]
data:
  [...]
modules:
  foundation:
    [...]
    - moduleId: foundation_module_id
      type: cortex.sap
      dataSourceId: sap_raw_s4
      external: true
      moduleSettings:
        sapVersion: s4
        mandt: "100"
        tableSettings: "table_settings.yaml"
        # Optional. Path to custom table settings configuration relative to this config file, e.g., `config/custom_namespace_path/data_foundation/sap/table_settings.yaml`
        # If omitted, defaults to src/data_modules/cortex/data_foundation/sap/table_settings.default.yaml.
```
- Create the referenced `table_settings.yaml` file to define which tables in the raw layer dataset are transformed into the data foundation layer:
```yaml
common:
  - source:
      tableName: custom_sap_table_name
    target:
      tags: [sap, s4, hourly]
      clusterDetails:
        columns: [carrid, connid]
      partitionDetails:
        column: fldate
        partitionType: time
        timeGrain: day
```
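The steps above must agree with each other. Every managed module's `dataTargetId` has to name an entry in `data.targets`, and an external module omits `dataTargetId`. The following hypothetical Python check (not part of Cortex Framework) shows one way to catch mismatches before a build. The dicts stand in for what a YAML loader would return for the snippets above. The helper names `index_targets` and `check_foundation_module` are illustrative.

```python
"""Hypothetical consistency check for config.yaml; not Cortex Framework code."""


def index_targets(targets):
    """Map each data.targets id to its `project.dataset` name, rejecting duplicates."""
    index = {}
    for target in targets:
        missing = {"id", "projectId", "datasetId"} - set(target)
        if missing:
            raise ValueError(f"target {target.get('id')!r} lacks {sorted(missing)}")
        if target["id"] in index:
            raise ValueError(f"duplicate target id {target['id']!r}")
        index[target["id"]] = f"{target['projectId']}.{target['datasetId']}"
    return index


def check_foundation_module(module, target_index):
    """Return a list of problems found in one modules.foundation entry."""
    problems = []
    target = module.get("dataTargetId")
    if module.get("external", False):
        # external: true means the CDC tables are processed elsewhere, so no target is set.
        if target is not None:
            problems.append("external module should not set dataTargetId")
    elif target is None:
        problems.append("dataTargetId is required unless external: true")
    elif target not in target_index:
        problems.append(f"dataTargetId {target!r} matches no data.targets id")
    return problems


targets = index_targets([{
    "id": "data_foundation_sap_custom_namespace",
    "projectId": "target_project_id",
    "datasetId": "data_foundation_sap_custom_namespace",
}])
managed = {"moduleId": "foundation_module_id", "type": "cortex.sap",
           "dataSourceId": "sap_raw_s4",
           "dataTargetId": "data_foundation_sap_custom_namespace"}
external = {"moduleId": "foundation_module_id", "type": "cortex.sap",
            "dataSourceId": "sap_raw_s4", "external": True}
print(check_foundation_module(managed, targets))   # []
print(check_foundation_module(external, targets))  # []
```

Returning a list of messages instead of raising on the first problem lets one run report every misconfigured module at once.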
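Data foundation module example
In the following example, we create a new custom data foundation module that processes the `sflight` table. It uses a previously defined custom namespace, here `sap_bookingdatamodel`.
Register the new data foundation module in `config/config.yaml`: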
```yaml
data:
  targets:
    - id: data_foundation_sap_bookingdatamodel
      projectId: target_project_id
      datasetId: data_foundation_sap_bookingdatamodel
modules:
  foundation:
    - moduleId: sap_bookingdatamodel
      type: sap_bookingdatamodel.sap
      dataSourceId: sap_raw_s4
      dataTargetId: data_foundation_sap_bookingdatamodel
      moduleSettings:
        sapVersion: s4
        mandt: "100"
        tableSettings: "table_settings.yaml"
```
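The module `type` appears to combine the namespace and the source type (`cortex.sap` for the built-in module, `sap_bookingdatamodel.sap` here). The following hypothetical Python helper splits it and rebuilds the file locations used in this example. The path pattern is inferred only from the paths shown on this page (`config/<namespace>/...` and `src/data_modules/<namespace>/...`). It is an assumption, not a documented Cortex Framework rule.

```python
"""Hypothetical helper: derive a module's namespace and file paths from `type`."""


def module_layout(module_type, table):
    """Split '<namespace>.<source>' and build this example's file paths."""
    namespace, sep, source = module_type.partition(".")
    if not (sep and namespace and source):
        raise ValueError(f"expected '<namespace>.<source>', got {module_type!r}")
    base = f"{namespace}/data_foundation/{source}"
    return {
        "namespace": namespace,
        # Inferred from the table_settings.yaml path used in this example.
        "table_settings": f"config/{base}/table_settings.yaml",
        # Inferred from the annotation file path used in this example.
        "annotation": f"src/data_modules/{base}/annotations/{table}.yaml",
    }


layout = module_layout("sap_bookingdatamodel.sap", "sflight")
for key, value in layout.items():
    print(f"{key}: {value}")
```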
Create table_settings.yaml
Create the file `config/sap_bookingdatamodel/data_foundation/sap/table_settings.yaml` with the following content:
```yaml
common:
  - source:
      tableName: sflight
    target:
      tags: [sap, s4, hourly]
      clusterDetails:
        columns: [carrid, connid]
      partitionDetails:
        column: fldate
        partitionType: time
        timeGrain: day
```
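Because this table has the same layout in both the SAP ECC and SAP S/4HANA dialects, it is listed under the `common:` section.
Create an annotation file for each table to enrich its schema with metadata.
In this example scenario, create `src/data_modules/sap_bookingdatamodel/data_foundation/sap/annotations/sflight.yaml` with the following content: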
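The following Python sketch shows how a `table_settings.yaml` structure like the one above could be consumed. Entries under `common` apply to every dialect. The sketch then renders BigQuery `PARTITION BY` and `CLUSTER BY` clauses from the `partitionDetails` and `clusterDetails` settings. It rests on several assumptions. It is not Cortex Framework's generator. The dialect-specific section keyed by the `sapVersion` value (for example `s4`) is hypothetical. The sketch also assumes `fldate` lands in BigQuery as a DATE column. For a DATE column, BigQuery partitions daily on the column itself and uses `DATE_TRUNC` only for MONTH or YEAR granularity.

```python
"""Sketch of consuming table_settings.yaml; the per-dialect key is hypothetical."""


def tables_for_version(table_settings, sap_version):
    """Entries under `common` apply to every dialect; append the dialect's own section."""
    return list(table_settings.get("common", [])) + list(table_settings.get(sap_version, []))


def layout_clauses(target):
    """Render BigQuery PARTITION BY / CLUSTER BY clauses for one target entry."""
    clauses = []
    partition = target.get("partitionDetails")
    if partition:
        if partition.get("partitionType") != "time":
            raise ValueError("this sketch only handles time partitioning")
        column, grain = partition["column"], partition.get("timeGrain", "day").upper()
        if grain == "DAY":
            clauses.append(f"PARTITION BY {column}")  # a DATE column partitions daily as-is
        elif grain in ("MONTH", "YEAR"):
            clauses.append(f"PARTITION BY DATE_TRUNC({column}, {grain})")
        else:
            raise ValueError(f"unsupported timeGrain for a DATE column: {grain}")
    cluster = target.get("clusterDetails")
    if cluster:
        if len(cluster["columns"]) > 4:
            raise ValueError("BigQuery allows at most four clustering columns")
        clauses.append("CLUSTER BY " + ", ".join(cluster["columns"]))
    return clauses


# The table_settings.yaml content above, as the dict a YAML loader would return.
settings = {"common": [{
    "source": {"tableName": "sflight"},
    "target": {
        "tags": ["sap", "s4", "hourly"],
        "clusterDetails": {"columns": ["carrid", "connid"]},
        "partitionDetails": {"column": "fldate", "partitionType": "time", "timeGrain": "day"},
    },
}]}
for entry in tables_for_version(settings, "s4"):
    print(entry["source"]["tableName"], layout_clauses(entry["target"]))
```

Because `sflight` sits under `common`, the sketch selects it whether `sapVersion` is `s4` or an ECC value.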
```yaml
description: "transparent table FLIGHT, part of the Basis Components Module, BC-DWB-TND (Training and Demo)."
fields:
  - name: "mandt"
    description: "Client"
  - name: "carrid"
    description: "Airline Code"
  - name: "connid"
    description: "Flight Connection Number"
  - name: "fldate"
    description: "Flight date"
  - name: "price"
    description: "Airfare"
  - name: "currency"
    description: "Local currency of airline"
  - name: "planetype"
    description: "Aircraft Type"
  - name: "seatsmax"
    description: "Maximum capacity in economy class"
  - name: "seatsocc"
    description: "Occupied seats in economy class"
  - name: "paymentsum"
    description: "Total of current bookings"
  - name: "seatsmax_b"
    description: "Maximum capacity in business class"
  - name: "seatsocc_b"
    description: "Occupied seats in business class"
  - name: "seatsmax_f"
    description: "Maximum capacity in first class"
  - name: "seatsocc_f"
    description: "Occupied seats in first class"
```
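Annotation descriptions end up as schema metadata. The following hypothetical Python sketch maps an annotation file onto BigQuery table and column descriptions. It uses GoogleSQL `ALTER TABLE ... SET OPTIONS` and `ALTER TABLE ... ALTER COLUMN ... SET OPTIONS` statements. It also rejects annotated field names that the table does not have. The sketch assumes the annotation YAML has already been loaded into a dict. Cortex Framework may apply descriptions differently, for example directly in the table schema at creation time. The function names are illustrative only.

```python
"""Hypothetical sketch: turn an annotation file into BigQuery description DDL."""


def quote(text):
    """Return a single-quoted GoogleSQL string literal with backslash escapes."""
    return "'" + text.replace("\\", "\\\\").replace("'", "\\'") + "'"


def description_ddl(table_ref, annotation, columns):
    """Build ALTER TABLE statements; reject fields the table does not have."""
    unknown = [f["name"] for f in annotation["fields"] if f["name"] not in columns]
    if unknown:
        raise ValueError(f"annotated fields not in {table_ref}: {unknown}")
    statements = [f"ALTER TABLE `{table_ref}` SET OPTIONS "
                  f"(description = {quote(annotation['description'])})"]
    for field in annotation["fields"]:
        statements.append(f"ALTER TABLE `{table_ref}` ALTER COLUMN {field['name']} "
                          f"SET OPTIONS (description = {quote(field['description'])})")
    return statements


annotation = {  # A shortened version of sflight.yaml above.
    "description": "transparent table FLIGHT, part of the Basis Components Module, "
                   "BC-DWB-TND (Training and Demo).",
    "fields": [{"name": "carrid", "description": "Airline Code"},
               {"name": "fldate", "description": "Flight date"}],
}
sflight_columns = {"mandt", "carrid", "connid", "fldate", "price"}
for stmt in description_ddl("target_project_id.data_foundation_sap_bookingdatamodel.sflight",
                            annotation, sflight_columns):
    print(stmt)
```

Checking field names against the table's columns catches a misspelled annotation before the build, instead of silently leaving a column undocumented.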
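To verify that the custom data foundation module compiles and deploys successfully, see the Validation section on the Data product extensibility page.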