Create data foundation modules

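You only need to create a custom data foundation module if you want to define custom build behavior or extend support to a new source system that isn't supported out of the box (for example, Salesforce).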

Before you begin, make sure that the source tables you plan to process exist in the raw layer dataset.

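When working with SAP data foundation modules, make sure that the replicated DD03L table contains the metadata records required for every table you plan to ingest, including custom or supplementary tables. For detailed requirements, see SAP ERP metadata requirements.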
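As a rough pre-flight check for this requirement, the following Python sketch lists the planned tables that have no field records in the replicated DD03L table. It assumes the DD03L rows were already fetched (for example, by querying the raw dataset) as dicts keyed by lower-case column names such as `tabname`. The function name `tables_missing_metadata` and the table `zcustom_table` are hypothetical.

```python
from typing import Iterable, Mapping


def tables_missing_metadata(
    planned_tables: Iterable[str],
    dd03l_rows: Iterable[Mapping[str, str]],
) -> list[str]:
    """Return the planned tables that have no field records in replicated DD03L.

    Assumption: each row is a dict keyed by lower-case column names
    (for example "tabname", "fieldname").
    """
    # DD03L holds one row per table field; "tabname" identifies the table.
    # SAP table names are upper case, so normalize both sides before comparing.
    described = {row["tabname"].upper() for row in dd03l_rows}
    return sorted({table.upper() for table in planned_tables} - described)


rows = [
    {"tabname": "SFLIGHT", "fieldname": "MANDT"},
    {"tabname": "SFLIGHT", "fieldname": "CARRID"},
]
# zcustom_table is a hypothetical custom table with no DD03L records yet.
print(tables_missing_metadata(["sflight", "zcustom_table"], rows))
```

An empty result means every planned table has at least one DD03L field record; it does not prove that every field is present.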

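When you create custom data foundation modules or namespaces, we recommend using a dedicated custom namespace. This keeps your extensions and customizations separate from the Cortex Framework deliverables and simplifies lifecycle management.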

Create a new data foundation module

To define a data foundation module, follow these steps:

  • config/config.yaml 文件中,将目标数据集配置添加到 data.targets。此命令将准备一个专用 BigQuery 数据集,基础表将部署到该数据集中:
data:
  targets:
    - id: data_foundation_sap_custom_namespace
      # Google Cloud Project ID for the target dataset.
      projectId: target_project_id
      # BigQuery dataset ID for the target.
      datasetId: data_foundation_sap_custom_namespace
  • To define the data foundation module, add the following to the modules.foundation section of the config/config.yaml file:
data:
  modules:
    foundation:
      # Recommended naming for foundation_module_id:
      # custom_namespace_data_foundation_module_type
      - moduleId: foundation_module_id
        type: cortex.sap
        dataSourceId: sap_raw_s4
        dataTargetId: data_foundation_sap_custom_namespace
        moduleSettings:
          sapVersion: s4
          mandt: "100"
        # Custom table settings file, relative to 'config/' directory
        # If omitted, defaults to "../src/data_modules/custom_namespace/data_foundation/data_foundation_module_type/table_settings.default.yaml"
        # tableSettings: "custom_namespace/data_foundation/data_foundation_module_type/table_settings.yaml"
  • Alternative: If you use externally processed CDC tables as the data foundation, set external: true in the modules.foundation section of the config/config.yaml file and remove dataTargetId:
data:
  modules:
    foundation:
      - moduleId: foundation_module_id
        type: cortex.sap
        dataSourceId: sap_raw_s4
        external: true
        moduleSettings:
          sapVersion: s4
          mandt: "100"
        # Custom table settings file, relative to 'config/' directory
        # If omitted, defaults to "../src/data_modules/custom_namespace/data_foundation/data_foundation_module_type/table_settings.default.yaml"
        # tableSettings: "custom_namespace/data_foundation/data_foundation_module_type/table_settings.yaml"
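  • Create the tableSettings file referenced in the configuration to define which tables in the raw layer dataset are transformed into the data foundation layer. We recommend using the default path: src/data_modules/custom_namespace/data_foundation/data_foundation_module_type/table_settings.default.yaml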
common:
  - source:
      tableName: custom_sap_table_name
    target:
      tags: [sap, s4, hourly]
      clusterDetails:
        columns: [pk_column]
      partitionDetails:
        column: fldate
        partitionType: time
        timeGrain: day
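The rules in the steps above (a regular module points dataTargetId at an id declared under data.targets, while an external module omits dataTargetId) can be checked with a small Python sketch. It assumes config/config.yaml has already been parsed into dicts, for example with PyYAML's `yaml.safe_load`. `check_foundation_modules` is a hypothetical helper, not part of Cortex Framework.

```python
from typing import Any, Mapping


def check_foundation_modules(data: Mapping[str, Any]) -> list[str]:
    """Return rule violations in the data.modules.foundation entries.

    `data` is the already-parsed `data:` section of config/config.yaml.
    """
    target_ids = {target["id"] for target in data.get("targets", [])}
    problems = []
    for module in data.get("modules", {}).get("foundation", []):
        module_id = module.get("moduleId", "<missing moduleId>")
        if module.get("external", False):
            # Externally processed CDC tables: the module must not name a target.
            if "dataTargetId" in module:
                problems.append(f"{module_id}: remove dataTargetId when external is true")
        elif module.get("dataTargetId") not in target_ids:
            # Regular modules deploy into a dataset declared under data.targets.
            problems.append(f"{module_id}: dataTargetId does not match any data.targets id")
    return problems


data = {
    "targets": [{"id": "data_foundation_sap_custom_namespace"}],
    "modules": {"foundation": [
        {"moduleId": "foundation_module_id", "type": "cortex.sap",
         "dataTargetId": "data_foundation_sap_custom_namespace"},
        # Hypothetical misconfigured module: external but still names a target.
        {"moduleId": "external_cdc_module", "type": "cortex.sap",
         "external": True, "dataTargetId": "data_foundation_sap_custom_namespace"},
    ]},
}
print(check_foundation_modules(data))
```

Only the second module is reported, because external: true and dataTargetId should not appear together.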

Data foundation module example

The following example registers a new custom data foundation module that processes the sflight table into the custom namespace sap_bookingdatamodel.

Register the new data foundation module:

data:
  targets:
    - id: data_foundation_sap_bookingdatamodel
      projectId: target_project_id
      datasetId: data_foundation_sap_bookingdatamodel
  modules:
    foundation:
      - moduleId: sap_bookingdatamodel
        type: sap_bookingdatamodel.sap
        dataSourceId: sap_raw_s4
        dataTargetId: data_foundation_sap_bookingdatamodel
        moduleSettings:
          sapVersion: s4
          mandt: "100"
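To see which table settings file a registered module picks up, here is a Python sketch that resolves the path using the default given in the config.yaml comments (relative to the config/ directory). It assumes the module type has the form `<namespace>.<module_type>`, which is inferred from `sap_bookingdatamodel.sap` mapping to `src/data_modules/sap_bookingdatamodel/data_foundation/sap/`, and it returns paths relative to the repository root. `resolve_table_settings` is hypothetical.

```python
import posixpath
from typing import Any, Mapping


def resolve_table_settings(module: Mapping[str, Any]) -> str:
    """Return a foundation module's table settings path, relative to the repo root.

    Assumption: `type` has the form "<namespace>.<module_type>".
    """
    namespace, module_type = module["type"].split(".", 1)
    # The default mirrors the config.yaml comment and is relative to config/.
    default = (f"../src/data_modules/{namespace}/data_foundation/"
               f"{module_type}/table_settings.default.yaml")
    relative = module.get("tableSettings", default)
    # Join onto config/ and collapse the "../" segment.
    return posixpath.normpath(posixpath.join("config", relative))


module = {"moduleId": "sap_bookingdatamodel", "type": "sap_bookingdatamodel.sap"}
print(resolve_table_settings(module))
```

For the example module, this yields the file created in the next step. If tableSettings is set explicitly, the result stays under config/.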

Create table_settings.default.yaml

Create the table settings file src/data_modules/sap_bookingdatamodel/data_foundation/sap/table_settings.default.yaml with the following content:

common:
  - source:
      tableName: sflight
    target:
      tags: [sap, s4, hourly]
      clusterDetails:
        columns: [carrid, connid]
      partitionDetails:
        column: fldate
        partitionType: time
        timeGrain: day

Because this table has the same layout in both the SAP ECC and S/4HANA dialects, it is placed under the common: section.
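The following is a sketch of how such a file might be read and sanity-checked in Python, assuming the YAML is already parsed into dicts. `entries_for_version` treats `common` entries as applying to both dialects and appends a section keyed by the sapVersion value; that version-specific key layout is a hypothetical assumption. `check_entry` applies two BigQuery limits: at most four clustering columns, and hour, day, month, or year granularity for time-unit partitioning. The lower-case spelling of timeGrain values other than `day` is assumed.

```python
from typing import Any, Mapping

# BigQuery allows at most four clustering columns per table.
MAX_CLUSTER_COLUMNS = 4
# BigQuery time-unit partitioning supports hourly, daily, monthly and yearly
# granularity; the lower-case spelling used here is an assumption.
TIME_GRAINS = {"hour", "day", "month", "year"}


def entries_for_version(settings: Mapping[str, Any], sap_version: str) -> list:
    """Entries under `common` apply to both ECC and S/4HANA. A section keyed by
    the sapVersion value (hypothetical layout, e.g. "s4") is appended if present."""
    return list(settings.get("common", [])) + list(settings.get(sap_version, []))


def check_entry(entry: Mapping[str, Any]) -> list[str]:
    """Return problems with one table entry's clustering and partitioning settings."""
    name = entry["source"]["tableName"]
    target = entry.get("target", {})
    problems = []
    columns = target.get("clusterDetails", {}).get("columns", [])
    if len(columns) > MAX_CLUSTER_COLUMNS:
        problems.append(f"{name}: more than {MAX_CLUSTER_COLUMNS} clustering columns")
    partition = target.get("partitionDetails")
    if partition and partition.get("partitionType") == "time":
        grain = partition.get("timeGrain")
        if grain not in TIME_GRAINS:
            problems.append(f"{name}: unsupported timeGrain {grain!r}")
    return problems


settings = {"common": [{
    "source": {"tableName": "sflight"},
    "target": {
        "tags": ["sap", "s4", "hourly"],
        "clusterDetails": {"columns": ["carrid", "connid"]},
        "partitionDetails": {"column": "fldate", "partitionType": "time",
                             "timeGrain": "day"},
    },
}]}
for entry in entries_for_version(settings, "s4"):
    print(entry["source"]["tableName"], check_entry(entry))
```

The sflight entry passes both checks: it uses two clustering columns and daily partitioning on fldate.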

Create an annotation file for each table to enrich its schema with metadata.

In the example scenario, we create src/data_modules/sap_bookingdatamodel/data_foundation/sap/annotations/sflight.yaml with the following content:

description: "transparent table FLIGHT, part of the Basis Components Module, BC-DWB-TND (Training and Demo)."
fields:
- name: "mandt"
  description: "Client"
- name: "carrid"
  description: "Airline Code"
- name: "connid"
  description: "Flight Connection Number"
- name: "fldate"
  description: "Flight date"
- name: "price"
  description: "Airfare"
- name: "currency"
  description: "Local currency of airline"
- name: "planetype"
  description: "Aircraft Type"
- name: "seatsmax"
  description: "Maximum capacity in economy class"
- name: "seatsocc"
  description: "Occupied seats in economy class"
- name: "paymentsum"
  description: "Total of current bookings"
- name: "seatsmax_b"
  description: "Maximum capacity in business class"
- name: "seatsocc_b"
  description: "Occupied seats in business class"
- name: "seatsmax_f"
  description: "Maximum capacity in first class"
- name: "seatsocc_f"
  description: "Occupied seats in first class"
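Missing or misspelled field names in an annotation file are easy to overlook. The Python sketch below compares the annotation fields with a table's columns and assumes the annotation YAML is already parsed into a dict. The column list could come from the `fieldname` values in replicated DD03L (an assumed lower-case column name). `annotation_gaps` is a hypothetical helper.

```python
from collections import Counter
from typing import Any, Iterable, Mapping


def annotation_gaps(annotation: Mapping[str, Any],
                    table_columns: Iterable[str]) -> dict[str, list[str]]:
    """Compare an annotation file's field names with a table's actual columns."""
    names = [field["name"].lower() for field in annotation.get("fields", [])]
    columns = {column.lower() for column in table_columns}
    return {
        # Fields described more than once in the annotation file.
        "duplicates": sorted(n for n, count in Counter(names).items() if count > 1),
        # Table columns that have no description yet.
        "undescribed": sorted(columns - set(names)),
        # Described fields that do not exist in the table (possible typos).
        "unknown": sorted(set(names) - columns),
    }


# A shortened excerpt of the sflight annotation above.
annotation = {
    "description": "transparent table FLIGHT",
    "fields": [
        {"name": "mandt", "description": "Client"},
        {"name": "carrid", "description": "Airline Code"},
        {"name": "connid", "description": "Flight Connection Number"},
    ],
}
print(annotation_gaps(annotation, ["MANDT", "CARRID", "CONNID", "FLDATE"]))
```

Here fldate is reported as undescribed because the excerpt omits it. The full annotation file above covers it.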

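To verify that the custom data foundation module compiled and deployed successfully, see the Validation section on the Data product extensibility page.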