Create a data foundation module

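You must create a data foundation module to process supplemental raw tables and include them in the data foundation dataset.

When you create a custom data foundation module, we recommend packaging it in a dedicated custom namespace. Also make sure that the source tables you intend to process are present in the raw layer dataset.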

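When you work with SAP data foundation modules, make sure that the DD03L table is replicated in the raw dataset and configured as a source for the foundation module in the config/config.yaml file. Also make sure that the replicated DD03L table contains field metadata records for every table you intend to ingest (for example, custom or supplemental tables such as sflight). The Cortex Framework build scripts and dependency resolver read these metadata rows to identify column lists, data types, and primary key relationships between tables.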
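
The DD03L requirement above can be checked before a build. The following is a minimal sketch, not part of Cortex Framework: it assumes the DD03L rows have already been read into Python dictionaries keyed by the standard SAP column names TABNAME, FIELDNAME, DATATYPE, and KEYFLAG (your replication tool may use different casing), and the helper name summarize_dd03l and the table zcustom are hypothetical.

```python
from collections import defaultdict


def summarize_dd03l(rows, required_tables):
    """Group DD03L rows by table and list required tables with no metadata.

    `rows` is an iterable of dicts using the SAP column names TABNAME,
    FIELDNAME, DATATYPE and KEYFLAG (an assumption about the replica).
    """
    fields_by_table = defaultdict(list)
    for row in rows:
        table = row["TABNAME"].lower()
        fields_by_table[table].append(
            {
                "field": row["FIELDNAME"].lower(),
                "datatype": row["DATATYPE"],
                # DD03L marks primary key fields with KEYFLAG = 'X'.
                "is_key": row["KEYFLAG"] == "X",
            }
        )
    missing = sorted(
        t for t in required_tables if t.lower() not in fields_by_table
    )
    return dict(fields_by_table), missing


# Hypothetical sample rows standing in for a query result on DD03L.
sample_rows = [
    {"TABNAME": "SFLIGHT", "FIELDNAME": "MANDT", "DATATYPE": "CLNT", "KEYFLAG": "X"},
    {"TABNAME": "SFLIGHT", "FIELDNAME": "CARRID", "DATATYPE": "CHAR", "KEYFLAG": "X"},
    {"TABNAME": "SFLIGHT", "FIELDNAME": "PRICE", "DATATYPE": "CURR", "KEYFLAG": ""},
]

fields, missing = summarize_dd03l(sample_rows, ["sflight", "zcustom"])
key_fields = [f["field"] for f in fields["sflight"] if f["is_key"]]
print("key fields of sflight:", key_fields)
print("tables without DD03L metadata:", missing)
```

A non-empty list of missing tables suggests that DD03L was replicated before those tables existed in the source system, or that the replication filter excludes them.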

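Data foundation module definition

To define a data foundation module, follow these steps:

  • In the config/config.yaml file, add a target dataset configuration to data.targets. This prepares a dedicated BigQuery dataset in which the foundation tables are deployed: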
[...]
data:
  [...]
  targets:
    - id: data_foundation_sap_custom_namespace
      # Google Cloud Project ID for the target dataset.
      projectId:  target_project_id 
      # BigQuery dataset ID for the target.
      datasetId: data_foundation_sap_custom_namespace   
  • To define the data foundation module, add the following to the modules.foundation section of the config/config.yaml file:
[...]
data:
  [...]
  modules:
    foundation:
      [...]
      - moduleId:  foundation_module_id 
        type: cortex.sap
        dataSourceId: sap_raw_s4
        dataTargetId: data_foundation_sap_custom_namespace
        moduleSettings:
          sapVersion: s4
          mandt: "100"
        tableSettings: "table_settings.yaml"
        # Optional. Path to custom table settings configuration relative to this config file, e.g., `config/custom_namespace_path/data_foundation/sap/table_settings.yaml`
        # If omitted, defaults to src/data_modules/cortex/data_foundation/sap/table_settings.default.yaml. 

  • Alternative: If you use externally processed CDC tables as the data foundation, adjust the modules.foundation section in the config/config.yaml file by removing dataTargetId and adding external: true:
[...]
data:
  [...]
  modules:
    foundation:
      [...]
      - moduleId:  foundation_module_id 
        type: cortex.sap
        dataSourceId: sap_raw_s4
        external: true
        moduleSettings:
          sapVersion: s4
          mandt: "100"
        tableSettings: "table_settings.yaml"
        # Optional. Path to custom table settings configuration relative to this config file, e.g., `config/custom_namespace_path/data_foundation/sap/table_settings.yaml`
        # If omitted, defaults to src/data_modules/cortex/data_foundation/sap/table_settings.default.yaml. 

  • Create the referenced table_settings.yaml file, which defines which tables in the raw layer dataset are transformed into the data foundation layer:
common:
  - source:
      tableName: custom_sap_table_name
    target:
      tags: [sap, s4, hourly]
      clusterDetails:
        columns: [carrid, connid]
      partitionDetails:
        column: fldate
        partitionType: time
        timeGrain: day
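
The relationship between data.targets and the foundation modules described in the steps above can be sanity-checked with a short script. This is a sketch under assumptions rather than a Cortex Framework tool: it works on a configuration that is already loaded into a Python dictionary, the function name check_foundation_modules and the module IDs are hypothetical, and the rule that an external module carries no dataTargetId is inferred from the alternative step above.

```python
def check_foundation_modules(config):
    """Return a list of problems found in the foundation module settings."""
    data = config.get("data", {})
    target_ids = {t["id"] for t in data.get("targets", [])}
    problems = []
    for module in data.get("modules", {}).get("foundation", []):
        module_id = module.get("moduleId", "<unnamed>")
        target = module.get("dataTargetId")
        external = module.get("external", False)
        if external and target:
            problems.append(
                f"{module_id}: external modules should not set dataTargetId"
            )
        elif not external and target is None:
            problems.append(f"{module_id}: missing dataTargetId")
        elif not external and target not in target_ids:
            problems.append(
                f"{module_id}: dataTargetId '{target}' is not in data.targets"
            )
    return problems


# Hypothetical configuration mirroring the structure of config/config.yaml.
config = {
    "data": {
        "targets": [
            {
                "id": "data_foundation_sap_custom_namespace",
                "projectId": "target_project_id",
                "datasetId": "data_foundation_sap_custom_namespace",
            }
        ],
        "modules": {
            "foundation": [
                {
                    "moduleId": "deployed_module",
                    "type": "cortex.sap",
                    "dataSourceId": "sap_raw_s4",
                    "dataTargetId": "data_foundation_sap_custom_namespace",
                },
                {
                    "moduleId": "external_module",
                    "type": "cortex.sap",
                    "dataSourceId": "sap_raw_s4",
                    "external": True,
                },
                {
                    "moduleId": "broken_module",
                    "type": "cortex.sap",
                    "dataSourceId": "sap_raw_s4",
                    "dataTargetId": "no_such_target",
                },
            ]
        },
    }
}

problems = check_foundation_modules(config)
for problem in problems:
    print(problem)
```

To run the same check against a real file, the dictionary could be produced with a YAML parser such as PyYAML (yaml.safe_load), which is not part of the Python standard library.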

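Data foundation module example

The following example creates a new custom data foundation module that processes the sflight table, using a previously defined custom namespace (sap_bookingdatamodel in the configuration that follows).

Register the new data foundation module.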

data:
  targets:
    - id: data_foundation_sap_bookingdatamodel
      projectId:  target_project_id
      datasetId: data_foundation_sap_bookingdatamodel   
  modules:
    foundation:
      - moduleId:  sap_bookingdatamodel
        type: sap_bookingdatamodel.sap
        dataSourceId: sap_raw_s4
        dataTargetId: data_foundation_sap_bookingdatamodel
        moduleSettings:
          sapVersion: s4
          mandt: "100"
        tableSettings: "table_settings.yaml"

Create table_settings.yaml

Create the config/sap_bookingdatamodel/data_foundation/sap/table_settings.yaml file with the following content:

common:
  - source:
      tableName: sflight
    target:
      tags: [sap, s4, hourly]
      clusterDetails:
        columns: [carrid, connid]
      partitionDetails:
        column: fldate
        partitionType: time
        timeGrain: day

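Because the table has the same layout in the SAP ECC and S/4HANA dialects, it is a child element of the common: section.

Create an annotation file for each table to enrich the schema with metadata.

In this example scenario, create src/data_modules/sap_bookingdatamodel/data_foundation/sap/annotations/sflight.yaml with the following content: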

description: "transparent table FLIGHT, part of the Basis Components Module, BC-DWB-TND (Training and Demo)."
fields:
- name: "mandt"
  description: "Client"
- name: "carrid"
  description: "Airline Code"
- name: "connid"
  description: "Flight Connection Number"
- name: "fldate"
  description: "Flight date"
- name: "price"
  description: "Airfare"
- name: "currency"
  description: "Local currency of airline"
- name: "planetype"
  description: "Aircraft Type"
- name: "seatsmax"
  description: "Maximum capacity in economy class"
- name: "seatsocc"
  description: "Occupied seats in economy class"
- name: "paymentsum"
  description: "Total of current bookings"
- name: "seatsmax_b"
  description: "Maximum capacity in business class"
- name: "seatsocc_b"
  description: "Occupied seats in business class"
- name: "seatsmax_f"
  description: "Maximum capacity in first class"
- name: "seatsocc_f"
  description: "Occupied seats in first class"
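
Because the table settings and the annotation file name the same columns independently, a typo in either one is easy to miss. The sketch below compares the two. It is an illustrative consistency check only: the source does not state that Cortex Framework requires clustering or partitioning columns to be annotated, the function name unannotated_columns is hypothetical, and both inputs are assumed to be already parsed into Python dictionaries (the annotation is abbreviated to five fields here).

```python
def unannotated_columns(table_entry, annotation):
    """List clustering/partitioning columns that have no annotation entry."""
    annotated = {field["name"] for field in annotation.get("fields", [])}
    target = table_entry.get("target", {})
    referenced = list(target.get("clusterDetails", {}).get("columns", []))
    partition_column = target.get("partitionDetails", {}).get("column")
    if partition_column:
        referenced.append(partition_column)
    return [column for column in referenced if column not in annotated]


# The sflight entry from table_settings.yaml, as a dictionary.
sflight_entry = {
    "source": {"tableName": "sflight"},
    "target": {
        "tags": ["sap", "s4", "hourly"],
        "clusterDetails": {"columns": ["carrid", "connid"]},
        "partitionDetails": {
            "column": "fldate",
            "partitionType": "time",
            "timeGrain": "day",
        },
    },
}

# Abbreviated version of annotations/sflight.yaml.
sflight_annotation = {
    "fields": [
        {"name": "mandt", "description": "Client"},
        {"name": "carrid", "description": "Airline Code"},
        {"name": "connid", "description": "Flight Connection Number"},
        {"name": "fldate", "description": "Flight date"},
        {"name": "price", "description": "Airfare"},
    ]
}

unmatched = unannotated_columns(sflight_entry, sflight_annotation)
print("columns without annotations:", unmatched)
```

With the settings and annotations shown in this example the list is empty. A misspelled partition column would appear in the result.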

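To verify that the custom data foundation module compiled and deployed successfully, see the "Verification" section of the Data product extensibility page.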