建立資料產品模組
您必須建立自訂資料產品模組,才能定義自己的商業邏輯和分析模型,對基礎資料表執行計算,並將結果封裝成可部署的資料集。
必要條件
建立自訂資料產品模組時,強烈建議使用專屬的自訂命名空間進行封裝。此外,請確認您打算使用的來源資料表位於資料基礎資料集中。
建立資料產品模組
定義資料產品模組的必要步驟如下:
- 在
config/config.yaml檔案中註冊資料產品模組,方法是使用項目擴充data.modules.products清單:
[...]
data:
[...]
# Configuration for data foundation and product modules.
modules:
# List of foundation modules.
foundation:
[...]
# List of data product modules.
product:
[...]
- moduleId: product_module_id
type: custom_namespace.flight_usd
dependsOn:
sapModule: erp
sapModuleCustNS: foundation_module_id
dataTargetId: product_target
enabled: true
tableSettings: "table_settings.yaml"
# Optional, references file in `config/custom_namespace_path/data_product/product_module_id/`
# If omitted, defaults to src/data_modules/custom_namespace_path/data_product/table_settings.default.yaml.
[...]
- 建立
tableSettings檔案 (例如config/custom_namespace_path/data_product/product_module_id/table_settings.yaml)。這個 YAML 檔案會控管資料表設定,例如具體化和 BigQuery 最佳化詳細資料:
common:
custom_sales_summary:
materialization_type: "table"
tags: ["custom", "sales", "reporting"]
partition_details:
column: "created_date"
partition_type: "date"
time_grain: "day"
cluster_details:
columns:
- "customer_id"
- 建立註解檔案
系統會為每個資料產品輸出構件 (資料表、檢視區塊) 建立註解檔案 <tablename>.yaml,並以 YAML 格式說明資料欄和欄位。編譯期間,建構工具會自動在產品的 annotations/ 資料夾 (例如 annotations/custom_sales_summary.yaml) 中搜尋註解,並直接將這些字串合併至輸出 Dataform 結構定義,以便保留在 BigQuery 資料表的中繼資料中。
註解 config/custom_namespace_path/data_product/product_module_id/annotations/'tablename'.yaml 檔案的格式如下:
description: "Description of the table or view purpose"
fields:
- name: "customer_id" # column name
description: "Customer identifier" # column description
- name: "column2"
description: "Description of Column 2"
- name: "column3"
description: "Description of Column 3"
- 在資料產品資料夾
config/custom_namespace_path/data_product/product_module_id/中建立manifest.yaml檔案,並維護類型、資料表和模組依附元件。資訊清單檔案的格式如下:
type: sales_performance
builder: sap_product # Automatically resolves to the global SapProductBuilder fallback
dependencies:
sapModule:
type: sap
supportedVersions:
- ecc
- s4
資料產品模組範例
以我們建立的航班範例 src/data_modules/custom_namespace_path/data_product/product_module_id/manifest.yaml 為例,如果內容
type: product_module_id
dependencies:
sapModule:
type: cortex.sap
supported_versions:
- ecc
- s4
tables:
common:
- tcurr
sapModuleCustNS:
type: custom_namespace .sap
supported_versions:
- ecc
- s4
tables:
common:
- sflight
builder: sap_product
- 在下一個步驟中,請擴充資料產品資料表的參照資料表設定檔。
在使用的範例中,建立 config/custom_namespace_path/data_product/product_module_id/table_settings.yaml,並加入以下內容:
ecc:
flights_usd:
materializationType: incremental
tags: [sap, dataproduct, masterdata]
s4:
flights_usd:
materializationType: incremental
tags: [sap, dataproduct, masterdata]
- 為資料產品資料表建立註解,以說明豐富儲存空間結構定義。
在使用的範例中,建立檔案:src/data_modules/custom_namespace_path/data_product/product_module_id/annotations/flights_usd.yaml,內容如下:
description: "Flight scheduling and pricing information, including currency conversion to USD."
fields:
- name: "client_mandt"
description: "Client (Mandant), PK"
- name: "airline_code_carrid"
description: "Airline Carrier ID, PK"
- name: "flight_connection_number_connid"
description: "Flight Number, PK"
- name: "flight_date_fldate"
description: "Flight Date"
- name: "price_usd"
description: "Price in USD"
- name: "price"
description: "Price in local currency"
- name: "currency"
description: "Local currency"
- 資料產品的商業邏輯會儲存在
js或sqlx檔案中。
在範例中,建立含有以下內容的 src/data_modules/custom_namespace_path/data_product/product_module_id/definitions/flights_usd.js 檔案:
// ___MODULE_CONTEXT___
// ___TABLE_CONFIG___
const moduleConfig = config.product[moduleContext.moduleId];
const sapModuleConfigDatasetId = moduleConfig.sources.sapModule.datasetId;
const sapModuleCustNSConfigDatasetId = moduleConfig.sources.sapModuleCustNS.datasetId;
const materializationType = tableConfig.materializationType || "incremental";
const incremental = require("includes/cortex/incremental.js");
const publish_config = require("includes/cortex/publish_config.js");
const publishConfig = publish_config.getPublishConfig(
materializationType,
tableConfig,
moduleConfig,
[
"client_mandt",
"airline_code_carrid",
"flight_connection_number_connid",
"flight_date_fldate"
]
);
publish("flight_usd", publishConfig).query(
(ctx) => `
WITH flight_base AS (
SELECT
mandt,
carrid,
connid,
fldate,
price,
currency,
-- Convert flight date string (YYYYMMDD) to an integer to calculate SAP's inverted date key
CAST(99999999 - CAST(fldate AS INT64) AS STRING) AS inverted_fldate
FROM ${ctx.ref(sapModuleCustNSConfigDatasetId, 'sflight')} AS flight
),
ranked_exchange_rates AS (
SELECT
f.mandt,
f.carrid,
f.connid,
f.fldate,
f.price,
f.currency,
t.ukurs,
-- Window function to grab the closest historical exchange rate
ROW_NUMBER() OVER (
PARTITION BY f.mandt, f.carrid, f.connid, f.fldate
ORDER BY t.gdatu ASC
) AS latest_rate_rank
FROM flight_base f
LEFT JOIN ${ctx.ref(sapModuleConfigDatasetId, 'tcurr')} AS t
ON f.mandt = t.mandt
AND t.kurst = 'M' -- 'M' is the standard SAP default for average exchange rates
AND t.fcurr = f.currency
AND t.tcurr = 'USD'
-- Chronological (rate_date <= flight_date) translates to (t.gdatu >= inverted_fldate)
AND t.gdatu >= f.inverted_fldate
)
SELECT
client_mandt,
airline_code_carrid,
flight_connection_number_connid,
flight_date_fldate,
price,
currency,
price_usd,
CURRENT_TIMESTAMP() AS bq_loaded_at
FROM (
SELECT
mandt AS client_mandt,
carrid AS airline_code_carrid,
connid AS flight_connection_number_connid,
PARSE_TIMESTAMP('%Y%m%d', fldate) AS flight_date_fldate,
price AS price,
currency AS currency,
-- Currency Conversion Logic
CASE
WHEN currency = 'USD' THEN price
WHEN ukurs IS NULL THEN NULL -- Handles cases where no exchange rate is found
-- If UKURS is negative, it's an indirect quotation (1 USD = X Local) -> Divide
WHEN ukurs < 0 THEN ROUND(price / ABS(ukurs), 2)
-- If UKURS is positive, it's a direct quotation (1 Local = X USD) -> Multiply
ELSE ROUND(price * ukurs, 2)
END AS price_usd
FROM ranked_exchange_rates
WHERE latest_rate_rank = 1
)
${incremental.getWhere(ctx, ["flight_date_fldate"])}
`
);
驗證自訂命名空間擴充功能
如要驗證命名空間、資料基礎或資料產品模組是否已成功擴充 Google Cloud Cortex Framework,請按照下列步驟操作:
如要部署資料產品模組,請執行
uv run targets build、deploy或build-and-deploy,詳情請參閱部署頁面。在 BigQuery 控制台中開啟 Dataform UI,然後前往存放區和工作區。
在 Dataform 使用者介面中,確認控制台未顯示任何編譯錯誤。
確認準備好的擴充功能已部署至
definitions/data_foundation/custom_namespace_path/和definitions/data_product/product_module_id/路徑。按照操作說明執行 Dataform 管道。
在 BigQuery 中確認產品資料集包含資料產品資料表,且已填入資料。
- 上一個步驟:建立資料基礎模組
- 返回「總覽」