Data product module creation
The creation of a custom data product module is required to define your own business logic and analytical models, letting you run calculations over your foundation tables and package them into deployable datasets.
Prerequisites
When creating a custom data product module, we highly recommend using a dedicated custom namespace to package it. Additionally, ensure the source table you plan to use exists in the data foundation dataset.
Creation of a data product module
Data product module definition requires these steps:
- Registration of the data product module within the
config/config.yamlfile, by extending thedata.modules.productslist with entry:
[...]
data:
[...]
# Configuration for data foundation and product modules.
modules:
# List of foundation modules.
foundation:
[...]
# List of data product modules.
product:
[...]
- moduleId: product_module_id
type: custom_namespace.flight_usd
dependsOn:
sapModule: erp
sapModuleCustNS: foundation_module_id
dataTargetId: product_target
enabled: true
tableSettings: "table_settings.yaml"
# Optional, references file in `config/custom_namespace_path/data_product/product_module_id/`
# If omitted, defaults to src/data_modules/custom_namespace_path/data_product/table_settings.default.yaml.
[...]
- Creation of
tableSettingsfile (e.g.config/custom_namespace_path/data_product/product_module_id/table_settings.yaml). This YAML controls table configurations like materializations and BigQuery optimization details:
common:
custom_sales_summary:
materialization_type: "table"
tags: ["custom", "sales", "reporting"]
partition_details:
column: "created_date"
partition_type: "date"
time_grain: "day"
cluster_details:
columns:
- "customer_id"
- Creation of annotation file
The annotation file <tablename>.yaml is created for each data product output artifacts (table, view) and describes columns and fields in YAML format. During compilation, the builder automatically search for annotations within the product's annotations/ folder (e.g. annotations/custom_sales_summary.yaml), merges these strings directly into the output Dataform schema definitions so they are preserved in BigQuery table metadata.
An annotation config/custom_namespace_path/data_product/product_module_id/annotations/'tablename'.yaml file has the format:
description: "Description of the table or view purpose"
fields:
- name: "customer_id" # column name
description: "Customer identifier" # column description
- name: "column2"
description: "Description of Column 2"
- name: "column3"
description: "Description of Column 3"
- Create a
manifest.yamlfile in your data product folderconfig/custom_namespace_path/data_product/product_module_id/, maintaining the type, tables and module dependencies. The manifest file follows this format:
type: sales_performance
builder: sap_product # Automatically resolves to the global SapProductBuilder fallback
dependencies:
sapModule:
type: sap
supportedVersions:
- ecc
- s4
Data product module example
For the flights example we are creating src/data_modules/custom_namespace_path/data_product/product_module_id/manifest.yaml with the content
type: product_module_id
dependencies:
sapModule:
type: cortex.sap
supported_versions:
- ecc
- s4
tables:
common:
- tcurr
sapModuleCustNS:
type: custom_namespace .sap
supported_versions:
- ecc
- s4
tables:
common:
- sflight
builder: sap_product
- In the next step, extend the referenced table settings file for data product tables.
In the used example, create: config/custom_namespace_path/data_product/product_module_id/table_settings.yaml with the content:
ecc:
flights_usd:
materializationType: incremental
tags: [sap, dataproduct, masterdata]
s4:
flights_usd:
materializationType: incremental
tags: [sap, dataproduct, masterdata]
- Create annotations for data product tables to enrich the storage schema with descriptions.
In the used example, create file: src/data_modules/custom_namespace_path/data_product/product_module_id/annotations/flights_usd.yaml with the content:
description: "Flight scheduling and pricing information, including currency conversion to USD."
fields:
- name: "client_mandt"
description: "Client (Mandant), PK"
- name: "airline_code_carrid"
description: "Airline Carrier ID, PK"
- name: "flight_connection_number_connid"
description: "Flight Number, PK"
- name: "flight_date_fldate"
description: "Flight Date"
- name: "price_usd"
description: "Price in USD"
- name: "price"
description: "Price in local currency"
- name: "currency"
description: "Local currency"
- The business logic of the data product is stored in
jsorsqlxfiles.
In given example create src/data_modules/custom_namespace_path/data_product/product_module_id/definitions/flights_usd.js file with the content:
// ___MODULE_CONTEXT___
// ___TABLE_CONFIG___
const moduleConfig = config.product[moduleContext.moduleId];
const sapModuleConfigDatasetId = moduleConfig.sources.sapModule.datasetId;
const sapModuleCustNSConfigDatasetId = moduleConfig.sources.sapModuleCustNS.datasetId;
const materializationType = tableConfig.materializationType || "incremental";
const incremental = require("includes/cortex/incremental.js");
const publish_config = require("includes/cortex/publish_config.js");
const publishConfig = publish_config.getPublishConfig(
materializationType,
tableConfig,
moduleConfig,
[
"client_mandt",
"airline_code_carrid",
"flight_connection_number_connid",
"flight_date_fldate"
]
);
publish("flight_usd", publishConfig).query(
(ctx) => `
WITH flight_base AS (
SELECT
mandt,
carrid,
connid,
fldate,
price,
currency,
-- Convert flight date string (YYYYMMDD) to an integer to calculate SAP's inverted date key
CAST(99999999 - CAST(fldate AS INT64) AS STRING) AS inverted_fldate
FROM ${ctx.ref(sapModuleCustNSConfigDatasetId, 'sflight')} AS flight
),
ranked_exchange_rates AS (
SELECT
f.mandt,
f.carrid,
f.connid,
f.fldate,
f.price,
f.currency,
t.ukurs,
-- Window function to grab the closest historical exchange rate
ROW_NUMBER() OVER (
PARTITION BY f.mandt, f.carrid, f.connid, f.fldate
ORDER BY t.gdatu ASC
) AS latest_rate_rank
FROM flight_base f
LEFT JOIN ${ctx.ref(sapModuleConfigDatasetId, 'tcurr')} AS t
ON f.mandt = t.mandt
AND t.kurst = 'M' -- 'M' is the standard SAP default for average exchange rates
AND t.fcurr = f.currency
AND t.tcurr = 'USD'
-- Chronological (rate_date <= flight_date) translates to (t.gdatu >= inverted_fldate)
AND t.gdatu >= f.inverted_fldate
)
SELECT
client_mandt,
airline_code_carrid,
flight_connection_number_connid,
flight_date_fldate,
price,
currency,
price_usd,
CURRENT_TIMESTAMP() AS bq_loaded_at
FROM (
SELECT
mandt AS client_mandt,
carrid AS airline_code_carrid,
connid AS flight_connection_number_connid,
PARSE_TIMESTAMP('%Y%m%d', fldate) AS flight_date_fldate,
price AS price,
currency AS currency,
-- Currency Conversion Logic
CASE
WHEN currency = 'USD' THEN price
WHEN ukurs IS NULL THEN NULL -- Handles cases where no exchange rate is found
-- If UKURS is negative, it's an indirect quotation (1 USD = X Local) -> Divide
WHEN ukurs < 0 THEN ROUND(price / ABS(ukurs), 2)
-- If UKURS is positive, it's a direct quotation (1 Local = X USD) -> Multiply
ELSE ROUND(price * ukurs, 2)
END AS price_usd
FROM ranked_exchange_rates
WHERE latest_rate_rank = 1
)
${incremental.getWhere(ctx, ["flight_date_fldate"])}
`
);
Verification of custom namespace extension
To verify the successful extension of Google Cloud Cortex Framework with namespaces, data foundation or data product modules, follow these steps:
To deploy the data product module, run
uv run targets build,deployorbuild-and-deploy, as described in the Deployment page.Open the Dataform UI in the BigQuery Console, and navigate to the repository and workspace.
In the Dataform UI ensure no compilation errors are displayed in the console.
Verify that the prepared extensions have been deployed to the paths
definitions/data_foundation/custom_namespace_path/anddefinitions/data_product/product_module_id/.Follow the instructions for execution of Dataform pipelines.
Verify in BigQuery that the product dataset contains the data product table, and that it is filled with data.
- Previous step: Data foundation module creation
- Back to Overview