Deployment configuration

This document explains the deployment configuration options for Cortex Framework in its two configuration files: config/config.yaml and table_settings.yaml.

It also provides how-to guides with step-by-step instructions for common deployment use cases and scenarios.

Configuration file: config/config.yaml

The config/config.yaml file — typically initialized from the config/config.yaml.example template — serves as the primary configuration for the Cortex Framework deployment. It defines critical parameters including the target Google Cloud execution project, source and destination BigQuery datasets, and Dataform specifications such as repository and workspace names.

The following sections provide a detailed breakdown of the config/config.yaml structure.

Build environment

The build environment project is the Google Cloud project that is billed for build actions, such as the BigQuery jobs that read the SAP metadata table DD03L.

buildEnvironment:
  buildProjectId: YOUR_BUILD_PROJECT_ID

The following table describes the build environment parameters.

Parameter | Meaning | Default value | Description
buildEnvironment.buildProjectId | Build project ID | YOUR_BUILD_PROJECT_ID | Google Cloud Project ID where build operations are executed.

Data section overview

The data: section of the configuration file defines your data sources, targets, and the specific modules for the data foundation and data products. Its general structure is as follows:

data:
  # Geographic location for BigQuery datasets (for example: US, EU, us-central1)
  # For the full list, see: https://docs.cloud.google.com/cortex/docs/supported-locations
  bigQueryLocation: US
  # List of namespaces for data foundation and product modules.
  namespaces:
    - name: cortex
      path: cortex
  # List of source datasets.
  sources:
    - ...
  # List of target datasets.
  targets:
    - ...

  # Configuration for data foundation and product modules.
  modules:
    # List of foundation modules.
    foundation:
    - ... 
    # List of data product modules.
    product:
    - ...

Data: BigQuery location

Defines the location of the BigQuery source and target datasets.

Parameter | Meaning | Default value | Description
data.bigQueryLocation | BigQuery Location | US | BigQuery dataset location (for example, US, us-central1, or europe-west1).

Data: Cortex namespace

Defines the Cortex Framework namespace.

Parameter | Meaning | Default value | Description
data.namespaces.name | Namespace name | - | Cortex Framework namespace name. For example, cortex.
data.namespaces.path | Namespace path | - | Cortex Framework namespace path for the subdirectories used within the src and config folders. For example, cortex.

Data: BigQuery source and target datasets

The list of sources defines the BigQuery datasets into which the raw data from the source system has been replicated or streamed.

The list of targets defines the BigQuery datasets where Dataform stores the processed data.

Modules reference each source and target by its unique ID.

# Data source and target mapping
sources:
  - id: sap_raw
    projectId: YOUR_SOURCE_PROJECT_ID
    datasetId: cortex_sap_raw

targets:
  - id: sap_foundation
    projectId: YOUR_TARGET_PROJECT_ID
    datasetId: cortex7_sap_data_foundation

The following table describes the data source and target mapping parameters.

Parameter | Meaning | Default value | Description
data.sources.id | Source ID | - | Defines the 'id' for the source dataset to pull data from. For example, sap_raw.
data.sources.projectId | Source Project ID | YOUR_SOURCE_PROJECT_ID | References the Google Cloud Project ID with source data.
data.sources.datasetId | Source BigQuery Dataset ID | - | References the BigQuery Dataset ID with source data. For example, cortex_sap_raw.
data.targets.id | Target ID | - | Defines the 'id' for the target dataset. For example, sap_foundation.
data.targets.projectId | Target Project ID | YOUR_TARGET_PROJECT_ID | References the Google Cloud Project ID for the target data.
data.targets.datasetId | Target BigQuery Dataset ID | - | References the BigQuery Dataset ID for the target data. For example, cortex7_sap_data_foundation.

Data: Modules

The modules define the structure and components of the Dataform data pipelines.

Data: Modules: Foundation

This section configures the data foundation modules, which process data from the raw layer (CDC streams) into a standardized representation of the latest records in the source data. If the source already provides a view of the latest records, or the source system connector performs this transformation, you can configure the module as an external data foundation source.

modules:
  # List of foundation modules.
  foundation:
    # Unique identifier for the module instance.
    - moduleId: erp
      # Type of the module (namespaced, for example, cortex.sap).
      type: cortex.sap
      # Reference to the source dataset ID.
      dataSourceId: sap_raw
      # Reference to the target dataset ID.
      dataTargetId: sap_foundation
      # Module-specific configuration settings.
      moduleSettings:
        # SAP version (for example, ecc, s4).
        sapVersion: ecc
        # SAP client number.
        mandt: "100"
      # Whether the module is enabled.
      # enabled: true
      # Whether the foundation is external (does not create target dataset).
      # external: false
      # Path to the table settings configuration file.
      # tableSettings: "custom_table_settings.yaml"

The following table describes the data foundation module parameters for the modules.foundation configuration.

Parameter | Meaning | Default value | Description
moduleId | Module Identifier | erp | Unique identifier for a specific data foundation transformation module instance.
type | Module Logic Type | cortex.sap | Defines the business logic or template applied (for example, cortex.sap).
dataSourceId | Source Link | sap_raw | References the 'id' from the data.sources list to pull data from.
dataTargetId | Target Link | sap_foundation | References the 'id' from the data.targets list to push data to.
moduleSettings.sapVersion | SAP System Version | ecc | Applicable for SAP data sources only. Determines source-specific logic for ecc (ECC) or s4 (S/4HANA) systems.
moduleSettings.mandt | SAP Client (Mandant) | 100 | Applicable for SAP data sources only. The 3-digit SAP client identifier used to filter data rows.
enabled | Module enablement | true | Specifies whether the module is enabled.
external | External foundation | false | Specifies whether the foundation is external (does not create target dataset).
tableSettings | Table settings | data_modules/cortex/data_foundation/sap/mytable_settings.yaml | Path to a custom table settings configuration file, relative to this config file.
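
As an illustration of the external option described above, the following sketch marks a foundation module as external. It reuses the identifiers from the example above. How the framework resolves dataSourceId and dataTargetId for an external module is not covered in this document, so treat those values as placeholders and assumptions, not as a verified setup.

```yaml
modules:
  foundation:
    - moduleId: erp
      type: cortex.sap
      dataSourceId: sap_raw
      # Assumption: for an external foundation, this dataset already holds
      # the latest-records tables produced outside the framework.
      dataTargetId: sap_foundation
      moduleSettings:
        sapVersion: s4
        mandt: "100"
      # The framework does not create the target dataset for this module.
      external: true
```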

Data: Modules: Products

Data product modules define the aggregations, calculations, and joins necessary to transform raw data into insights that fulfill specific business use cases.

For each data product, the configuration sets a unique ID, declares its dependencies, and references the data foundation module and the target dataset where the results are stored.

The detailed configuration of each data product is defined in the file referenced by the tableSettings key.

modules:
  # List of data product modules.
  product:
    # Unique identifier for the data product instance.
    - moduleId: sap_purchasing_organizational_structure
      # Type of the data product (namespaced).
      type: cortex.purchasing_organizational_structure
      # Map of module dependencies.
      dependsOn:
        sapModule: erp
      # Reference to the target dataset ID.
      dataTargetId: product_target
      # Whether the module is enabled.
      # enabled: true
      # Path to the table settings configuration file.
      # tableSettings: "custom_table_settings.yaml"

The following table describes the data product module parameters for the modules.product configuration.

Parameter | Meaning | Default value | Description
moduleId | Module Identifier | - | Unique identifier for a specific transformation module instance.
type | Module Logic Type | - | Defines the business logic or template applied, as defined in the src/data_modules/{namespace}/data_product folder.
dataTargetId | Target Link | product_target | References the 'id' from the data.targets list to push data to.
dependsOn | Upstream Dependency | sapModule: erp | Specifies which foundation module must exist before the product module can be built.
enabled | Module enablement | true | Specifies whether the module is enabled.
tableSettings | Table settings | src/data_modules/{namespace}/data_product/{product_name}/table_settings.default.yaml | Path to a custom table settings configuration file, relative to this config file.

Deployment environment

Cortex Framework uses Dataform to orchestrate SQL transformations within BigQuery. The deployment: block defines the Dataform configuration, responsible for the execution of the data pipelines, including the repository project, location, repository name, and the Dataform workspace name.

deployment:
  targets:
    - type: dataform
      enabled: true
      targetSettings:
        repositoryProjectId: YOUR_REPO_PROJECT_ID
        repositoryRegion: us-central1
        repositoryName: cortex-repository
        workspaceName: dev
        # serviceAccount: "example@example.com"

The following table describes the deployment target parameters (deployment.targets).

Parameter | Meaning | Default value | Description
type | Deployment type | dataform | Type of the deployment target.
enabled | Enabled or disabled | true | Specifies whether the given deployment target is enabled.
targetSettings.repositoryProjectId | Repository project ID | YOUR_REPO_PROJECT_ID | The Google Cloud Project ID where the Dataform repository is managed.
targetSettings.repositoryRegion | Repository region | us-central1 | The Google Cloud region for the Dataform repository (for example, us-central1 or europe-west1).
targetSettings.repositoryName | Repository name | cortex-repository | The specific name of the Dataform repository.
targetSettings.workspaceName | Workspace name | dev | The specific Dataform workspace used for the deployment cycle.
targetSettings.serviceAccount | Service account email | - | Default service account email for Dataform repository execution.
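
Putting the sections above together, a complete config/config.yaml might look like the following sketch. Every key comes from the fragments above. The product_target entry under data.targets and its datasetId are assumptions, added so that the product module's dataTargetId resolves to a defined target.

```yaml
buildEnvironment:
  buildProjectId: YOUR_BUILD_PROJECT_ID

data:
  bigQueryLocation: US
  namespaces:
    - name: cortex
      path: cortex
  sources:
    - id: sap_raw
      projectId: YOUR_SOURCE_PROJECT_ID
      datasetId: cortex_sap_raw
  targets:
    - id: sap_foundation
      projectId: YOUR_TARGET_PROJECT_ID
      datasetId: cortex7_sap_data_foundation
    # Assumed target for data products; the dataset name is a placeholder.
    - id: product_target
      projectId: YOUR_TARGET_PROJECT_ID
      datasetId: YOUR_PRODUCT_DATASET_ID
  modules:
    foundation:
      - moduleId: erp
        type: cortex.sap
        dataSourceId: sap_raw         # matches an id under data.sources
        dataTargetId: sap_foundation  # matches an id under data.targets
        moduleSettings:
          sapVersion: ecc
          mandt: "100"
    product:
      - moduleId: sap_purchasing_organizational_structure
        type: cortex.purchasing_organizational_structure
        dependsOn:
          sapModule: erp              # matches the foundation moduleId
        dataTargetId: product_target

deployment:
  targets:
    - type: dataform
      enabled: true
      targetSettings:
        repositoryProjectId: YOUR_REPO_PROJECT_ID
        repositoryRegion: us-central1
        repositoryName: cortex-repository
        workspaceName: dev
```

Each ID reference in the modules (dataSourceId, dataTargetId, dependsOn) points at an entry defined elsewhere in the same file.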

Configuration file: table_settings.yaml

This guide explains how to use the table_settings.yaml file to configure data foundation and data product tables in Google Cloud Cortex Framework.

The module-specific table_settings.yaml file controls how raw source tables are conformed and how analytical data models are materialized within BigQuery. Using this file, you can configure tags, materialization strategies, and advanced BigQuery performance features such as partitioning and clustering.

Dynamic dependency resolution

By default, Cortex Framework optimizes deployment footprint and execution time by only deploying and compiling the foundation tables that are required as dependencies of your enabled data products. If a table configured in table_settings.yaml does not have any active downstream data products depending on it, it is omitted from deployment.

To override this optimization and force the deployment of a foundation table, you can set the deployAlways attribute to true (see Data foundation style parameter reference).
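
For instance, a foundation table that no enabled data product depends on would normally be skipped. The following sketch forces its deployment; the table name and tag are hypothetical placeholders.

```yaml
common:
  - source:
      # Hypothetical table with no downstream data product.
      tableName: custom_table_name
    target:
      tags: [custom_tag]
    # Deploy and build this table even though nothing depends on it.
    deployAlways: true
```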

In Google Cloud Cortex Framework, you can assign each module (foundation or product) its own table settings file by setting the tableSettings property in the deployment configuration file, config/config.yaml.

Configuration paths

  • Custom Settings (Recommended): To customize table behaviors, copy the default file to your configuration directory, modify it, and reference its path in config/config.yaml. The recommended paths are:
    • Foundation modules: config/namespace_path/data_foundation/foundation_module_id/table_settings.yaml (e.g., config/cortex/data_foundation/sap/table_settings.yaml)
    • Product modules: config/namespace_path/data_product/product_module_id/table_settings.yaml (e.g., config/cortex/data_product/accounting_documents/table_settings.yaml)
  • Default Fallback: If tableSettings is omitted, the framework automatically falls back to:
    • Foundation modules: definitions/data_foundation/namespace_path/table_settings.default.yaml
    • Product modules: definitions/data_product/product_module_id/table_settings.default.yaml
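
As a sketch of the custom settings option, the following config/config.yaml fragment points a product module at its own table settings file. The module values are taken from the product example earlier in this document. The directory name in the path is a hypothetical choice that follows the recommended layout.

```yaml
data:
  modules:
    product:
      - moduleId: sap_purchasing_organizational_structure
        type: cortex.purchasing_organizational_structure
        dependsOn:
          sapModule: erp
        dataTargetId: product_target
        # Hypothetical path; omit tableSettings to use the default fallback.
        tableSettings: config/cortex/data_product/purchasing_organizational_structure/table_settings.yaml
```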

Configuration styles

There are two distinct schema styles for table_settings.yaml depending on the category of the module:

  1. Data Foundation Style: List-based mapping that defines the source-to-target schema relationships, CDC (Change Data Capture) handling, and BigQuery layout.
  2. Data Product Style: Map-based mapping (dictionary) that defines how analytical views or tables are materialized (e.g., as views, tables, or incremental tables) and optimized.

Both styles support three root-level sections to segregate configurations by source system version (primarily used for SAP Data Foundation and SAP-dependent products):

  • ecc: Settings applied only when deploying an SAP ECC source system.
  • s4: Settings applied only when deploying an SAP S/4HANA source system.
  • common: Settings applied regardless of the SAP version (used for conformed or universal settings).

Data foundation style

In a Data Foundation module, the table_settings.yaml file is structured as a list of table items under the ecc, s4, and common keys. Each item maps a raw source table to a conformed target table and configures its BigQuery settings.

YAML syntax example

common:
  - source:
      tableName: bkpf
      isCdc: true
    target:
      tableName: bkpf # Optional: defaults to source tableName if omitted
      tags: [sap, common, finance, hourly]
      clusterDetails:
        columns: [bukrs, gjahr]
      partitionDetails:
        column: budat
        partitionType: time
        timeGrain: day
    deployAlways: false

Parameter reference

Parameter | Type | Required | Default / Example | Description
ecc / s4 / common | array | No | [] | List of table items for the given source system version (ecc or s4) or for all versions (common).
[].deployAlways | boolean | No | false | If true, the table is always deployed and built, even if optimization rules might otherwise skip it. See also Dynamic dependency resolution.

Source settings

Defines the raw inbound table characteristics.

Parameter | Type | Required | Default / Example | Description
tableName | string | Yes | bkpf | The name of the raw source table in BigQuery (case-insensitive).
isCdc | boolean | No | true | Indicates if the source table contains Change Data Capture (CDC) logs. true (default): the framework processes CDC logs (using record timestamps and operation flags) to reconstruct the latest conformed state. false: the table is processed as a full snapshot.
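
To illustrate the false case, the following sketch configures a source table that is replicated as a full snapshot instead of as CDC logs. The table name is a hypothetical placeholder.

```yaml
common:
  - source:
      # Hypothetical table delivered as a full snapshot.
      tableName: custom_snapshot_table
      # Skip CDC reconstruction and process the table as a snapshot.
      isCdc: false
    target:
      tags: [sap, common, daily]
```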

Target settings

Defines the output conformed table layout in the target dataset.

Parameter | Type | Required | Default / Example | Description
tableName | string | No | (Same as source) | The name of the target conformed table to be created. If omitted, the framework defaults to the source tableName.
tags | array[string] | No | [sap, finance] | A list of metadata tags attached to the conformed action in Dataform. These are arbitrary strings that do not need to be pre-registered or defined in other configurations; they can be used immediately to filter pipeline executions (e.g., using dataform run --tags ...).
clusterDetails | map | No | - | Optional. BigQuery clustering configuration. See Clustering details.
partitionDetails | map | No | - | Optional. BigQuery partitioning configuration. See Partitioning details.

Data product style

In a Data Product module, the table_settings.yaml file is structured as a dictionary (map) under the ecc, s4, and common keys. The keys of the dictionary represent the analytical table or view names, and the values define their materialization and performance settings.

YAML syntax example

s4:
  customers:
    materializationType: incremental
    tags: [sap, dataproduct, masterdata]
    clusterDetails:
      columns: [mandt, ktokd]

Parameter reference

Parameter | Type | Required | Default / Example | Description
ecc / s4 / common | map | No | {} | A map of target analytical assets (tables or views) to their configurations.
[table_name].materializationType | string | No | incremental | How the analytical asset is built in BigQuery. Allowed values: incremental (processes only new or updated records since the last run; recommended for large transactional datasets to save cost), table (completely rebuilds the table from scratch on every run), view (deploys the asset as a BigQuery SQL view, that is, a virtual table).
[table_name].tags | array[string] | No | [sap, dataproduct] | Metadata tags attached to the analytical asset in Dataform. These are arbitrary strings that do not need to be pre-registered; they can be used immediately for selective pipeline runs.
[table_name].clusterDetails | map | No | - | Optional. BigQuery clustering configuration. See Clustering details.
[table_name].partitionDetails | map | No | - | Optional. BigQuery partitioning configuration. See Partitioning details.

Advanced BigQuery configurations

Both styles share the same structure for optimizing BigQuery storage and query performance through Clustering and Partitioning.
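
The two styles differ in where these blocks are nested. The following sketch reuses values from the syntax examples above and shows both placements as two YAML documents separated by ---. In practice, each would live in its own table_settings.yaml file.

```yaml
# Data foundation style: the blocks sit under target.
common:
  - source:
      tableName: bkpf
    target:
      clusterDetails:
        columns: [bukrs, gjahr]
      partitionDetails:
        column: budat
        partitionType: time
        timeGrain: day
---
# Data product style: the blocks sit directly under the table name.
s4:
  customers:
    materializationType: incremental
    clusterDetails:
      columns: [mandt, ktokd]
```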


Clustering details

Clustering co-locates data based on the values in specific columns. BigQuery sorts the data within each storage block using these columns, which lets it skip blocks that cannot match and speeds up queries that filter (WHERE) or join (JOIN) on them.

clusterDetails:
  columns: [bukrs, gjahr]

Parameter reference

Parameter | Type | Required | Example | Description
columns | array[string] | Yes | [bukrs, gjahr] | An ordered list of up to four column names to cluster the table by.

Constraint: Column names can contain only alphanumeric characters and underscores. The order of columns in the list determines the sorting hierarchy.


Partitioning details

Partitioning divides a large table into smaller, physical segments based on the values of a date, timestamp, or integer column. This prevents BigQuery from scanning the entire table when a query only requests a specific range of days, months, or IDs.

partitionDetails:
  column: budat
  partitionType: time
  timeGrain: day

Parameter reference

Parameter | Type | Required | Example | Description
column | string | Yes | budat | The column name used to partition the table. Must contain only alphanumeric characters and underscores. The column type must match the partitionType.
partitionType | string | Yes | time | The partitioning strategy. Allowed values: time (partitions by a time unit on a Date, Timestamp, or Datetime column), DATE (partitions explicitly by a Date column), integer (partitions by an integer range).
timeGrain | string | No | day | Required if partitionType is time or DATE. Defines the granularity of the time partitions. Allowed values: hour, day, month, year (case-insensitive).
rangeStart | integer | No | 1 | Required if partitionType is integer. The start value of the first partition (inclusive).
rangeEnd | integer | No | 1000 | Required if partitionType is integer. The end value of the last partition (exclusive).
rangeInterval | integer | No | 10 | Required if partitionType is integer. The width of each partition interval.
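
The examples in this document use time-based partitioning only. As a hedged sketch of the integer case, the following block uses the example values from the table above; the column name is a hypothetical integer column.

```yaml
partitionDetails:
  # Hypothetical integer column.
  column: document_number
  partitionType: integer
  # Partitions cover values from 1 (inclusive) up to 1000 (exclusive).
  rangeStart: 1
  rangeEnd: 1000
  # Each partition spans 10 consecutive values, so the first holds 1 to 10.
  rangeInterval: 10
```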

Examples

The following examples show configuration templates for both data foundation and data product modules, outlining how to customize target tables, optimize storage layout in BigQuery, and configure materialization types.

1. Custom data foundation table settings example

This example shows how to configure a foundation layer with clustered and partitioned transactional tables (like bkpf and ekpo) alongside standard data tables:

# ==============================================================================
# S/4HANA-Specific Tables
# ==============================================================================
s4:
  # ACDOCA is a massive table in S/4HANA; clustering is vital
  - source:
      tableName: acdoca
    target:
      tags: [sap, s4, finance, transactional, hourly]
      clusterDetails:
        columns: [rclnt, rbukrs, gjahr]

# ==============================================================================
# ECC-Specific Tables
# ==============================================================================
ecc:
  - source:
      tableName: faglflexa
    target:
      tags: [sap, ecc, finance, transactional, hourly]

# ==============================================================================
# Common Tables (ECC & S/4HANA)
# ==============================================================================
common:
  # Financial document header (partitioned by posting date)
  - source:
      tableName: bkpf
      isCdc: true
    target:
      tags: [sap, common, finance, hourly]
      clusterDetails:
        columns: [bukrs, gjahr]
      partitionDetails:
        column: budat
        partitionType: time
        timeGrain: day

  # Purchasing document items (partitioned by creation date)
  - source:
      tableName: ekpo
    target:
      tags: [sap, common, logistics, purchasing, hourly]
      clusterDetails:
        columns: [mandt, ebeln]
      partitionDetails:
        column: aedat
        partitionType: time
        timeGrain: month

  # Standard master data table (no partitioning/clustering needed)
  - source:
      tableName: lfa1
    target:
      tags: [sap, common, masterdata, vendor, daily]

2. Custom data product table settings example

This example shows how to configure materialization types for downstream analytical data products. The transactional sales_documents product is set to incremental to optimize build performance and save costs, the non-transactional customers product is rebuilt as a standard table, and the aggregated sales_performance_summary is deployed as a view:

# settings applied for both ECC and S/4HANA pipelines
common:
  # Transactional data product - incremental build
  sales_documents:
    materializationType: incremental
    tags: [sap, dataproduct, sales, transactional]
    clusterDetails:
      columns: [vkorg, vbeln]
    partitionDetails:
      column: audat
      partitionType: time
      timeGrain: day

  # Master data product - full table rebuild
  customers:
    materializationType: table
    tags: [sap, dataproduct, masterdata]
    clusterDetails:
      columns: [mandt, ktokd]

  # Aggregated reporting view - virtual view
  sales_performance_summary:
    materializationType: view
    tags: [sap, dataproduct, sales, reporting]

How-to guides

This section provides step-by-step guides for common configuration tasks and custom deployment scenarios.

Customize table scope in a data foundation module

To add or remove tables within an existing data foundation module without creating new modules or running separate pipeline instances:

  • Copy the default table_settings.default.yaml configurations into your workspace config directory (for example, config/cortex/data_foundation/sap/table_settings.yaml).
  • In your new file, add your custom tables or remove unused standard tables under the ecc, s4, or common keys as required:
common:
  - source:
      tableName: custom_table_name
    target:
      tags: [custom_tag]
  • Update config/config.yaml to reference the path of your custom table settings under the module's tableSettings property:
data:
  modules:
    foundation:
      - moduleId: erp
        type: cortex.sap
        # Link to the custom table settings file:
        tableSettings: config/cortex/data_foundation/sap/table_settings.yaml

Configure multiple instances of a data foundation module

To deploy two or more separate pipeline instances of the same module type (for example, to support multiple SAP instances, segment tables, isolate environments, or write to different target datasets):

Before you begin:

  • Ensure the source tables exist in your source raw dataset.
  • Ensure the target dataset schemas are configured.
  • When working with SAP data foundation modules, verify that the metadata table DD03L contains column and descriptor information for the custom tables you intend to ingest. See SAP ERP requirements for details.

Instructions:

  • In the config/config.yaml file, add target configurations under data.targets to define target datasets for each pipeline instance:
data:
  targets:
    - id: data_foundation_core
      projectId: target_project_id
      datasetId: data_foundation_sap_core
    - id: data_foundation_custom
      projectId: target_project_id
      datasetId: data_foundation_sap_custom
  • Define multiple instances of the module under the data.modules.foundation list. Give each instance a unique moduleId, its own target dataset ID, and optionally its own tableSettings configuration:
data:
  modules:
    foundation:
      # Core SAP ERP foundation module instance
      - moduleId: erp_core
        type: cortex.sap
        dataSourceId: sap_raw
        dataTargetId: data_foundation_core
        tableSettings: "cortex-framework-core/src/data_modules/cortex/data_foundation/sap/table_settings.default.yaml"
      # Custom tables pipeline instance
      - moduleId: erp_custom
        type: cortex.sap
        dataSourceId: sap_raw
        dataTargetId: data_foundation_custom
        tableSettings: "config/cortex/data_foundation/sap/table_settings_custom.yaml"
  • Create the config/cortex/data_foundation/sap/table_settings_custom.yaml file and specify the custom scope. For example:
common:
  - source:
      tableName: custom_sap_table_name
    target:
      tags: [sap, s4, hourly]
      clusterDetails:
        columns: [carrid, connid]
      partitionDetails:
        column: fldate
        partitionType: time
        timeGrain: day
  • Apply the changes by running the deployment script (uv run cortex-build-and-deploy), then execute the Dataform actions as described in Post-deployment steps.
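
With several foundation instances in place, a data product presumably needs to name the instance it builds on. The following sketch assumes that the dependsOn map from the product configuration accepts the moduleId of either instance. It also assumes that product_target is defined under data.targets.

```yaml
data:
  modules:
    product:
      - moduleId: sap_purchasing_organizational_structure
        type: cortex.purchasing_organizational_structure
        dependsOn:
          # Assumption: references the moduleId of the chosen foundation instance.
          sapModule: erp_core
        dataTargetId: product_target
```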