Deployment configuration

This document explains the deployment configuration options for Cortex Framework across the following areas:

- Deployment configurations (config/config.yaml): Defines global variables, build environments, and module mapping (data foundation and data product targets).
- Table configurations (table_settings.yaml): Module-specific performance and schema specifications, outlining how base tables are compiled and conformed in BigQuery.

This document also includes how-to guides with step-by-step instructions for common deployment use cases and scenarios.

Configuration file: `config/config.yaml`

The config/config.yaml file, typically initialized from the config/config.yaml.example template, serves as the primary configuration for the Cortex Framework deployment. It defines critical parameters including the target Google Cloud execution project, source and destination BigQuery datasets, and Dataform specifications such as repository and workspace names.

The following sections provide a detailed breakdown of the config/config.yaml structure.

Build environment

The build environment project is the Google Cloud project that is billed for build actions, such as BigQuery jobs (for example, reading the SAP metadata table DD03L).

buildEnvironment:
  buildProjectId: YOUR_BUILD_PROJECT_ID

The following table describes the build environment parameters.

Parameter	Meaning	Default value	Description
`buildEnvironment.buildProjectId`	Build project ID	`YOUR_BUILD_PROJECT_ID`	Google Cloud Project ID where build operations are executed.

Data section overview

The data: section of the configuration file defines your data sources, targets, and the specific modules for the data foundation and data products. Its general structure is as follows:

data:
  # Geographic location for BigQuery datasets (for example: US, EU, us-central1)
  # For full list see: https://docs.cloud.google.com/cortex/docs/supported-locations
  bigQueryLocation: US
  # List of namespaces for data foundation and product modules.
  namespaces:
    - name: cortex
      path: cortex
  # List of source datasets.
  sources:
    - ...
  # List of target datasets.
  targets:
    - ...

  # Configuration for data foundation and product modules.
  modules:
    # List of foundation modules.
    foundation:
    - ... 
    # List of data product modules.
    product:
    - ...

Data: BigQuery location

Defines the location of the BigQuery source and target datasets.

Parameter	Meaning	Default value	Description
`data.bigQueryLocation`	BigQuery Location	`US`	BigQuery dataset location (for example, `US`, `us-central1`, or `europe-west1`).

Data: Cortex namespace

Defines the Cortex Framework namespace.

Parameter	Meaning	Default value	Description
`data.namespaces.name`	Namespace name	-	Cortex Framework namespace name. For example, `cortex`.
`data.namespaces.path`	Namespace path	-	Cortex Framework namespace path for the subdirectories used within the src and config folders. For example, `cortex`.

Data: BigQuery sources and target datasets

The sources list defines the BigQuery datasets into which the raw data from the source system has been replicated or streamed.

The targets list defines the BigQuery datasets where the tables processed by Dataform are stored.

Each source and each target is referenced from the modules by its unique ID.

# Data source and target mapping
sources:
  - id: sap_raw
    projectId: YOUR_SOURCE_PROJECT_ID
    datasetId: cortex_sap_raw

targets:
  - id: sap_foundation
    projectId: YOUR_TARGET_PROJECT_ID
    datasetId: cortex7_sap_data_foundation

The following table describes the data source and target mapping parameters.

Parameter	Meaning	Default value	Description
`data.sources.id`	Source ID	`-`	Defines the 'id' for the source dataset to pull data from. For example, `sap_raw`.
`data.sources.projectId`	Source Project ID	`YOUR_SOURCE_PROJECT_ID`	References the Google Cloud Project ID with source data.
`data.sources.datasetId`	Source BigQuery Dataset ID	`-`	References the BigQuery Dataset ID with source data. For example, `cortex_sap_raw`.
`data.targets.id`	Target ID	-	Defines the 'id' for the target dataset. For example, `sap_foundation`.
`data.targets.projectId`	Target Project ID	`YOUR_TARGET_PROJECT_ID`	References the Google Cloud Project ID for the target data.
`data.targets.datasetId`	Target BigQuery Dataset ID	`-`	References the BigQuery Dataset ID for the target data. For example, `cortex7_sap_data_foundation`.

Data: Modules

The modules define the structure and components of the Dataform data pipelines.

Data: Modules: Foundation

This section configures the data foundation layer modules that process data from the raw layer (CDC streams) into a standardized representation of the latest records of the source data. If the source provides a view on the latest records directly, or the source system connector performs such transformations, the module can be configured as an external data foundation source.

modules:
  # List of foundation modules.
  foundation:
    # Unique identifier for the module instance.
    - moduleId: erp
      # Type of the module (namespaced, for example, cortex.sap).
      type: cortex.sap
      # Reference to the source dataset ID.
      dataSourceId: sap_raw
      # Reference to the target dataset ID.
      dataTargetId: sap_foundation
      # Module-specific configuration settings.
      moduleSettings:
        # SAP version (for example, ecc, s4).
        sapVersion: ecc
        # SAP client number.
        mandt: "100"
      # Whether the module is enabled.
      # enabled: true
      # Whether the foundation is external (does not create target dataset).
      # external: false
      # Path to the table settings configuration file.
      # tableSettings: "custom_table_settings.yaml"

The following table describes the data foundation module parameters for the modules.foundation configuration.

Parameter	Meaning	Default value	Description
`moduleId`	Module Identifier	`erp`	Unique identifier for a specific data foundation transformation module instance.
`type`	Module Logic Type	`cortex.sap`	Defines the business logic or template applied (for example, `cortex.sap`).
`dataSourceId`	Source Link	`sap_raw`	References the 'id' from the data.sources list to pull data from.
`dataTargetId`	Target Link	`sap_foundation`	References the 'id' from the targets list to push data to.
`moduleSettings.sapVersion`	SAP System Version	`ecc`	Applicable for SAP data sources only. Determines source-specific logic for `ecc` (ECC) or `s4` (S/4HANA) systems.
`moduleSettings.mandt`	SAP Client (Mandant)	`100`	Applicable for SAP data sources only. The 3-digit SAP client identifier used to filter data rows.
`enabled`	Module enablement	`true`	Specifies whether the module is enabled.
`external`	External foundation	`false`	Specifies whether the foundation is external (does not create target dataset).
`tableSettings`	Table settings	`data_modules/cortex/data_foundation/sap/mytable_settings.yaml`	Path to custom Table settings configuration file, relative to this config file.
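
As an illustration of the external option described above, the following sketch reuses the sample module block with the `external` flag enabled. It only uses keys listed in the table; whether `dataSourceId` and `moduleSettings` are still needed for an external module is not stated here, so treat their presence as an assumption.

```yaml
modules:
  foundation:
    - moduleId: erp
      type: cortex.sap
      # Assumption: the source and target references are kept as in the
      # standard example; this document does not say if they can be omitted.
      dataSourceId: sap_raw
      dataTargetId: sap_foundation
      moduleSettings:
        sapVersion: s4
        mandt: "100"
      # The latest-records tables are provided outside this pipeline,
      # so the framework does not create the target dataset.
      external: true
```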

Data: Modules: Products

Data product modules define the aggregations, calculations, and joins necessary to transform raw data into insights that fulfill specific business use cases.

The data product configuration lets you set a unique ID, declare dependencies, and reference both the data foundation module and the target dataset where the results are stored.

The detailed configuration of each data product is defined in the file referenced by the tableSettings key.

modules:
  # List of data product modules.
  product:
    # Unique identifier for the data product instance.
    - moduleId: sap_purchasing_organizational_structure
      # Type of the data product (namespaced).
      type: cortex.purchasing_organizational_structure
      # Map of module dependencies.
      dependsOn:
        sapModule: erp
      # Reference to the target dataset ID.
      dataTargetId: product_target
      # Whether the module is enabled.
      # enabled: true
      # Path to the table settings configuration file.
      # tableSettings: "custom_table_settings.yaml"

The following table describes the data product module parameters for the modules.product configuration.

Parameter	Meaning	Default value	Description
`moduleId`	Module Identifier	-	Unique identifier for a specific transformation module instance.
`type`	Module Logic Type	-	Defines the business logic or template applied, defined in `src/data_modules/{namespace}/data_product` folder.
`dataTargetId`	Target Link	`product_target`	References the 'id' from the targets list to push data to.
`dependsOn`	Upstream Dependency	`sapModule: erp`	Specifies which foundation module must exist before the product module can be built.
`enabled`	Module enablement	`true`	Specifies whether the module is enabled.
`tableSettings`	Table settings	`src/data_modules/{namespace}/data_product/{product_name}/table_settings.default.yaml`	Path to custom Table settings configuration file, relative to this config file.

Deployment environment

Cortex Framework uses Dataform to orchestrate SQL transformations within BigQuery. The deployment: block defines the Dataform configuration, responsible for the execution of the data pipelines, including the repository project, location, repository name, and the Dataform workspace name.

deployment:
  targets:
    - type: dataform
      enabled: true
      targetSettings:
        repositoryProjectId: YOUR_REPO_PROJECT_ID
        repositoryRegion: us-central1
        repositoryName: cortex-repository
        workspaceName: dev
        # serviceAccount: "example@example.com"

The following table describes the deployment target parameters (deployment.targets).

Parameter	Meaning	Default Value	Description
`type`	Deployment type	`dataform`	Type of the deployment targets.
`enabled`	Enabled/Disabled	`true`	Specifies whether the deployment target is enabled.
`targetSettings.repositoryProjectId`	Repository project ID	`YOUR_REPO_PROJECT_ID`	The Google Cloud Project ID where the Dataform repository is managed.
`targetSettings.repositoryRegion`	Repository region	`us-central1`	The Google Cloud region for the Dataform repository (for example, `us-central1` or `europe-west1`).
`targetSettings.repositoryName`	Repository name	`cortex-repository`	The specific name of the Dataform repository.
`targetSettings.workspaceName`	Workspace name	`dev`	The specific Dataform workspace used for the deployment cycle.
`targetSettings.serviceAccount`	Service account email	`-`	Default service account email for Dataform repository execution.
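
To show how the blocks described so far might fit together, here is a sketch of a complete config/config.yaml assembled from the fragments above. It assumes that `buildEnvironment`, `data`, and `deployment` are top-level keys of the same file. The `product_target` entry and its dataset name are hypothetical, added only so that the product module's `dataTargetId` resolves to a defined target.

```yaml
buildEnvironment:
  buildProjectId: YOUR_BUILD_PROJECT_ID

data:
  bigQueryLocation: US
  namespaces:
    - name: cortex
      path: cortex
  sources:
    - id: sap_raw
      projectId: YOUR_SOURCE_PROJECT_ID
      datasetId: cortex_sap_raw
  targets:
    - id: sap_foundation
      projectId: YOUR_TARGET_PROJECT_ID
      datasetId: cortex7_sap_data_foundation
    # Hypothetical target for data products (dataset name is an assumption).
    - id: product_target
      projectId: YOUR_TARGET_PROJECT_ID
      datasetId: cortex7_sap_data_products
  modules:
    foundation:
      - moduleId: erp
        type: cortex.sap
        dataSourceId: sap_raw         # refers to the id under data.sources
        dataTargetId: sap_foundation  # refers to an id under data.targets
        moduleSettings:
          sapVersion: ecc
          mandt: "100"
    product:
      - moduleId: sap_purchasing_organizational_structure
        type: cortex.purchasing_organizational_structure
        dependsOn:
          sapModule: erp              # the foundation moduleId defined above
        dataTargetId: product_target

deployment:
  targets:
    - type: dataform
      enabled: true
      targetSettings:
        repositoryProjectId: YOUR_REPO_PROJECT_ID
        repositoryRegion: us-central1
        repositoryName: cortex-repository
        workspaceName: dev
```

The point of the sketch is the cross-referencing: every `dataSourceId` and `dataTargetId` in a module should match an `id` declared in the sources or targets list.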

Configuration file: `table_settings.yaml`

This guide explains how to use the table_settings.yaml file to configure data foundation and data product tables in Google Cloud Cortex Framework.

The module-specific table_settings.yaml file controls how raw source tables are conformed and how analytical data models are materialized within BigQuery. Using this file, you can configure tags, materialization strategies, and advanced BigQuery performance features like partitioning or clustering.

Dynamic dependency resolution

By default, Cortex Framework optimizes deployment footprint and execution time by only deploying and compiling the foundation tables that are required as dependencies of your enabled data products. If a table configured in table_settings.yaml does not have any active downstream data products depending on it, it is omitted from deployment.

To override this optimization and force the deployment of a foundation table, you can set the deployAlways attribute to true (see Data foundation style parameter reference).
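
For illustration, a foundation table entry that opts out of this optimization could look like the following sketch. The table name and tags are taken from the examples later in this document; only the `deployAlways` value differs from the default.

```yaml
common:
  - source:
      tableName: lfa1
    target:
      tags: [sap, common, masterdata]
    # Deploy this table even if no enabled data product depends on it.
    deployAlways: true
```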

Each module (foundation or product) can be assigned a specific table settings file through the tableSettings property in the deployment configuration file, config/config.yaml.

Configuration paths

Custom settings (recommended): To customize table behavior, copy the default file to your configuration directory, modify it, and reference its path in config/config.yaml. The recommended paths are:
- Foundation modules: config/namespace_path/data_foundation/foundation_module_id/table_settings.yaml (for example, config/cortex/data_foundation/sap/table_settings.yaml)
- Product modules: config/namespace_path/data_product/product_module_id/table_settings.yaml (for example, config/cortex/data_product/accounting_documents/table_settings.yaml)

Default fallback: If tableSettings is omitted, the framework automatically falls back to:
- Foundation modules: definitions/data_foundation/namespace_path/table_settings.default.yaml
- Product modules: definitions/data_product/product_module_id/table_settings.default.yaml
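
As a sketch of the custom-settings case for a product module, the config/config.yaml entry below points `tableSettings` at the recommended path. The `moduleId` and `type` values are hypothetical, chosen to match the accounting_documents example path; the remaining keys follow the product module example earlier in this document.

```yaml
data:
  modules:
    product:
      # Hypothetical module ID and type, for illustration only.
      - moduleId: sap_accounting_documents
        type: cortex.accounting_documents
        dependsOn:
          sapModule: erp
        dataTargetId: product_target
        # Custom file copied from the default and then modified.
        tableSettings: config/cortex/data_product/accounting_documents/table_settings.yaml
```

Removing the `tableSettings` line would make the module use the default fallback file instead.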

Configuration styles

There are two distinct schema styles for table_settings.yaml depending on the category of the module:

- Data foundation style: List-based configuration that defines the source-to-target schema relationships, CDC (Change Data Capture) handling, and BigQuery layout.
- Data product style: Map-based configuration (dictionary) that defines how analytical views or tables are materialized (for example, as views, tables, or incremental tables) and optimized.

Both styles support three root-level sections to segregate configurations by source system version (primarily used for SAP Data Foundation and SAP-dependent products):

- ecc: Settings applied only when deploying an SAP ECC source system.
- s4: Settings applied only when deploying an SAP S/4HANA source system.
- common: Settings applied regardless of the SAP version (used for conformed or universal settings).
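
The following sketch shows how the three root-level sections might appear together in one data product style file. The table names are reused from the examples in this document, and the split of settings between versions is an illustrative assumption rather than a recommended layout.

```yaml
# Applied only when the module is deployed for SAP ECC.
ecc:
  customers:
    materializationType: table

# Applied only when the module is deployed for SAP S/4HANA.
s4:
  customers:
    materializationType: incremental
    clusterDetails:
      columns: [mandt, ktokd]

# Applied for both SAP versions.
common:
  sales_performance_summary:
    materializationType: view
    tags: [sap, dataproduct, sales, reporting]
```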

Data foundation style

In a Data Foundation module, the table_settings.yaml file is structured as a list of table items under the ecc, s4, and common keys. Each item maps a raw source table to a conformed target table and configures its BigQuery settings.

YAML syntax example

common:
  - source:
      tableName: bkpf
      isCdc: true
    target:
      tableName: bkpf # Optional: defaults to source tableName if omitted
      tags: [sap, common, finance, hourly]
      clusterDetails:
        columns: [bukrs, gjahr]
      partitionDetails:
        column: budat
        partitionType: time
        timeGrain: day
    deployAlways: false

Parameter reference

Parameter	Type	Required	Default / Example	Description
`ecc` \| `s4` \| `common`	`array`	No	`[]`	List of table items applied for the given source system version (`ecc` or `s4`) or for both versions (`common`).
`[].deployAlways`	`boolean`	No	`false`	If `true`, the table is always deployed and built, even if optimization rules might otherwise skip it. See also Dynamic dependency resolution.

Source settings

Defines the raw inbound table characteristics.

Parameter	Type	Required	Default / Example	Description
`tableName`	`string`	Yes	`bkpf`	The name of the raw source table in BigQuery (case-insensitive).
`isCdc`	`boolean`	No	`true`	Indicates if the source table contains Change Data Capture (CDC) logs. • `true` (default): The framework processes CDC logs (using record timestamps and operation flags) to reconstruct the latest conformed state. • `false`: The table is processed as a full snapshot.

Target settings

Defines the output conformed table layout in the target dataset.

Parameter	Type	Required	Default / Example	Description
`tableName`	`string`	No	(Same as source)	The name of the target conformed table to be created. If omitted, the framework defaults to the source `tableName`.
`tags`	`array[string]`	No	`[sap, finance]`	A list of metadata tags attached to the conformed action in Dataform. These are arbitrary strings that do not need to be pre-registered or defined in other configurations; they can be used immediately to filter pipeline executions (for example, using `dataform run --tags ...`).
`clusterDetails`	`map`	No	—	Optional. BigQuery clustering configuration. See Clustering details.
`partitionDetails`	`map`	No	—	Optional. BigQuery partitioning configuration. See Partitioning details.

Data product style

In a Data Product module, the table_settings.yaml file is structured as a dictionary (map) under the ecc, s4, and common keys. The keys of the dictionary represent the analytical table or view names, and the values define their materialization and performance settings.

YAML syntax example

s4:
  customers:
    materializationType: incremental
    tags: [sap, dataproduct, masterdata]
    clusterDetails:
      columns: [mandt, ktokd]

Parameter reference

Parameter	Type	Required	Default / Example	Description
`ecc` \| `s4` \| `common`	`map`	No	`{}`	A map of target analytical assets (tables or views) to their configurations.
`[table_name].materializationType`	`string`	No	`incremental`	How the analytical asset is built in BigQuery. Allowed Values: `incremental`: Processes only new or updated records since the last run. Recommended for large transactional datasets to save cost. `table`: Completely rebuilds the table from scratch on every run. `view`: Deploys the asset as a BigQuery SQL view (virtual table).
`[table_name].tags`	`array[string]`	No	`[sap, dataproduct]`	Metadata tags attached to the analytical asset in Dataform. These are arbitrary strings that do not need to be pre-registered; they can be used immediately for selective pipeline runs.
`[table_name].clusterDetails`	`map`	No	—	Optional. BigQuery clustering configuration. See Clustering details.
`[table_name].partitionDetails`	`map`	No	—	Optional. BigQuery partitioning configuration. See Partitioning details.

Advanced BigQuery configurations

Both styles share the same structure for optimizing BigQuery storage and query performance through Clustering and Partitioning.

Clustering details

Clustering co-locates data based on the values in specific columns. BigQuery sorts the data within each storage block using these columns, which can reduce the amount of data scanned by queries that filter (WHERE) or join (JOIN) on them and so speed them up.

clusterDetails:
  columns: [bukrs, gjahr]

Parameter reference

Parameter	Type	Required	Example	Description
`columns`	`array[string]`	Yes	`[bukrs, gjahr]`	An ordered list of up to four column names to cluster the table by. Constraint: Column names must contain only alphanumeric characters and underscores. The order of columns in the list determines the sorting hierarchy.

Partitioning details

Partitioning divides a large table into smaller, physical segments based on the values of a date, timestamp, or integer column. This prevents BigQuery from scanning the entire table when a query only requests a specific range of days, months, or IDs.

partitionDetails:
  column: budat
  partitionType: time
  timeGrain: day

Parameter reference

Parameter	Type	Required	Example	Description
`column`	`string`	Yes	`budat`	The column name used to partition the table. Must be alphanumeric and underscores only. The column type must match the `partitionType`.
`partitionType`	`string`	Yes	`time`	The partitioning strategy. Allowed Values: `time`: Partitions by a time-unit (Date, Timestamp, or Datetime column). `DATE`: Partitions explicitly by a Date column. `integer`: Partitions by an integer range.
`timeGrain`	`string`	No	`day`	Required if `partitionType` is `time` or `DATE`. Defines the granularity of the time partitions. Allowed Values: `hour`, `day`, `month`, `year` (case-insensitive).
`rangeStart`	`integer`	No	`1`	Required if `partitionType` is `integer`. The start value of the first partition (inclusive).
`rangeEnd`	`integer`	No	`1000`	Required if `partitionType` is `integer`. The end value of the last partition (exclusive).
`rangeInterval`	`integer`	No	`10`	Required if `partitionType` is `integer`. The width of each partition interval.
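
The examples in this document only use time-based partitioning. As a complement, here is a sketch of an integer-range configuration built from the parameters in the table above. The column name is hypothetical, and the range values are the ones shown in the Example column.

```yaml
partitionDetails:
  # Hypothetical integer column; it must exist in the target table.
  column: document_number_int
  partitionType: integer
  rangeStart: 1       # inclusive start of the first partition
  rangeEnd: 1000      # exclusive end of the last partition
  rangeInterval: 10   # width of each partition
```

With these values, the first partition would cover the values 1 through 10. In BigQuery, rows whose value falls outside the configured range are stored in a separate unpartitioned partition.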

Examples

The following examples show configuration templates for both data foundation and data product modules, outlining how to customize target tables, optimize storage layout in BigQuery, and configure materialization types.

1. Custom data foundation table settings example

This example shows how to configure a foundation layer with clustered and partitioned transactional tables (like bkpf and ekpo) alongside standard data tables:

# ==============================================================================
# S/4HANA-Specific Tables
# ==============================================================================
s4:
  # ACDOCA is a massive table in S/4HANA; clustering is vital
  - source:
      tableName: acdoca
    target:
      tags: [sap, s4, finance, transactional, hourly]
      clusterDetails:
        columns: [rclnt, rbukrs, gjahr]

# ==============================================================================
# ECC-Specific Tables
# ==============================================================================
ecc:
  - source:
      tableName: faglflexa
    target:
      tags: [sap, ecc, finance, transactional, hourly]

# ==============================================================================
# Common Tables (ECC & S/4HANA)
# ==============================================================================
common:
  # Financial document header (partitioned by posting date)
  - source:
      tableName: bkpf
      isCdc: true
    target:
      tags: [sap, common, finance, hourly]
      clusterDetails:
        columns: [bukrs, gjahr]
      partitionDetails:
        column: budat
        partitionType: time
        timeGrain: day

  # Purchasing document items (partitioned monthly by the aedat date column)
  - source:
      tableName: ekpo
    target:
      tags: [sap, common, logistics, purchasing, hourly]
      clusterDetails:
        columns: [mandt, ebeln]
      partitionDetails:
        column: aedat
        partitionType: time
        timeGrain: month

  # Standard master data table (no partitioning/clustering needed)
  - source:
      tableName: lfa1
    target:
      tags: [sap, common, masterdata, vendor, daily]

2. Custom data product table settings example

This example shows how to configure materialization types for downstream analytical data products. We set transactional sales_documents as incremental to optimize build performance and save costs, while non-transactional data tables like customers are built as standard tables:

# settings applied for both ECC and S/4HANA pipelines
common:
  # Transactional data product - incremental build
  sales_documents:
    materializationType: incremental
    tags: [sap, dataproduct, sales, transactional]
    clusterDetails:
      columns: [vkorg, vbeln]
    partitionDetails:
      column: audat
      partitionType: time
      timeGrain: day

  # Master data product - full table rebuild
  customers:
    materializationType: table
    tags: [sap, dataproduct, masterdata]
    clusterDetails:
      columns: [mandt, ktokd]

  # Aggregated reporting view - virtual view
  sales_performance_summary:
    materializationType: view
    tags: [sap, dataproduct, sales, reporting]

How-to guides

This section provides step-by-step guides for common configuration tasks and custom deployment scenarios.

Customize table scope in a data foundation module

To add or remove tables within an existing data foundation module without creating new modules or running separate pipeline instances:

Copy the default table_settings.default.yaml file into your workspace config directory (for example, config/cortex/data_foundation/sap/table_settings.yaml).
In the new file, add your custom tables or remove unused standard tables under the ecc, s4, or common keys as required:

common:
  - source:
      tableName: custom_table_name
    target:
      tags: [custom_tag]

Update config/config.yaml to reference the path of your custom table settings under the module's tableSettings property:

data:
  modules:
    foundation:
      - moduleId: erp
        type: cortex.sap
        # Link to the custom table settings file:
        tableSettings: config/cortex/data_foundation/sap/table_settings.yaml

Configure multiple instances of a data foundation module

To deploy two or more separate pipeline instances of the same module type (for example, to support multiple SAP instances, segment tables, isolate environments, or write to different target datasets):

Before you begin:
- Ensure the source tables exist in your raw source dataset.
- Ensure the target dataset schemas are configured.
- When working with SAP data foundation modules, verify that the metadata table DD03L contains columns and descriptor information for the custom tables you intend to ingest. See SAP ERP requirements for details.

Instructions:

In the config/config.yaml file, add target configurations under data.targets to define target datasets for each pipeline instance:

data:
  targets:
    - id: data_foundation_core
      projectId: target_project_id
      datasetId: data_foundation_sap_core
    - id: data_foundation_custom
      projectId: target_project_id
      datasetId: data_foundation_sap_custom

Define multiple instances of the module under the data.modules.foundation list. Give each instance a unique moduleId, its own dataTargetId, and optionally its own tableSettings file:

data:
  modules:
    foundation:
      # Core SAP ERP foundation module instance
      - moduleId: erp_core
        type: cortex.sap
        dataSourceId: sap_raw
        dataTargetId: data_foundation_core
        tableSettings: "cortex-framework-core/src/data_modules/cortex/data_foundation/sap/table_settings.default.yaml"
      # Custom tables pipeline instance
      - moduleId: erp_custom
        type: cortex.sap
        dataSourceId: sap_raw
        dataTargetId: data_foundation_custom
        tableSettings: "config/cortex/data_foundation/sap/table_settings_custom.yaml"

Create the config/cortex/data_foundation/sap/table_settings_custom.yaml file and specify the custom scope in it. For example:

common:
  - source:
      tableName: custom_sap_table_name
    target:
      tags: [sap, s4, hourly]
      clusterDetails:
        columns: [carrid, connid]
      partitionDetails:
        column: fldate
        partitionType: time
        timeGrain: day

Apply the changes by running the deployment script (uv run cortex-build-and-deploy), then execute the Dataform actions as described in Post-deployment steps.
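
The same pattern can be sketched for the multiple-SAP-systems case mentioned at the start of this guide, where each instance reads from its own raw dataset. All IDs and dataset names below are hypothetical; the keys are the ones documented in the config/config.yaml sections.

```yaml
data:
  sources:
    # Hypothetical raw datasets, one per SAP system.
    - id: sap_raw_ecc
      projectId: YOUR_SOURCE_PROJECT_ID
      datasetId: cortex_sap_raw_ecc
    - id: sap_raw_s4
      projectId: YOUR_SOURCE_PROJECT_ID
      datasetId: cortex_sap_raw_s4
  targets:
    - id: sap_foundation_ecc
      projectId: YOUR_TARGET_PROJECT_ID
      datasetId: data_foundation_sap_ecc
    - id: sap_foundation_s4
      projectId: YOUR_TARGET_PROJECT_ID
      datasetId: data_foundation_sap_s4
  modules:
    foundation:
      # Instance for the ECC system.
      - moduleId: erp_ecc
        type: cortex.sap
        dataSourceId: sap_raw_ecc
        dataTargetId: sap_foundation_ecc
        moduleSettings:
          sapVersion: ecc
          mandt: "100"
      # Instance for the S/4HANA system.
      - moduleId: erp_s4
        type: cortex.sap
        dataSourceId: sap_raw_s4
        dataTargetId: sap_foundation_s4
        moduleSettings:
          sapVersion: s4
          mandt: "200"
```

Keeping a separate source, target, and moduleId per system keeps the two pipelines independent, so a data product can depend on either instance through its dependsOn entry.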
