Cloud Data Fusion release notes

This page documents production updates to Cloud Data Fusion. Check this page for announcements about new or updated features, bug fixes, known issues, and deprecated functionality.

You can see the latest product updates for all of Google Cloud on the Google Cloud page, browse and filter all release notes in the Google Cloud console, or programmatically access release notes in BigQuery.

To get the latest product updates delivered to you, add the URL of this page to your feed reader, or add the feed URL directly.

September 09, 2025

Change

The Salesforce plugin version 1.7.0 is available in Cloud Data Fusion version 6.8.0 and later. This release includes the following change:

  • Upgrade of Salesforce Bulk API V1 version from 62.0 to 64.0 (PLUGIN-1926).

Salesforce has deprecated certain fields in API version 64.0. Upgrading to Salesforce plugin version 1.7.0 might cause pipelines that use these fields to fail. To ensure that your pipelines continue to work, you must manually update your pipeline schema, either by loading a new schema or by removing the deprecated fields. For more information, see Prerequisites for upgrading to Salesforce plugin version 1.7.0.
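
As a rough illustration of the "remove the deprecated fields" option, the following Python sketch drops named fields from a record schema held as JSON. The schema layout (a record with a fields list) and the field name LegacyField__c are assumptions made for this example, not details taken from the Salesforce plugin. Check your own exported pipeline JSON for the actual structure and field names.

```python
import json


def drop_fields(schema_json: str, deprecated: set[str]) -> str:
    """Return the schema JSON with the named fields removed.

    Assumes a record schema shaped like
    {"type": "record", "name": ..., "fields": [{"name": ..., "type": ...}]}.
    """
    schema = json.loads(schema_json)
    schema["fields"] = [
        field for field in schema.get("fields", [])
        if field["name"] not in deprecated
    ]
    return json.dumps(schema)


# Hypothetical schema and field names, for illustration only.
example_schema = json.dumps({
    "type": "record",
    "name": "output",
    "fields": [
        {"name": "Id", "type": "string"},
        {"name": "LegacyField__c", "type": ["string", "null"]},
    ],
})
cleaned = json.loads(drop_fields(example_schema, {"LegacyField__c"}))
```

The function leaves every other field untouched, so the rest of the schema keeps its original order and types.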

August 28, 2025

Change

The ServiceNow plugin version 1.2.7 is available in Cloud Data Fusion version 6.10.1. This release includes the following change:

  • Fixed an issue related to schema backward compatibility while upgrading from plugin version 1.1.0 (PLUGIN-1902).

August 27, 2025

Feature

Cloud Data Fusion version 6.11.1 is generally available (GA). This release includes the following features:

Change

Changes in Cloud Data Fusion 6.11.1:

  • The Java runtime environment is upgraded from Java 8 to Java 11 (CDAP-21184).
  • To create ephemeral Dataproc clusters, Cloud Data Fusion uses the Dataproc 2.3 image by default (CDAP-21187).
  • The pipeline JSON size limit for creating new pipelines and importing pipelines as JSON is increased to 5MB (previously 2MB) (CDAP-21194).
  • On the Pipeline details page, the inbound triggers sidebar features a paginated list of pipelines where you can select the pipelines you want to add to the trigger. Additionally, a refresh button is added to update the existing list of triggers and pipelines (CDAP-21195).

Deprecated

Dataproc 2.0 is no longer supported in Cloud Data Fusion version 6.11.1 and later.

July 16, 2025

Change

The Oracle plugin version 1.12.3 is available in Cloud Data Fusion (via Hub) versions 6.11.0 and later, and 1.11.8 is available in Cloud Data Fusion (via Hub) version 6.10.

This release provides backward compatibility for recent schema changes, including the following:

To address backward compatibility for these changes, two new hidden fields are introduced in Oracle batch source configurations: treatPrecisionlessNumAsDeci and treatAsOldTimestamp. Both flags default to false. To enable these flags, edit the respective values in your exported connection JSON (if using connections) or pipeline JSON (if not using connections) before re-importing or re-deploying (PLUGIN-1893).
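
The pipeline JSON edit described above could be scripted along these lines. This is a minimal sketch, not a supported tool: the stage layout (config.stages[].plugin.properties), the plugin name "Oracle", and the use of the string "true" as the enabled value are assumptions about the exported JSON. Confirm them against your own export before re-importing.

```python
ORACLE_FLAGS = ("treatPrecisionlessNumAsDeci", "treatAsOldTimestamp")


def enable_oracle_flags(pipeline: dict, plugin_name: str = "Oracle") -> int:
    """Set both hidden flags on matching stages and return how many stages changed.

    Assumes stages live under pipeline["config"]["stages"] and that each
    stage keeps its settings in stage["plugin"]["properties"].
    """
    changed = 0
    for stage in pipeline.get("config", {}).get("stages", []):
        plugin = stage.get("plugin", {})
        if plugin.get("name") != plugin_name:
            continue
        properties = plugin.setdefault("properties", {})
        for flag in ORACLE_FLAGS:
            properties[flag] = "true"  # assumed string encoding of the flag value
        changed += 1
    return changed


# Hypothetical two-stage pipeline, for illustration only.
pipeline = {"config": {"stages": [
    {"name": "src", "plugin": {"name": "Oracle", "type": "batchsource",
                               "properties": {"referenceName": "ora"}}},
    {"name": "sink", "plugin": {"name": "BigQueryTable", "type": "batchsink",
                                "properties": {}}},
]}}
updated = enable_oracle_flags(pipeline)
```

In practice you would load the exported file with json.load, call the function, and write the result back with json.dump before re-importing or re-deploying.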

June 12, 2025

Change

The Elasticsearch plugin version 1.11.0 is available in Cloud Data Fusion version 6.11.0. This release includes the following change:

  • Upgraded Hadoop version for Elasticsearch plugin compatibility (PLUGIN-1881).

June 11, 2025

Change

The HTTP plugin version 1.4.4 is available in Cloud Data Fusion version 6.10.1. This release includes the following changes:

  • Implemented the Client Credentials Grant flow for HTTP OAuth2, enabling authorized clients to securely access data using the client_credentials grant type. Client credentials can be passed through Basic Authentication header, in the request body, or as query parameters (PLUGIN-1872).

  • Fixed an issue causing the HTTP Source plugin to throw a NullPointerException when the BasePageIterator received a null response (PLUGIN-1894).
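
To make the three ways of passing client credentials concrete, the following Python sketch builds (but does not send) a token request for each one, following the general OAuth 2.0 client_credentials pattern. The token URL, client ID, and client secret are hypothetical, and the sketch does not reflect how the HTTP plugin is implemented internally.

```python
import base64
from urllib.parse import urlencode


def basic_auth_request(token_url, client_id, client_secret):
    """Credentials in a Basic Authentication header."""
    raw = f"{client_id}:{client_secret}".encode("utf-8")
    headers = {
        "Authorization": "Basic " + base64.b64encode(raw).decode("ascii"),
        "Content-Type": "application/x-www-form-urlencoded",
    }
    body = urlencode({"grant_type": "client_credentials"})
    return token_url, headers, body


def body_request(token_url, client_id, client_secret):
    """Credentials in the form-encoded request body."""
    headers = {"Content-Type": "application/x-www-form-urlencoded"}
    body = urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
    })
    return token_url, headers, body


def query_request(token_url, client_id, client_secret):
    """Credentials as query parameters on the token URL."""
    query = urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
    })
    return f"{token_url}?{query}", {}, ""


# Hypothetical endpoint and credentials, for illustration only.
TOKEN_URL = "https://auth.example.com/oauth2/token"
basic = basic_auth_request(TOKEN_URL, "my-client", "my-secret")
in_body = body_request(TOKEN_URL, "my-client", "my-secret")
in_query = query_request(TOKEN_URL, "my-client", "my-secret")
```

Each function returns a (url, headers, body) tuple, so the same sending code can be reused whichever option the authorization server expects.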

June 09, 2025

Change

Cloud Data Fusion is available in the northamerica-south1 (Mexico) region. For more information, see Pricing.

May 16, 2025

Change

The DB plugin versions 1.11.7 and 1.12.2 are available in Cloud Data Fusion versions 6.10.1 and 6.11.0 respectively. This release includes the following change:

  • Added the TRANSACTION_ISOLATION_LEVEL property to DB Plugins (PostgreSQL, MySQL, and MSSQL).

March 28, 2025

Change

The Python Transform plugin (version 2.3.2) is available in CDAP version 6.10.1. This release includes a bug fix for the deprecated PROTOCOL_TLSv1 on Dataproc 2.0 and later. The issue occurs when earlier TLS versions, such as TLSv1 and TLSv1.1, are disabled by default due to security concerns. Applications that rely on ssl.PROTOCOL_TLSv1 in Python might fail and require updates to use ssl.PROTOCOL_TLSv1_2 or later.
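
For code that creates its own TLS context, one way to avoid pinning a single protocol constant is to set a minimum version instead. The following sketch assumes Python 3.7 or later, where ssl.TLSVersion is available. It is a general illustration of the standard ssl module, not code taken from the plugin.

```python
import ssl


def make_client_context() -> ssl.SSLContext:
    """Build a client-side context that refuses TLS versions below 1.2."""
    # PROTOCOL_TLS_CLIENT negotiates the highest version both sides support
    # and turns on certificate and hostname verification by default.
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


context = make_client_context()
```

Setting a floor rather than a fixed version lets the connection move to TLS 1.3 when the server supports it, while still rejecting TLSv1 and TLSv1.1.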

March 17, 2025

Feature

You can view instance metrics and pipeline metrics in Cloud Monitoring and in the dashboard provided by Cloud Data Fusion. For more information, see Metrics overview and Monitor Cloud Data Fusion system, instance, and pipeline health.

Feature

When a pipeline run fails, you can retrieve detailed error information on the pipeline details page of the Cloud Data Fusion web interface.

Cloud Data Fusion classifies pipeline errors by category, reason, and message. This classification speeds up resolution and reduces the need to examine complex logs. For more information, see Retrieve error information for a failed pipeline run.

Feature

Cloud Data Fusion 6.11.0 offers high availability with reduced upgrade downtime.

January 29, 2025

Fixed

The SAP OData plugin version 0.11.6 is available in Cloud Data Fusion version 6.8.0 and later. This release includes the following change:

Fixed an issue causing pipeline deployments to fail due to SAP memory dumps when processing large datasets with macro filters: ERROR Stage 'SAP OData' encountered : CDF_SAP_ODATA_01534 - Service validation failed. Root Cause:Invalid parametertype used at function.

January 21, 2025

Feature

You can use custom constraints with Organization Policy to provide more granular control over specific fields for some Cloud Data Fusion resources. For more information, see Create custom organization policy constraints.

January 13, 2025

Change

The SAP SuccessFactors plugin version 1.2.4 is available in Cloud Data Fusion version 6.8.0 and later. This release lets you use OAuth 2.0 for ODATA API authentication (PLUGIN-1741).

December 19, 2024

Fixed

The Cloud Data Fusion version 6.10.1.2 patch revision is generally available (GA). 6.10.1.2 includes the following changes:

  • You can generate audit logs that record data plane activities within your Cloud Data Fusion instance. Data plane audit logging is available in Preview for RBAC-enabled instances.

  • To improve the API response time, by default, all program activity records older than 30 days are cleaned up. Any activity older than 30 days isn't visible in the Cloud Data Fusion studio (CDAP-14950).

  • When using role-based access control, performing the List Pipelines operation requires datafusion.pipelines.list permission, in addition to datafusion.namespaces.get permission. For more information, see RBAC roles and permissions (CDAP-20931).

  • Fixed an issue causing the flow control metric, flowcontrol.launching.count, to overcount in cases where servers were restarted when a pipeline run was in progress (CDAP-21046).

  • Fixed an issue causing the flow control metric, flowcontrol.launching.count, to be stale after a restart when no pipelines were running (CDAP-21048).

  • Fixed an issue causing the default max concurrent runs limit for triggers not to appear in the web interface, making it difficult to tell if triggers were working as intended (CDAP-21072).

  • Fixed an issue causing the top panel of the Studio tab to disappear when you edited a pipeline draft that's based on a pipeline from an earlier Cloud Data Fusion version (CDAP-21073).

  • Improved performance by removing a call to the list apps API during pipeline deployment when checking if a pipeline already exists (CDAP-21074).

December 17, 2024

Feature

Cloud Data Fusion supports the CMEK organization policy.

November 27, 2024

Fixed

The Cloud SQL MySQL plugins version 1.11.5 is available in Cloud Data Fusion versions 6.10.0 and later. This release fixes an issue in the Cloud SQL MySQL sink causing pipelines to fail when the schema contains a MySQL reserved word (PLUGIN-1017).

November 18, 2024

Fixed

The SAP table batch source plugin version 0.11.5 is available in Cloud Data Fusion version 6.8.0 and later. This release fixes an issue causing the following error: Error encountered while configuring the stage: Unable to access Cloud Storage or download JCo libraries from Cloud Storage.

Fixed

The Cloud SQL MySQL plugins version 1.11.5 is available in Cloud Data Fusion versions 6.8.0 and later. This release fixes an issue in the Cloud SQL MySQL sink causing pipelines to fail when the schema contains a MySQL reserved word (PLUGIN-1017). This note is incorrect; see entry for November 27, 2024.

September 26, 2024

Fixed

The SAP ODP batch source plugin version 0.11.3 is available in Cloud Data Fusion versions 6.8.0 and later. This release includes the following changes:

  • Fixed an issue causing the following error: Error encountered while configuring the stage: Unable to access Cloud Storage or download JCo libraries from Cloud Storage. To address the issue, you must upgrade the Cloud Storage client library to version 2.3.0 or later.

  • Fixed an issue causing memory errors in the SAP system. You can choose to load changed data without loading historical data first. You can select this option in the plugin properties.

September 06, 2024

Fixed

The Cloud SQL MySQL plugin version 1.10.7 is available in Cloud Data Fusion versions 6.9.0 and 6.10.0. This plugin version lets you use a macro to specify the name of the Cloud SQL instance in the plugin's Connection name field.

August 30, 2024

Fixed

Excel plugin version 2.11.5 is available in Cloud Data Fusion 6.9 versions. This version fixes an issue in the Excel batch source causing pipelines with large XLSX files to consume high memory and fail (PLUGIN-1771 and PLUGIN-1795).

Fixed

Excel plugin version 2.10.3 is available in Cloud Data Fusion 6.8 versions. This version fixes an issue in the Excel batch source causing pipelines with large XLSX files to consume high memory and fail (PLUGIN-1771 and PLUGIN-1795).

July 15, 2024

Fixed

The Cloud Storage Copy/Move plugin version 0.23.2, which is bundled with Google Cloud Platform plugin, is available in Cloud Data Fusion versions 6.10.0 and later. The release lets you use a wildcard character (*) in the source path to copy and move multiple files. For example, the source path gs://demo0/prod/reports/*.csv copies and moves all CSV files in the reports directory (PLUGIN-698).
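
To preview which objects a wildcard source path might select, you can approximate the matching locally. The sketch below uses Python's fnmatch, which is an assumption: the plugin's own matching rules may differ (for example, fnmatch lets * match across / separators). The object names are hypothetical.

```python
from fnmatch import fnmatchcase


def matching_objects(pattern: str, object_paths: list[str]) -> list[str]:
    """Return the paths that match a shell-style wildcard pattern."""
    return [path for path in object_paths if fnmatchcase(path, pattern)]


# Hypothetical bucket contents, for illustration only.
paths = [
    "gs://demo0/prod/reports/january.csv",
    "gs://demo0/prod/reports/february.csv",
    "gs://demo0/prod/reports/summary.txt",
    "gs://demo0/prod/archive/old.csv",
]
selected = matching_objects("gs://demo0/prod/reports/*.csv", paths)
```

Here only the two CSV files under the reports directory are selected. The text file and the CSV file in the archive directory are left out.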

June 28, 2024

Fixed

The Cloud Storage Multi File sink plugin version 0.23.2 is available in Cloud Data Fusion version 6.10.1 and later. The release fixes an issue in the Cloud Storage Multi File sink causing pipelines to fail when the Flexible schema property was set to true (PLUGIN-1780).

June 20, 2024

Fixed

The Oracle sink plugin version 1.10.7 is available in Cloud Data Fusion version 6.9. The release fixes an issue in the Oracle sink causing null values to be assigned to fields in the input schema that have lowercase letters in the field name (PLUGIN-1793).

May 21, 2024

Feature

Syncing multiple pipelines from a namespace is GA in Cloud Data Fusion version 6.10.1. For more information, see Sync Cloud Data Fusion pipelines with a remote repository.

Breaking

Cloud Data Fusion version 6.10.1 has a known issue in the Cloud Storage plugin causing pipelines to intermittently fail if the plugin contains a * regex pattern and uses Dataproc 2.0. To mitigate this issue:

Fixed

The SAP SuccessFactors batch source version 1.2.3 is available in the Enterprise edition of Cloud Data Fusion 6.7.0 and later. This release lets you configure a proxy URL and SuccessFactors authentication properties.

April 26, 2024

Fixed

The HTTP plugin (version 1.4.2) is available in Cloud Data Fusion versions 6.8.0 and later. The release fixed an issue in the HTTP source causing an error in the retrieved schema when one of the retrieved columns contained a quoted value with a delimiter, such as a comma (PLUGIN-1781).

March 26, 2024

Feature

The Amazon Redshift batch source connector version 1.11.1 is available in Preview in Cloud Data Fusion 6.10.0 and later. This source lets you load batch data from your Redshift dataset to a destination, such as BigQuery.

Change

Cloud Data Fusion is available in the following regions:

  • asia-south2
  • me-central2

For more information, see Pricing.

March 18, 2024

Fixed

The SAP Ariba batch source plugin version 1.3.0 is available in the Cloud Data Fusion Enterprise edition, versions 6.7.0 and later. In plugin version 1.3.0, if a failure occurs, the plugin retries API calls. You can configure the retry parameters using the hidden fields in the pipeline configuration JSON.

February 29, 2024

Deprecated

Cloud Data Fusion version 6.7 is no longer supported. You should upgrade your instances to run in a supported version. For instructions, see Manage version upgrades for instances and pipelines.

January 27, 2024

Change

Cloud Data Fusion lets you enable and disable Dataplex Lineage, as needed. When you create a new instance in Cloud Data Fusion version 6.8.0 and later, Dataplex Lineage is disabled by default. For more information, see View lineage in Dataplex.

January 16, 2024

Feature

Cloud Data Fusion version 6.10.0 is available in Preview. This release is in parallel with the CDAP 6.10.0 release.

Feature

Source control management using GitHub is generally available (GA) in Cloud Data Fusion version 6.10.0. With this feature, you can use GitHub to maintain version histories of your ETL and ELT pipelines.

To simplify the experience of synchronizing pipelines between Cloud Data Fusion and GitHub in bulk, pushing and pulling multiple pipelines is available in Preview.

Fixed

Fixed in Cloud Data Fusion 6.10.0:

  • Fixed an issue in the Postgres DB plugin causing macros to be unsupported for database configuration (PLUGIN-1681).
  • Fixed an issue causing slowness in the API while fetching runs for all applications in a namespace (CDAP-20587).
  • Made the following fixes to Wrangler grammar (CDAP-20839):
    • The NUMERIC token type supports negative numbers.
    • The PROPERTIES token type supports one or more properties.
  • Fixed an issue causing columns that have all null values to be dropped in Wrangler (CDAP-20521).
  • Fixed an issue causing pipeline upgrades to not have the intended description (CDAP-20815).

Deprecated

Dataproc 1.5 isn't supported in Cloud Data Fusion version 6.10.0.

December 22, 2023

Fixed

The Salesforce plugin version 1.6.2 is available in Cloud Data Fusion versions 6.8.0 and later. This version includes the following changes:

  • Fixed an issue in the Salesforce plugin causing the following error in some pipelines that run more than 4 hours: java.lang.IllegalStateException: SSLException reading next record: javax.net.ssl.SSLException: Connection reset. The Connection timeout property was added to the Salesforce plugin properties in the web interface with the default value of 3600 seconds (PLUGIN-1719).

  • For accuracy, fixed schema handling for referenced object fields: child fields are explicitly marked as non-nullable, regardless of the schema values in the referenced object.

In earlier versions, when retrieving schema information for fields in referenced objects, such as contact.account_lastmodifieddate, the schema inherits properties from the referenced object, causing incorrect non-nullable assumptions (PLUGIN-1720).

  • A retry mechanism was added in the Salesforce batch source and Multi-Source plugins for connection timeout issues (PLUGIN-1706).

December 19, 2023

Fixed

The SAP ODP batch source plugin version 0.11.0 is available in Cloud Data Fusion versions 6.8.3 and later. The release includes the following changes:

  • Improved error logging, providing more information about the occurrence and mode of errors.
  • Improved the placeholder values in the Filter options fields.
  • Added support for fetching Operational Delta Queue (ODQ) fields by turning on the Fetch delta fields toggle.
  • On the Properties page for the plugin, changed the name of the ObjectType field to Context. For more information, see the ODP batch source properties.

December 14, 2023

Fixed

The Cloud Data Fusion version 6.9.2.2 patch revision is generally available (GA). 6.9.2.2 includes the following fixes:

  • Increased the speed of the batch /runs API call for pipelines that are run thousands of times (CDAP-20587).
  • Fixed an issue causing draft pipelines to load incorrectly when you enable an accelerator that automatically installs plugins from a custom hub (CDAP-20628).
  • Fixed an issue causing the CDAP service IP to be cached forever (CDAP-20781).
  • Fixed an issue causing a KubeTwillRunnerService error on shutdown (CDAP-20792).
  • Fixed an issue causing pipelines and streaming jobs to fail after an AppFabric restart (CDAP-20797).
  • Fixed an issue causing a slowdown in deploying applications (CDAP-20820).
  • Fixed an issue causing refreshed OAuth tokens to not be logged as expected, making it difficult to identify the root cause of some issues (CDAP-20861).
  • Fixed an issue in replication where you couldn't select the Tink transformation in the web interface (CDAP-20804).

October 30, 2023

Fixed

The Cloud Data Fusion version 6.8.3.1 patch revision is generally available (GA). It fixes a regression that causes a pipeline to fail when using Dataproc secondary workers (CDAP-20807).

October 18, 2023

Change

The Cloud Data Fusion SAP SLT No RFC Replication plugin version 0.11.0 is available in the Hub in Cloud Data Fusion enterprise edition versions 6.8.0 and later. It differs from the existing SAP SLT Replication plugin in the following ways:

  • All data and metadata file formats are in JSON.
  • No SAP RFC inbound calls occur in the SAP SLT No RFC Replication plugin. Accessing schemas and data from the SAP system no longer requires an SAP connection. Metadata and data extraction are sourced from the Cloud Storage bucket.

September 07, 2023

Feature

Cloud Data Fusion version 6.9.2 is generally available (GA). This release is in parallel with the CDAP 6.9.2 release.

Feature
Change

Changes in Cloud Data Fusion 6.9.2:

  • Cloud Data Fusion supports setting custom scopes when creating a Dataproc cluster (CDAP-19428).
  • You can set common metadata labels for Dataproc clusters and jobs using the Common Labels property in the Ephemeral Dataproc compute profile (CDAP-20698).
  • You can set labels for the Dataproc jobs using the Common Labels property in the Existing Dataproc compute profile (CDAP-20698).
  • You can set a pipeline runtime argument with the key system.profile.properties.labels and a value representing the labels in the following format: key1|value1;key2|value2. This setting overrides the common labels set in the compute profile for pipeline runs (CDAP-20698).
  • Cloud Data Fusion supports using Dataproc temp buckets in compute profiles (CDAP-20712).
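
The key1|value1;key2|value2 format for the system.profile.properties.labels runtime argument can be generated from a dictionary. The following Python sketch is one way to do that. The label names and values are hypothetical, and the check that rejects separator characters is a precaution added for this example rather than a documented rule.

```python
def format_labels(labels: dict[str, str]) -> str:
    """Render labels as key1|value1;key2|value2."""
    for key, value in labels.items():
        # Reject the two separator characters so the result stays unambiguous.
        if any(sep in key or sep in value for sep in ("|", ";")):
            raise ValueError(f"label {key!r} contains a separator character")
    return ";".join(f"{key}|{value}" for key, value in labels.items())


# Hypothetical labels, for illustration only.
runtime_args = {
    "system.profile.properties.labels": format_labels(
        {"team": "analytics", "env": "prod"}
    ),
}
```

With these inputs, the runtime argument value is team|analytics;env|prod, which overrides the common labels from the compute profile for that run.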

September 06, 2023

Change

The SAP ODP plugin version 0.7.5 is available in Cloud Data Fusion versions 6.6.0 to 6.8.0. This release includes the following changes:

  • Fixed an issue causing duplicate records or loss of records due to package acknowledgement occurring too early.
  • Filters that you apply are viewable in the logs.

July 20, 2023

Feature

The Cloud Data Fusion SAP ODP plugin supports extracting data through CDS views.

July 13, 2023

Fixed

The SAP OData plugin (version 0.9.1) is available in the Cloud Data Fusion SAP Hub (all versions) with the following changes:

  • Fixed an issue in the SAP OData batch source causing you not to receive a valid error message if the base URL provided is invalid.
  • A warning has been added to the log message when you provide a batch size that is larger than the maximum allowed batch size.

June 14, 2023

Feature

Features in Cloud Data Fusion 6.9.1:

  • Cloud Data Fusion supports using Source Control Management to manage pipeline versions through GitHub repositories. Source Control Management is available in Preview (CDAP-20228).

  • Data Catalog Asset Lineage Integration is in GA in versions 6.8.0 and later. In version 6.9.1, it supports the Multiple Database Tables source and the BigQuery Multi Table sink.

  • Cloud Data Fusion supports editing deployed pipelines (CDAP-19425).

  • Cloud Data Fusion supports Window Aggregation operations in Transformation Pushdown to reduce the pipeline execution time by performing SQL operations in BigQuery instead of Spark (CDAP-19628).

  • Cloud Data Fusion supports specifying filters in SQL in Wrangler and the pushdown of SQL filters in Wrangler to BigQuery. In the Wrangler transformation, added support for specifying preconditions in SQL, and added support for transformation pushdown for SQL preconditions. For more information, see Wrangler Filter Pushdown (CDAP-20454).

  • Cloud Data Fusion supports Dataproc driver node groups. To use Dataproc driver node groups, when you create the Dataproc cluster, configure the following properties:

    • yarn:yarn.nodemanager.resource.memory.enforced=false
    • yarn:yarn.nodemanager.admin-env.SPARK_HOME=$SPARK_HOME
  • For the Multiple Database Tables Batch Source, added field-level lineage support (CDAP-20440).

  • Cloud Data Fusion version 6.9.1 supports the Dataproc image 2.1 compute engine, which runs in Java 11. If you change the Dataproc image to 2.1, the JDBC drivers that the database plugins use in those instances must be compatible with Java 11 (CDAP-20543).

  • Cloud Data Fusion supports the following improvements and changes for real time pipelines with a single Pub/Sub streaming source and no Windower plugins:

    • The Pub/Sub streaming source has built-in support for at-least-once processing.
    • Enabling Spark checkpointing isn't required. Pub/Sub streaming source creates a Pub/Sub snapshot at the beginning of each batch and removes it at the end of each batch.
    • The Pub/Sub Snapshot creation has a cost associated with it. For more information, see Pub/Sub pricing.
    • The snapshot creations can be monitored using Cloud Audit logs.

    For more information, see Read from a Pub/Sub streaming source (PLUGIN-1537).

Change

Changes in Cloud Data Fusion 6.9.1:

  • Updated Cloud Data Fusion docker image dependencies to include fixes for security vulnerabilities.

  • Added the ability to configure Java options for a pipeline run by setting the system.program.jvm.opts runtime argument (CDAP-20381).

  • Replication pipelines generate logs for stats of events processed by source and target plugins at a fixed interval (CDAP-20140).

  • Streaming pipelines that use Spark checkpointing can use macros if the cdap.streaming.allow.source.macros runtime argument is set to true. Note that macro evaluation will only be performed for the first run in this case, then stored in the checkpoint. It will not be re-evaluated in later runs (CDAP-20455).

  • Improved performance of replication pipelines by caching schema objects for data events (CDAP-20488).

  • Added a launch mode setting to the Dataproc provisioners. When set to Client mode, the program launcher will run in the Dataproc job itself, and not as a separate YARN application. This reduces start-up time and cluster resources required, but may cause failures if the launcher needs more memory, such as when there's an action plugin that loads data into memory (CDAP-20500).

  • Removed duplicate backend calls when a program reads from the secure store (CDAP-20504).

  • Added support to upgrade Pipeline Post-run Action (Pipeline Alerts) plugins during the pipeline upgrade process (CDAP-20567).

  • Added Lifecycle microservices endpoint to delete a streaming application state for Kafka Consumer Streaming and Google Cloud Pub/Sub Streaming sources (CDAP-20466).

Deprecated

With the introduction of editing deployed pipelines in Cloud Data Fusion 6.9.1, the behavior of some APIs has significantly changed. Due to these changes, some APIs are deprecated (CDAP-20030).

June 08, 2023

Feature

Cloud Data Fusion 6.8.3 supports the ability to configure Java options for a pipeline run by setting the system.program.jvm.opts runtime argument (CDAP-20381).
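
As an illustration, runtime arguments are key-value pairs, so the JVM options can be merged into an existing set before a run is started. In this Python sketch, the option -XX:+HeapDumpOnOutOfMemoryError, the input.path argument, and the choice to join multiple options with spaces are all assumptions made for the example. Only the system.program.jvm.opts key comes from the release note.

```python
import json


def with_jvm_opts(runtime_args: dict[str, str], jvm_opts: list[str]) -> dict[str, str]:
    """Return a copy of the runtime arguments with JVM options added."""
    merged = dict(runtime_args)  # copy so the caller's dictionary is not modified
    merged["system.program.jvm.opts"] = " ".join(jvm_opts)  # assumed space-separated
    return merged


# Hypothetical arguments and JVM option, for illustration only.
base_args = {"input.path": "gs://example-bucket/input"}
args = with_jvm_opts(base_args, ["-XX:+HeapDumpOnOutOfMemoryError"])
payload = json.dumps(args)
```

The resulting JSON string could then be supplied wherever you normally provide runtime arguments for the pipeline run.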

June 07, 2023

Feature

Zendesk plugins version 1.2.0 is available in the Cloud Data Fusion Hub. The following changes are included in version 1.2.0:

June 02, 2023

Feature

The SAP SuccessFactors Batch Source plugin is GA. You can connect your data pipeline to an SAP SuccessFactors Source and a BigQuery Sink with this plugin in Cloud Data Fusion versions 6.5.1 and later.

April 26, 2023

Feature

Cloud Data Fusion version 6.8.2 is generally available (GA). This release is in parallel with the CDAP 6.8.2 release.

March 29, 2023

Fixed

In Cloud Data Fusion version 6.8.1, Dataproc clusters no longer require the following OAuth scope to function: https://www.googleapis.com/auth/cloud-platform.

February 28, 2023

Feature

Cloud Data Fusion version 6.7.3 is generally available (GA). This release is in parallel with the CDAP 6.7.3 release.

February 23, 2023

Feature

FTP Plugins versions 3.1.0 and 3.2.0 are generally available (GA) in Cloud Data Fusion versions 6.7.2+ and 6.8.0+, respectively. They include support for more file formats and properties. An issue was fixed in the FTP Batch Source that caused pipelines to fail when running with Dataproc 2.0. For more information, see the CDAP Hub release log.

January 05, 2023

Feature

The SAP SuccessFactors Batch Source plugin is available in Preview. You can connect your data pipeline to an SAP SuccessFactors Source and a BigQuery Sink with this plugin in Cloud Data Fusion versions 6.5.1 and later.

December 06, 2022

Feature

Features in 6.8.0:

August 15, 2022

Feature

Cloud Data Fusion version 6.7.1 is generally available (GA). This release is in parallel with the CDAP 6.7.1 release.

June 09, 2022

Change

Changes in 6.7.0:

  • Increased pipeline launch and run scalability in Enterprise instances.
  • In Transformation Pushdown, added the ability to use existing connections.
  • Added the ability to parse files before loading data into a Wrangler workspace.
  • Added the ability to import the schema in JSON and some Avro formats, where schema inference isn't possible before loading data into the Wrangler workspace.
  • In Connection Management:
    • Added the ability to edit connections.
    • Added support for connections for several plugins and sinks.
    • Added the ability to browse partial hierarchies, such as BigQuery datasets and Dataplex zones.
  • In the Cloud Storage Done File Marker Post-Action plugin, added support for the Location property, which lets you have buckets and customer-managed encryption keys in locations that are not US locations.
  • In the BigQuery Execution Action plugin and the BigQuery Argument Setter action plugin, added support for the Dataset Project ID property, the Project ID of the dataset that stores the query results. It's required if the dataset is in a different project than the BigQuery job.
  • In BigQuery sinks, added support for the BigNumeric data type.
  • In the BigQuery Table Batch Source, added the ability to query any temporary table in any project when you set the Enable querying views property to Yes. Previously, you could only query views.
  • In Cloud Data Loss Prevention plugins, added support for templates from other projects.
  • Added a new pipeline state for when you manually stop a pipeline run: Stopping.
  • In the BigQuery Execute plugin, added the ability to look up the drive scope for the service account to read from external tables created from the drive.
  • Improved the generic Database source plugin to correctly read decimal data.
  • Improved the Google Cloud Platform plugins to validate the Encryption Key Name property.
  • In the replication configurations, added the ability to enable soft deletes from a BigQuery target.
  • In Wrangler, added support for nested arrays, such as the BigQuery STRUCT data type.
  • In the Cloud Storage File Reader Batch Source plugin, added the Allow Empty Input property.
  • In the Cloud Storage File Reader Batch Source and Amazon S3 Batch Source plugins, added the Enable Quoted Values property, which lets you treat content between quotes as a value.
  • In the Joiner transformation, added the Input with Larger Data Skew property.
  • Behavior change: In the Pipeline Studio, if you click Stop on a running pipeline and the pipeline doesn't stop after 6 hours, the pipeline is forcefully terminated.
  • Behavior change: In the Deduplicate Analytics plugin, limited the Filter Operation property to one record. If this property isn't set, a random record is chosen from the group of duplicate records.
  • Behavior change: The BigQuery sink supports Nullable Arrays. A NULL array is converted to an empty array at insertion time.

May 23, 2022

Change

Google Cloud Platform Plugins version 0.19.1 is generally available (GA). This version includes Dataplex Source and Sink plugins in Preview. For more information, see the CDAP Hub release log.

April 01, 2022

Feature

(Release note added March 14, 2023) Role-based access control (RBAC) is generally available (GA) in Cloud Data Fusion 6.6.0 and later. This gives administrators fine-grained access control over what users can do at the namespace level.

March 31, 2022

Feature

The SAP SLT Replication plugin is generally available (GA). You can replicate your data continuously and in real time from SAP sources into BigQuery with this plugin in Cloud Data Fusion versions 6.4.0 and later.

February 25, 2022

Feature

Features in 6.6.0:

  • Cluster reuse is generally available (GA).
  • Predefined autoscaling is available in Preview.
  • Cloud Data Fusion flow control prevents you from submitting too many requests, which can cause stuck or failed pipeline runs. It applies to API and scheduled pipeline launch requests for batch and real-time pipelines and replication jobs. It is available in Preview.

November 17, 2021

Change

January 7, 2022 correction: Cloud Data Fusion is not yet available in the Santiago (southamerica-west1) region. For available locations, see Locations.

Cloud Data Fusion is now available in the Santiago (southamerica-west1) region.

November 05, 2021

Feature

GA: Cloud Data Fusion now supports Customer-Managed Encryption Keys (CMEK), which provides user encryption control over the data written to Google internal resources in tenant projects, and data written by Cloud Data Fusion pipelines. The list of supported plugins has also expanded.

August 16, 2021

Change

SQL Server source plugin version 1.5.5 is now available. This version fixes a NullPointerException bug that occurs in version 1.5.4. Versions 1.5.4 and above support the Datetime data type. In versions 1.5.3 and earlier, if you had a Datetime column in your SQL Server source, it mapped to the Timestamp data type. Upgrades to version 1.5.4 are backwards incompatible, but upgrades to version 1.5.5 are compatible. For more information, see Troubleshooting and the CDAP SQL Server Batch Source.

May 27, 2021

Feature

In Cloud Data Fusion version 6.4.1, Replication supports the Datetime data type in BigQuery targets. You can now read and write to tables that contain Datetime fields.

March 31, 2021

Feature

Cloud Data Fusion version 6.4.0 is now available. To upgrade, see Upgrading instances and pipelines. This release is in parallel with the CDAP 6.4.0 release.

Feature

Features in 6.4.0:

  • GA: You can now ingest data from SAP tables with the SAP Table Batch Source plugin.

  • Cloud Data Fusion now supports the Datetime data type in the following plugins. You can now read from and write to tables that contain Datetime fields:

    • BigQuery batch source
    • BigQuery sink
    • BigQuery multi table sink
    • Bigtable batch source
    • Bigtable sink
    • Datastore batch source
    • Datastore sink
    • Cloud Storage file batch source
    • Cloud Storage file sink
    • Cloud Storage multi file sink
    • Spanner batch source
    • Spanner sink
    • File source
    • File sink
    • Wrangler
    • Amazon S3 batch source
    • Amazon S3 sink
    • Database source
  • You can configure machine type, cluster properties, and idle TTL for the Dataproc provisioner. For the available settings, see the CDAP documentation.

  • Adding, editing, and deleting comments on draft data pipelines is now supported. For more information, see Adding comments to a data pipeline.

  • Advanced join conditions are now available in the Joiner plugin. You can specify an arbitrary SQL condition to join on. For more information, see Join Condition Type.

  • A new post-action plugin is now available: Cloud Storage Done File Marker. To help you orchestrate downstream or dependent processes, this post-action plugin marks the end of a pipeline run by creating and storing an empty SUCCESS file in the given Cloud Storage bucket upon pipeline completion, success, or failure.

Change

Changed in version 6.4.0:

  • Behavior change: When you validate a plugin, macros get resolved with preferences. In previous releases, to validate a plugin's configuration, you had to change the pipeline to remove the macros.
  • Behavior change: Cloud Data Fusion now determines the schema dynamically at runtime instead of requiring arguments to be set. Multi sink runtime argument requirements have been removed, which lets you add simple transformations in multi-source/multi-sink pipelines. In previous releases, multi-sink plugins required the pipeline to set a runtime argument for each table, with the schema for each table.

  • You can now filter tables in the Multiple Database Tables Batch Source.

  • Multiple Database Batch Source and BigQuery multi-table sink have better error handling and let pipelines continue if one or more tables fail.

  • Cloud Data Fusion Replication changes:

    • March 17, 2023 release note addition: Replication from SQL Server and MySQL is generally available (GA) in Cloud Data Fusion version 6.4.0.
    • Renamed Replication pipelines to Replication jobs.
    • The Customer-managed encryption key (CMEK) configuration property is now available for BigQuery targets in your Replication jobs.
    • On the BigQuery Target properties page, renamed the Staging Bucket Location property to Location.
    • Improved reliability by restarting Replication from the last known checkpoint.
  • You can now use files with ISO-8859, Windows and EBCDIC encoding types with Amazon S3, File and Cloud Storage File Reader batch source plugins.

  • Cloud Data Fusion now supports running pipelines on a Hadoop cluster with Kerberos enabled.

March 24, 2021

Feature

Cloud Data Fusion version 6.3.1 is now available. This version fixes a race condition that results in intermittent failures in concurrent pipeline executions. This release is in parallel with the CDAP 6.3.1 release.

February 22, 2021

Deprecated

Cloud Data Fusion Beta instances (versions 6.1.0.2 and lower that were created before November 21, 2019) will be turned down on March 1, 2021. To keep using a pipeline, export it, delete the old instance to avoid billing impact, create a new instance, and import the pipeline into the new instance.

January 27, 2021

Announcement

Cloud Data Fusion Beta instances (versions 6.1.0.2 and lower that were created before November 21, 2019) will be turned down on March 1, 2021. To keep using a pipeline, export it, create a new instance, and import the pipeline into the new instance. This note is incorrect; see the entry for February 22, 2021.

January 21, 2021

Announcement

Cloud Data Fusion 6.3.0 is now available.

Change

In-place upgrades are now supported for minor and patch versions.

October 21, 2020

Issue

In Cloud Data Fusion versions before 6.2, there is a known issue where pipelines get stuck during execution. Stopping the pipeline results in the following error: Malformed reply from SOCKS server. To fix this, delete the Dataproc cluster, and then update the memory settings in the compute profile.

September 30, 2020

Change

Wrangler now supports BigQuery views and materialized views.

Change

Improved performance for skewed joins by including Distribution in the Joiner plugin settings.

Change

Cloud Data Fusion now displays the number of pending preview runs, if any, before the current run. In the Studio, the number of pending runs is displayed under the timer.

August 24, 2020

Change

Highlights

Cloud Data Fusion 6.1.4 provides performance and scalability improvements that increase developer productivity and optimize pipeline runtime performance. The release includes scaled-up previews that support up to 50 concurrent runs, capabilities to handle large and complex schemas in Pipeline Studio, an enhanced log viewer, and other critical improvements and fixes.

This release is in parallel with the CDAP 6.1.4 release.

Feature

You can now create autoscaling Dataproc clusters.

Change

Cloud Data Fusion previews now support up to 50 concurrent runs.

Fixed

Fixed a bug where a metric incorrectly counted the number of records written in the Google Cloud Storage sink.

April 22, 2020

Fixed

Fixed a bug that caused zombie processes when using the Remote Hadoop Provisioner.

Fixed

Fixed a race condition that caused a failure when running a Spark program.

Fixed

Reduced preview startup time by 60%. Also added a limit on the maximum number of concurrent preview runs (10 by default).

November 21, 2019

Feature

Added support for creating Cloud Data Fusion instances that use private IP addresses.

Feature

Added support for creating private Cloud Data Fusion instances and executing data pipelines in a VPC-SC environment.

Change

The Cloud Data Fusion UI is now available at a different URL in the format: <instance-name>-<project-id>-dot-<region identifier>.datafusion.googleusercontent.com.
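
As an illustration of the URL format above, the following minimal sketch assembles the UI hostname from its three parts. The function name `data_fusion_ui_host` and the sample values (`my-instance`, `my-project`, `usw1`) are hypothetical placeholders, not values taken from this page. The sketch only concatenates strings and assumes you already know the region identifier for your instance.

```python
def data_fusion_ui_host(instance_name: str, project_id: str, region_identifier: str) -> str:
    """Assemble a UI hostname following the pattern
    <instance-name>-<project-id>-dot-<region identifier>.datafusion.googleusercontent.com.

    The function name and parameters are illustrative assumptions; the
    region identifier is passed through unchanged, not derived.
    """
    parts = [instance_name, project_id, "dot", region_identifier]
    # Reject empty components so the result never contains "--" or a leading "-".
    if not all(parts):
        raise ValueError("instance_name, project_id, and region_identifier must be non-empty")
    return "-".join(parts) + ".datafusion.googleusercontent.com"


# Hypothetical sample values, for illustration only.
print(data_fusion_ui_host("my-instance", "my-project", "usw1"))
```

If your instance uses a different hostname, treat the value shown for the instance in the Google Cloud console as authoritative.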

May 31, 2019

Change

Renamed "Cloud Dataprep service" to "Wrangler service" in the System Admin page of the Cloud Data Fusion UI.