This page describes how to upgrade Cloud Data Fusion instances and batch pipelines.
Upgrade your Cloud Data Fusion instances and batch pipelines to the latest platform and plugin versions to get new features, bug fixes, and performance improvements.
Before you begin
- Plan a scheduled downtime for the upgrade. The process takes up to an hour.
- In the Google Cloud console, activate Cloud Shell.
Limitations
- After you create a Cloud Data Fusion instance, you cannot change its edition, even through an upgrade operation. 
- Don't trigger an upgrade with Terraform. Terraform deletes and recreates the instance instead of performing an in-place upgrade, which results in the loss of any existing data within the instance.
- Cloud Data Fusion doesn't restart pipelines that stop as a result of the upgrade operation. 
- When you upgrade an instance from versions prior to 6.11.0, expect greater downtime for the upgrade, especially if the instance handles a lot of data. 
- Upgrading real-time pipelines isn't supported, except in pipelines created in version 6.8.0 with a Kafka real-time source. For a workaround, see Upgrade real-time pipelines. 
Upgrade Cloud Data Fusion instances
To upgrade a Cloud Data Fusion instance to a new Cloud Data Fusion version, go to the Instance details page:
- In the Google Cloud console, go to the Cloud Data Fusion page. 
- Click Instances, and then click the instance's name to go to the Instance details page. 
Then perform the upgrade using either the Google Cloud console or gcloud CLI:
Console
- Click Upgrade for a list of available versions. 
- Select a version. 
- Click Upgrade. 
- Verify that the upgrade was successful:
  - Refresh the Instance details page.
  - Click View instance to access the upgraded instance in the Cloud Data Fusion web interface.
  - Click System admin in the menu bar. The new version number appears at the top of the page.
- To prevent your pipelines from getting stuck when you run them in the new version, grant the required roles in your upgraded instance. 
gcloud
- To upgrade to a new Cloud Data Fusion version, run the following gcloud CLI command from a local terminal or a Cloud Shell session:

      gcloud beta data-fusion instances update INSTANCE_ID \
        --project=PROJECT_ID \
        --location=LOCATION_NAME \
        --version=AVAILABLE_INSTANCE_VERSION

  - Optional: If applicable for your instance, add the --enable_stackdriver_logging, --enable_stackdriver_monitoring, and --labels flags.
  - Optional: You can pass CDAP properties, such as enable.unrecoverable.reset, as --options.
- Verify that the upgrade was successful by following these steps:
  - In the Google Cloud console, go to the Cloud Data Fusion Instances page.
  - Click View instance to access the upgraded instance in the Cloud Data Fusion web interface.
  - Click System Admin in the menu bar. The new version number appears at the top of the page.
- To prevent your pipelines from getting stuck when you run them in the new version, grant the required roles in your upgraded instance. 
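The version check above can also be sketched from the command line. This is a minimal sketch, not an official procedure. It assumes that `gcloud beta data-fusion instances describe` exposes the instance's `version` field and that the field uses the same format as the value you passed to `--version`. The helper names `instance_version` and `check_upgraded` are hypothetical.

```shell
# Sketch: confirm that the instance reports the version you upgraded to.
# instance_version and check_upgraded are hypothetical helper names.

# Print the version that the instance currently reports.
# Arguments: INSTANCE_ID LOCATION_NAME PROJECT_ID
instance_version() {
  gcloud beta data-fusion instances describe "$1" \
    --location="$2" \
    --project="$3" \
    --format="value(version)"
}

# Compare the reported version with the expected one.
# Arguments: EXPECTED_VERSION INSTANCE_ID LOCATION_NAME PROJECT_ID
# Returns 0 on a match and 1 otherwise.
check_upgraded() {
  expected_version="$1"
  actual_version="$(instance_version "$2" "$3" "$4")"
  if [ "$actual_version" = "$expected_version" ]; then
    echo "OK: instance is on version $actual_version"
  else
    echo "MISMATCH: expected $expected_version, found $actual_version" >&2
    return 1
  fi
}
```

For example, `check_upgraded AVAILABLE_INSTANCE_VERSION INSTANCE_ID LOCATION_NAME PROJECT_ID` exits with a nonzero status if the instance still reports a different version, which makes it usable in a script after the update command returns.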
Upgrade batch pipelines
To upgrade your Cloud Data Fusion batch pipelines to use the latest plugin versions:
- Recommended: Back up all pipelines. You can back up pipelines in one of two ways:
  - Download the zip file by following these steps:
    - To get the URL that triggers a zip file download of all pipelines, run the following command:

          echo $CDAP_ENDPOINT/v3/export/apps

    - Copy the URL output and open it in your browser.
    - Extract the downloaded file, then confirm that all pipelines were exported. The pipelines are organized by namespace.
  - Back up pipelines using Source Control Management (SCM), available in version 6.9 and later. SCM provides GitHub integration, which you can use to back up pipelines.
- Upgrade pipelines by following these steps:
  - Create a variable that points to the pipeline_upgrade.json file that you will create in the next step to save a list of pipelines.

        export PIPELINE_LIST=PATH/pipeline_upgrade.json

    Replace PATH with the path to the file.
  - Create a list of all pipelines for an instance and namespace using the following command. The result is stored in the $PIPELINE_LIST file in JSON format. You can edit the list to remove pipelines that don't need upgrades.

        curl -H "Authorization: Bearer $(gcloud auth print-access-token)" -H "Content-Type: application/json" "${CDAP_ENDPOINT}/v3/namespaces/NAMESPACE_ID/apps" -o "$PIPELINE_LIST"

    Replace NAMESPACE_ID with the namespace where you want the upgrade to happen.
  - Upgrade the pipelines listed in pipeline_upgrade.json. The command displays a list of upgraded pipelines with their upgrade status.

        curl -N -H "Authorization: Bearer $(gcloud auth print-access-token)" -H "Content-Type: application/json" "${CDAP_ENDPOINT}/v3/namespaces/NAMESPACE_ID/upgrade" --data @"$PIPELINE_LIST"

    Replace NAMESPACE_ID with the namespace ID of the pipelines that are getting upgraded.
- To prevent your pipelines from getting stuck when you run them in the new version, grant the required roles in your upgraded instance. 
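The commands in this section assume that the CDAP_ENDPOINT variable is already set. One way to set it, sketched here under the assumption that the instance's `apiEndpoint` field holds the CDAP API base URL, is to read that field with `gcloud beta data-fusion instances describe`. The helper names `cdap_endpoint` and `upgrade_url` are hypothetical.

```shell
# Sketch: derive CDAP_ENDPOINT from the instance's apiEndpoint field.
# cdap_endpoint and upgrade_url are hypothetical helper names.

# Print the API endpoint of an instance.
# Arguments: INSTANCE_ID LOCATION_NAME
cdap_endpoint() {
  gcloud beta data-fusion instances describe "$1" \
    --location="$2" \
    --format="value(apiEndpoint)"
}

# Print the upgrade URL for a namespace, using the CDAP_ENDPOINT variable.
# Arguments: NAMESPACE_ID
upgrade_url() {
  printf '%s/v3/namespaces/%s/upgrade\n' "$CDAP_ENDPOINT" "$1"
}

# Usage (INSTANCE_ID, LOCATION_NAME, and NAMESPACE_ID are placeholders):
#   export CDAP_ENDPOINT="$(cdap_endpoint INSTANCE_ID LOCATION_NAME)"
#   upgrade_url NAMESPACE_ID
```

Printing the URL before you run the curl commands is a quick way to catch an unset or empty CDAP_ENDPOINT, which would otherwise produce a request to a malformed address.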
Upgrade real-time pipelines
Upgrading real-time pipelines is not supported, except in pipelines created in version 6.8.0 with a Kafka real-time source.
For all other real-time pipelines, do the following instead:
- Stop and export the pipelines.
- Upgrade the instance.
- Import the real-time pipelines into your upgraded instance.
Upgrade to enable Replication
Replication can be enabled in Cloud Data Fusion environments in version 6.3.0 or later. If you have version 6.2.3, upgrade to 6.3.0, then upgrade to the latest version. You can then enable Replication.
Grant roles for upgraded instances
After the upgrade completes, grant the Cloud Data Fusion Runner role (roles/datafusion.runner) and the Cloud Storage Admin role (roles/storage.admin) to the Dataproc service account in your project.
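The role grants can be sketched with `gcloud projects add-iam-policy-binding`. This is an illustration under assumptions, not a prescribed procedure: the helper name `grant_runner_roles` and the `DRY_RUN` variable are hypothetical, and you must supply the email address of the Dataproc service account that your project actually uses. By default the sketch only prints the commands so that you can review them first.

```shell
# Sketch: grant both roles to the Dataproc service account.
# grant_runner_roles and DRY_RUN are hypothetical names.
# Arguments: PROJECT_ID SERVICE_ACCOUNT_EMAIL
# With DRY_RUN=1 (the default), the commands are printed instead of run.
grant_runner_roles() {
  project_id="$1"
  service_account="$2"
  for role in roles/datafusion.runner roles/storage.admin; do
    if [ "${DRY_RUN:-1}" = "1" ]; then
      echo "gcloud projects add-iam-policy-binding $project_id --member=serviceAccount:$service_account --role=$role"
    else
      gcloud projects add-iam-policy-binding "$project_id" \
        --member="serviceAccount:$service_account" \
        --role="$role"
    fi
  done
}
```

Run `grant_runner_roles PROJECT_ID SERVICE_ACCOUNT_EMAIL` to preview the two bindings, then set `DRY_RUN=0` to apply them.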
What's next
- Manage patch revisions for Cloud Data Fusion instances.
- Learn about versioning in Cloud Data Fusion.
- Refer to the available version and patch revision upgrades.
- Troubleshoot upgrades.