Migrate environments to Managed Airflow (Gen 3) (migration script)

This page explains how to migrate DAGs, data and configuration from your existing Managed Airflow (Gen 2) environment to a new Managed Airflow (Gen 3) environment using the migration script.

From | To | Method | Guide
Managed Airflow (Gen 3), Airflow 2 | Managed Airflow (Gen 3), Airflow 3 | Side-by-side, manual transfer | Manual migration guide
Managed Airflow (Gen 2) | Managed Airflow (Gen 3) | Side-by-side, using the migration script | This guide
Managed Airflow (Gen 2) | Managed Airflow (Gen 3) | Side-by-side, using snapshots | Snapshots migration guide
Managed Airflow (Legacy Gen 1), Airflow 2 | Managed Airflow (Gen 3) | Side-by-side, using snapshots | Snapshots migration guide
Managed Airflow (Legacy Gen 1), Airflow 2 | Managed Airflow (Gen 2) | Side-by-side, using snapshots | Snapshots migration guide
Managed Airflow (Legacy Gen 1), Airflow 2 | Managed Airflow (Gen 2) | Side-by-side, manual transfer | Manual migration guide
Managed Airflow (Legacy Gen 1), Airflow 1 | Managed Airflow (Gen 2), Airflow 2 | Side-by-side, using snapshots | Snapshots migration guide
Managed Airflow (Legacy Gen 1), Airflow 1 | Managed Airflow (Gen 2), Airflow 2 | Side-by-side, manual transfer | Manual migration guide
Managed Airflow (Legacy Gen 1), Airflow 1 | Managed Airflow (Legacy Gen 1), Airflow 2 | Side-by-side, manual transfer | Manual migration guide

About the migration script

The migration script is a Python script for side-by-side migrations that automates the migration process from Managed Airflow (Gen 2) to Managed Airflow (Gen 3). It uses environment snapshots to transfer the environment's configuration to the new environment.

The script performs the following actions:

  1. Obtains the configuration of the Managed Airflow (Gen 2) environment.

  2. Creates a Managed Airflow (Gen 3) environment with configuration that matches the obtained configuration.

    Because Managed Airflow (Gen 3) environments have a different architecture, some parameters might be adjusted to match the differences. You can also adjust most of the environment's parameters later.

  3. Pauses all DAGs in the Managed Airflow (Gen 2) environment one by one. Only DAGs that were unpaused in the Managed Airflow (Gen 2) environment are unpaused later.

  4. Saves a snapshot of the source Managed Airflow (Gen 2) environment. The snapshot is saved in the default location for snapshots, the Managed Airflow (Gen 2) environment's bucket.

  5. Loads the snapshot to the Managed Airflow (Gen 3) environment.

    The script doesn't check the compatibility of custom PyPI packages, environment variables, and Airflow configuration option overrides with the Managed Airflow (Gen 3) environment.

    In case of conflicts, the migration fails after the Managed Airflow (Gen 3) environment is created, during the process of loading the snapshot. In this case, you can either adjust the configuration of your Managed Airflow (Gen 2) environment to resolve the conflict, or migrate without the migration script and skip loading custom PyPI packages, environment variables, or Airflow configuration overrides when you load the snapshot.

  6. Unpauses the DAGs in the Managed Airflow (Gen 3) environment. If some DAGs were already paused before you ran the script, they will remain paused.
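The pause and unpause bookkeeping in steps 3 and 6 can be illustrated with a minimal sketch. This is not the migration script's code: the function names and the dictionary that maps DAG IDs to their paused state are hypothetical, and the real script works against the environments rather than in-memory data.

```python
from typing import Dict, List, Tuple


def pause_all(dag_paused: Dict[str, bool]) -> Tuple[Dict[str, bool], List[str]]:
    """Pause every DAG and remember which ones were unpaused beforehand."""
    previously_unpaused = sorted(
        dag_id for dag_id, paused in dag_paused.items() if not paused
    )
    new_state = {dag_id: True for dag_id in dag_paused}
    return new_state, previously_unpaused


def unpause_recorded(dag_paused: Dict[str, bool], to_unpause: List[str]) -> Dict[str, bool]:
    """Unpause only the DAGs that were recorded as unpaused before the migration."""
    new_state = dict(dag_paused)
    for dag_id in to_unpause:
        if dag_id in new_state:
            new_state[dag_id] = False
    return new_state


# Hypothetical source environment: "reports" was already paused by its owner.
source = {"etl_daily": False, "reports": True, "cleanup": False}

# Step 3: pause everything in the source environment before the snapshot is saved.
paused_source, remembered = pause_all(source)

# Step 6: the snapshot carries the all-paused state to the target,
# where only the remembered DAGs are unpaused.
target = unpause_recorded(paused_source, remembered)
```

In this sketch, `reports` stays paused in the target because it was never recorded as unpaused, which mirrors the behavior described in step 6.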

The script has the following limitations:

  • The script always creates a new Managed Airflow (Gen 3) environment. It's not possible to load the snapshot to an existing Managed Airflow (Gen 3) environment. To do so, you can migrate using snapshots, without using the migration script.

  • The script creates a Managed Airflow (Gen 3) environment only in the same region and project as the Managed Airflow (Gen 2) environment.

  • You can only load snapshots to the same or later version of Airflow. For example, you can't load a snapshot from Airflow 2.10.2 to Airflow 2.9.3.

  • Only Managed Airflow (Gen 2) environments can be migrated with the migration script.
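The version limitation above can be checked before you choose a target Airflow version. The following is a minimal sketch, assuming plain dotted numeric versions such as 2.10.2; the helper names are hypothetical and aren't part of the migration script. It compares versions as tuples of integers because a plain string comparison would order 2.10.2 before 2.9.3.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn '2.10.2' into (2, 10, 2) so that parts compare numerically."""
    return tuple(int(part) for part in version.split("."))


def can_load_snapshot(source_airflow: str, target_airflow: str) -> bool:
    """A snapshot loads only into the same or a later Airflow version."""
    return parse_version(target_airflow) >= parse_version(source_airflow)


# The example from the limitation above: 2.10.2 -> 2.9.3 is not allowed.
downgrade_allowed = can_load_snapshot("2.10.2", "2.9.3")
upgrade_allowed = can_load_snapshot("2.9.3", "2.10.2")
```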

Before you begin

  • Because the migration script creates an environment, and then saves and loads a snapshot, the migration process can take more than an hour.

  • The script uses snapshots. Snapshots are supported in Managed Airflow (Gen 2) version 2.0.9 and later versions.

  • Your account requires an IAM role that can create environments, save snapshots, and load snapshots.

  • The maximum size of the Airflow database that supports snapshots is 20 GB. If your environment's database takes more than 20 GB, reduce the size of the Airflow database.

  • The total number of objects in the /dags, /plugins and /data folders in the environment's bucket must be less than 100,000 to create snapshots.

  • If you use the XCom mechanism to transfer files, make sure that you use it according to Airflow's guidelines. Transferring big files or a large number of files using XCom affects the Airflow database's performance and can lead to failures when loading snapshots or upgrading your environment. Consider using alternatives such as Cloud Storage to transfer large volumes of data.
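The two numeric limits in this list can be turned into a simple pre-flight check. The sketch below is illustrative only: `SourceFacts` and `snapshot_blockers` are hypothetical names, and it assumes that you have already measured the database size and counted the bucket objects by other means.

```python
from dataclasses import dataclass
from typing import List

MAX_DB_SIZE_GB = 20           # largest Airflow database that supports snapshots
MAX_BUCKET_OBJECTS = 100_000  # object count must stay below this value


@dataclass
class SourceFacts:
    db_size_gb: float
    bucket_object_count: int  # objects in /dags, /plugins and /data combined


def snapshot_blockers(facts: SourceFacts) -> List[str]:
    """Return human-readable reasons why saving a snapshot would fail."""
    blockers = []
    if facts.db_size_gb > MAX_DB_SIZE_GB:
        blockers.append(
            f"Airflow database is {facts.db_size_gb} GB; the limit is {MAX_DB_SIZE_GB} GB"
        )
    if facts.bucket_object_count >= MAX_BUCKET_OBJECTS:
        blockers.append(
            f"Bucket holds {facts.bucket_object_count} objects; "
            f"it must hold fewer than {MAX_BUCKET_OBJECTS}"
        )
    return blockers


# Hypothetical measurements for two environments.
ok_env = snapshot_blockers(SourceFacts(db_size_gb=12.5, bucket_object_count=4_200))
big_env = snapshot_blockers(SourceFacts(db_size_gb=23.0, bucket_object_count=100_000))
```

An empty list means that neither limit blocks the snapshot; it doesn't confirm the other prerequisites, such as IAM permissions.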

Migrate to Managed Airflow (Gen 3)

This section describes the migration process using the migration script.

Check the differences between Managed Airflow (Gen 2) and Managed Airflow (Gen 3)

Check the list of differences between Managed Airflow (Gen 2) and Managed Airflow (Gen 3).

Make sure that your environment doesn't use features that aren't yet available in Managed Airflow (Gen 3) and that you are familiar with how to use and configure features specific to Managed Airflow (Gen 3).

Make sure that your DAGs are compatible with Managed Airflow (Gen 3)

Make sure that your DAGs are compatible with Managed Airflow (Gen 3) by following these suggestions:

  • The list of packages in the Managed Airflow (Gen 3) environment can differ from the list in your Managed Airflow (Gen 2) environment. This might affect the compatibility of your DAGs with Managed Airflow (Gen 3).

  • In Managed Airflow (Gen 3), the environment's cluster is located in the tenant project. Make sure that your DAGs are compatible with this change. In particular, KubernetesPodOperator workloads now scale independently from your environment and it's not possible to use Pod affinity configs.

Check for configuration compatibility

You can run an upgrade check to see if your Managed Airflow (Gen 2) environment's configuration is compatible with Managed Airflow (Gen 3). We recommend resolving all blocking conflicts reported by this check before you start the migration.

Install the script's dependencies

  • The script requires Python 3.8 or later.

  • The migration script uses the gcloud CLI and curl. Make sure that both utilities are installed on your computer.
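You can verify these dependencies with a short check before you download the script. This is a minimal sketch with a hypothetical function name; it relies only on the standard library (`sys.version_info` and `shutil.which`) and confirms that the tools are on your PATH, not that they are configured correctly.

```python
import shutil
import sys
from typing import Callable, List, Optional, Tuple


def missing_dependencies(
    python_version: Tuple[int, int] = sys.version_info[:2],
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """List unmet requirements: Python 3.8 or later, plus gcloud and curl on PATH."""
    problems = []
    if python_version < (3, 8):
        problems.append("Python 3.8 or later is required")
    for tool in ("gcloud", "curl"):
        if which(tool) is None:
            problems.append(f"{tool} not found on PATH")
    return problems


if __name__ == "__main__":
    # Print each problem on its own line; no output means all checks passed.
    for problem in missing_dependencies():
        print(problem)
```

The `python_version` and `which` parameters exist only so that the check can be exercised without changing the local machine.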

Download the script

Download the migration script (composer_migrate.py) from its repository on GitHub.

Authorize in gcloud CLI

Authorize in gcloud CLI:

gcloud auth login

Preview the new environment's parameters

You can preview the Managed Airflow (Gen 3) environment's parameters before migrating. You can use this to see how the Managed Airflow (Gen 2) environment's configuration corresponds to Managed Airflow (Gen 3).

Airflow configuration option overrides, custom PyPI packages, and environment variables are loaded from the environment's snapshot and are not displayed in the preview.

Run the following command:

python3 composer_migrate.py \
    --project PROJECT_ID \
    --location LOCATION \
    --source_environment COMPOSER_2_ENV \
    --target_environment COMPOSER_3_ENV \
    --target_airflow_version COMPOSER_3_AIRFLOW_VERSION \
    --dry_run

Replace the following:

  • PROJECT_ID: the project ID.
  • LOCATION: the region where the Managed Airflow (Gen 2) environment is located. The Managed Airflow (Gen 3) environment will be created in the same region.
  • COMPOSER_2_ENV: the name of your Managed Airflow (Gen 2) environment.
  • COMPOSER_3_ENV: the name for the new Managed Airflow (Gen 3) environment.
  • COMPOSER_3_AIRFLOW_VERSION: the Airflow version of the Managed Airflow (Gen 3) environment. This version must be the same as or later than the version in the Managed Airflow (Gen 2) environment and must be one of the versions available in Managed Airflow (Gen 3).

Example:

python3 composer_migrate.py \
    --project example-project \
    --location us-central1 \
    --source_environment example-composer-2-environment \
    --target_environment example-composer-3-environment \
    --target_airflow_version 2.10.2 \
    --dry_run
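If you run the script from your own tooling, building the argument list in one place makes it harder to start a real migration when you only meant to preview. The following is a minimal sketch: `build_migrate_command` is a hypothetical helper, and the only flags it uses are the ones shown in the command above.

```python
from typing import List


def build_migrate_command(
    project: str,
    location: str,
    source_env: str,
    target_env: str,
    target_airflow_version: str,
    dry_run: bool = True,
) -> List[str]:
    """Build the composer_migrate.py argument list; previews by default."""
    command = [
        "python3", "composer_migrate.py",
        "--project", project,
        "--location", location,
        "--source_environment", source_env,
        "--target_environment", target_env,
        "--target_airflow_version", target_airflow_version,
    ]
    if dry_run:
        command.append("--dry_run")
    return command


# Preview command for the example values used on this page.
preview = build_migrate_command(
    "example-project",
    "us-central1",
    "example-composer-2-environment",
    "example-composer-3-environment",
    "2.10.2",
)
# To execute it, pass the list to subprocess.run(preview, check=True).
```

Defaulting `dry_run` to `True` is a design choice of this sketch, so that a real migration requires an explicit `dry_run=False`.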

Check environment's health

Make sure that the Managed Airflow (Gen 2) environment that you want to migrate is healthy.

If your environment isn't healthy, the migration process will fail after creating a new Managed Airflow (Gen 3) environment because it won't be possible to create a snapshot.

For more information about ways to check the environment's health and database health, see "Use the monitoring dashboard".

Run the migration script

Run the following command:

python3 composer_migrate.py \
    --project PROJECT_ID \
    --location LOCATION \
    --source_environment COMPOSER_2_ENV \
    --target_environment COMPOSER_3_ENV \
    --target_airflow_version COMPOSER_3_AIRFLOW_VERSION

Replace the following:

  • PROJECT_ID: the project ID.
  • LOCATION: the region where the Managed Airflow (Gen 2) environment is located. The Managed Airflow (Gen 3) environment will be created in the same region.
  • COMPOSER_2_ENV: the name of your Managed Airflow (Gen 2) environment.
  • COMPOSER_3_ENV: the name for the new Managed Airflow (Gen 3) environment.
  • COMPOSER_3_AIRFLOW_VERSION: the Airflow version of the Managed Airflow (Gen 3) environment. This version must be the same as or later than the version in the Managed Airflow (Gen 2) environment and must be one of the versions available in Managed Airflow (Gen 3).

Check for DAG errors

  1. In the Airflow web interface, go to DAGs and check for reported DAG syntax errors.

  2. Check that DAG runs are scheduled at the correct time.

  3. Wait for the DAG runs to happen in the Managed Airflow (Gen 3) environment and check if they were successful. If a DAG run was successful, don't unpause the DAG in the Managed Airflow (Gen 2) environment. If you do, a DAG run for the same time and date also happens in your Managed Airflow (Gen 2) environment.

  4. If a specific DAG run fails, troubleshoot the DAG until it runs successfully in Managed Airflow (Gen 3).

Monitor your Managed Airflow (Gen 3) environment

Monitor your Managed Airflow (Gen 3) environment for potential issues, failed DAG runs, and overall environment health.

If the Managed Airflow (Gen 3) environment runs without problems for a sufficient period of time, consider deleting the Managed Airflow (Gen 2) environment.

What's next