This page provides troubleshooting information for problems that you might encounter while updating or upgrading Cloud Composer environments.
For troubleshooting information related to creating environments, see Troubleshooting environment creation.
When Cloud Composer environments are updated, most issues occur for one of the following reasons:
- Service account permission problems
- PyPI dependency issues
- Size of the Airflow database
Insufficient permissions to update or upgrade an environment
If Cloud Composer can't update or upgrade an environment because of insufficient permissions, it outputs the following error message:
ERROR: (gcloud.composer.environments.update) PERMISSION_DENIED: The caller does not have permission
Solution: Assign roles to both your account and the service account of your environment, as described in Access control.
The service account of the environment has insufficient permissions
When creating a Cloud Composer environment, you specify a service account that performs most of the environment's operations. If this service account doesn't have enough permissions for the requested operation, then Cloud Composer outputs an error:
    UPDATE operation on this environment failed 3 minutes ago with the
    following error message:
    Composer Backend timed out. Currently running tasks are [stage:
    CP_COMPOSER_AGENT_RUNNING
    description: "No agent response published."
    response_timestamp {
      seconds: 1618203503
      nanos: 291000000
    }
    ].
Solution: Assign roles to your Google Account and to the service account of your environment as described in Access control.
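When reading the error above, it can help to turn the response_timestamp fields into a readable time so that you can match the failure against your logs. The following is a minimal sketch that assumes the seconds and nanos values are seconds and nanoseconds since the Unix epoch; the function name is hypothetical and is not part of any Cloud Composer tooling.

```python
from datetime import datetime, timedelta, timezone


def response_timestamp_to_utc(seconds, nanos=0):
    """Convert a seconds/nanos pair into a timezone-aware UTC datetime.

    Assumption: both values count from the Unix epoch.
    """
    # datetime has microsecond resolution, so sub-microsecond digits are dropped.
    whole_seconds = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return whole_seconds + timedelta(microseconds=nanos // 1000)


if __name__ == "__main__":
    # Values copied from the example error message.
    print(response_timestamp_to_utc(1618203503, 291000000).isoformat())
```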
The size of the Airflow database is too big to perform the operation
An upgrade operation might fail because the Airflow database is too large.
If the size of the Airflow database is more than 16 GB, Cloud Composer outputs the following error:
Airflow database uses more than 16 GB. Please clean the database before upgrading.
Solution: Perform the Airflow database cleanup, as described in Clean up the Airflow database.
An upgrade to a new Cloud Composer version fails because of PyPI package conflicts
When you upgrade an environment with installed custom PyPI packages, you might encounter errors related to PyPI package conflicts. This might happen because the new Cloud Composer image contains later versions of preinstalled packages. This can cause dependency conflicts with PyPI packages that you installed in your environment.
Solution:
- To get detailed information about package conflicts, run an upgrade check.
- Loosen version constraints for installed custom PyPI packages. For example, instead of specifying a version as ==1.0.1, specify it as >=1.0.1.
- For more information about changing version requirements to resolve conflicting dependencies, see pip documentation.
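The loosening step can be sketched as a small script. This is an illustration only: it assumes your custom packages are listed in requirements-file style (one `name==version` entry per line), and the `loosen_pins` helper and the package names are hypothetical. Review every loosened constraint before applying it, because a later version of a package can introduce its own incompatibilities.

```python
import re

# Matches an exact pin such as "example-package==1.0.1" or "pkg[extra]==2.0".
# The version must not start with "=", so "===" (arbitrary equality) is ignored.
_PINNED = re.compile(r"^\s*([A-Za-z0-9._\-\[\],]+)\s*==\s*([^\s;#=][^\s;#]*)(.*)$")


def loosen_pins(requirements):
    """Return a copy of `requirements` with exact `==` pins rewritten as `>=`."""
    loosened = []
    for line in requirements:
        match = _PINNED.match(line)
        # Wildcard pins such as "==1.0.*" have no valid ">=" form, so keep them.
        if match and "*" not in match.group(2):
            name, version, rest = match.groups()
            loosened.append(f"{name}>={version}{rest}")
        else:
            loosened.append(line)
    return loosened


if __name__ == "__main__":
    for entry in loosen_pins(["example-package==1.0.1", "other-package>=2.0"]):
        print(entry)
```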
It's not possible to upgrade an environment to a version that is still supported
Cloud Composer environments can be upgraded only to a limited number of the latest and previous versions.
The version limitations for creating new environments and upgrading existing environments are different. The Cloud Composer version you choose when creating a new environment might not be available when upgrading existing environments.
You can perform the upgrade operation using the Google Cloud CLI, the API, or Terraform. In the Google Cloud console, only the latest versions are available as upgrade choices.
Environment is not healthy (liveness check failed)
It is possible to upgrade an environment only if its status is reported as healthy.
One of the most common causes of an unhealthy status is that the environment's components approach the configured resource limits and constantly operate at maximum load. Because some environment components can't report their status, the liveness check DAG reports the status of the environment as not healthy.
To solve this problem, we recommend increasing the resource limits. Although we recommend keeping your environment from approaching the limits at all times, you can also increase the limits only for the period when your environment upgrades.
Lack of connectivity to DNS can cause problems while performing upgrades or updates
Such connectivity problems might result in log entries like this:
WARNING - Compute Engine Metadata server unavailable attempt 1 of 5. Reason: [Errno -3] Temporary failure in name resolution Error
This usually means that there is no route to DNS. Make sure that the metadata.google.internal DNS name can be resolved to an IP address from within the Cluster, Pods, and Services networks. Also check whether Private Google Access is turned on in the VPC (in the host or service project) where your environment is created.
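One way to test name resolution is a short script run from a machine or Pod inside the affected network. The sketch below uses only the Python standard library; the `can_resolve` helper is hypothetical, and where you run it is up to you. The resolver is passed as a parameter so that the function can be exercised without network access.

```python
import socket


def can_resolve(hostname, resolver=socket.getaddrinfo):
    """Return True if `hostname` resolves to at least one address lookup result.

    `resolver` defaults to socket.getaddrinfo and can be replaced in tests.
    """
    try:
        resolver(hostname, None)
    except socket.gaierror:
        # Raised for failures such as
        # "[Errno -3] Temporary failure in name resolution".
        return False
    return True


if __name__ == "__main__":
    name = "metadata.google.internal"
    status = "resolves" if can_resolve(name) else "does NOT resolve"
    print(f"{name} {status}")
```

If the name does not resolve, the problem is in DNS routing rather than in the metadata server itself.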
Triggerer CPU exceeds the 1 vCPU limit
Cloud Composer versions 2.4.4 and later introduce a different triggerer resource allocation strategy to improve performance scaling. If you encounter an error related to triggerer CPU when performing an environment update, it means that your current triggerers are configured to use more than 1 vCPU per triggerer.
Solution:
- Adjust triggerer resource allocation to meet the 1 vCPU limit.
- If you anticipate issues with DAGs that use deferrable operators, then we recommend also increasing the number of triggerers.
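The two steps above can be combined into a rough sizing heuristic: cap each triggerer at 1 vCPU and add triggerers so that the total CPU does not shrink. This is only a sketch under that assumption. The `plan_triggerers` function is hypothetical, preserving total CPU is a heuristic rather than a documented rule, and the result still has to fit within the triggerer count that your environment allows.

```python
import math

TRIGGERER_CPU_LIMIT = 1.0  # vCPU per triggerer, the limit described above


def plan_triggerers(count, cpu_per_triggerer, limit=TRIGGERER_CPU_LIMIT):
    """Suggest a (count, cpu_per_triggerer) pair that respects the CPU limit.

    Heuristic (an assumption): keep the total triggerer CPU at least as
    large as before by adding triggerers.
    """
    if count < 0 or cpu_per_triggerer <= 0:
        raise ValueError("count must be >= 0 and cpu_per_triggerer must be > 0")
    if cpu_per_triggerer <= limit:
        # Already within the limit; nothing to change.
        return count, cpu_per_triggerer
    total_cpu = count * cpu_per_triggerer
    return math.ceil(total_cpu / limit), limit


if __name__ == "__main__":
    print(plan_triggerers(2, 1.5))
```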
Inspect failed migration warnings
When you upgrade Airflow to a later version, new constraints are sometimes applied to the Airflow database. If these constraints can't be applied, Airflow creates new tables to store the rows for which the constraints couldn't be applied. The Airflow UI displays a warning message until the moved data tables are renamed or dropped.
Solution:
You can use the following two DAGs to inspect the moved data and rename the tables.
- The list_moved_tables_after_upgrade_dag DAG lists rows that were moved from every table where constraints could not be applied. Inspect the data and decide whether you want to keep it. To keep it, you need to manually fix the data in the Airflow database, for example, by adding the rows back with the correct data.
- If you don't need the data or if you already fixed it, then you can run the rename_moved_tables_after_upgrade_dag DAG. This DAG renames the moved tables. The tables and their data are not deleted, so you can review the data at a later point.
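If you have a list of table names from the Airflow database, a short filter can show which tables hold moved data. This sketch assumes that Airflow names these tables with the `_airflow_moved` prefix; verify that against your Airflow version. The `find_moved_tables` helper and the sample table names are hypothetical.

```python
# Assumption: Airflow prefixes tables that hold moved rows with this string.
MOVED_TABLE_PREFIX = "_airflow_moved"


def find_moved_tables(table_names, prefix=MOVED_TABLE_PREFIX):
    """Return the sorted table names that start with the moved-data prefix."""
    return sorted(name for name in table_names if name.startswith(prefix))


if __name__ == "__main__":
    # Hypothetical table names, for illustration only.
    tables = ["dag_run", "_airflow_moved__2_3__task_instance", "task_instance"]
    for name in find_moved_tables(tables):
        print(name)
```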
Environment operation stays in the failed state indefinitely
Cloud Composer 2 environments rely on Pub/Sub topics and subscriptions to communicate with resources located in the tenant project of your environment during environment operations.
If the Pub/Sub API is disabled in your project, or if your environment's topics or subscriptions are deleted, then an environment operation might fail and stay in the failed state indefinitely. Such an environment becomes irrevocably broken.