This guide provides steps to diagnose and resolve common issues with App Lifecycle Manager feature flags.
SDK and Runtime Evaluation
OpenFeature flag evaluation methods are designed not to throw: when evaluation fails, they return a default value instead of raising an error that could crash your application.
Symptom: Flag returning safe default value
If the flag service is unreachable, the provider is unavailable, or the flag is misconfigured, the SDK returns the default value that your code passes as a required argument to the evaluation call.
Resolution:
- Verify the connection between your OpenFeature provider and the saasconfig.googleapis.com endpoint.
- Check for authentication errors. Ensure that the service account has the roles/saasconfig.viewer role.
- Verify that FLAGD_SOURCE_PROVIDER_ID is correct.
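The role and provider ID checks can be scripted. The following is a minimal sketch that assumes your application reads FLAGD_SOURCE_PROVIDER_ID from its environment and that roles/saasconfig.viewer is granted at the project level. The helper names check_provider_id and has_saasconfig_viewer are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch of the resolution checks above. Assumptions (not from the product docs):
#   - FLAGD_SOURCE_PROVIDER_ID is read from the application's environment.
#   - roles/saasconfig.viewer is granted directly on the project.

# Fails if FLAGD_SOURCE_PROVIDER_ID is unset or empty; otherwise prints it.
check_provider_id() {
  if [[ -z "${FLAGD_SOURCE_PROVIDER_ID:-}" ]]; then
    echo "FLAGD_SOURCE_PROVIDER_ID is not set" >&2
    return 1
  fi
  printf 'FLAGD_SOURCE_PROVIDER_ID=%s\n' "$FLAGD_SOURCE_PROVIDER_ID"
}

# Usage: has_saasconfig_viewer PROJECT_ID SERVICE_ACCOUNT_EMAIL
# Succeeds if the service account holds roles/saasconfig.viewer on the project.
has_saasconfig_viewer() {
  local project_id="$1" service_account="$2"
  # One output row per (role, member) pair, restricted to the viewer role.
  gcloud projects get-iam-policy "$project_id" \
      --flatten="bindings[].members" \
      --filter="bindings.role:roles/saasconfig.viewer" \
      --format="value(bindings.members)" |
    grep -Fxq "serviceAccount:${service_account}"
}
```

For example, has_saasconfig_viewer PROJECT_ID SERVICE_ACCOUNT_EMAIL && echo "role granted". The project IAM policy does not include roles inherited from a folder or organization, so a negative result here is not conclusive on its own.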
Symptom: Unexpected evaluation results
A flag returns a variant other than the one you expect.
Resolution:
- Check your CEL conditions and evaluation context.
- If a required attribute is missing from the context during a condition check, the evaluation engine skips that specific rule.
- If an attribute is missing during a percentage-based allocation, the evaluation fails with the INVALID_CONTEXT error code.
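To make the difference between the two missing-attribute cases concrete, the following toy model mimics them in bash. Everything in it is hypothetical: the region rule, the 50/50 split keyed on user_id, and the cksum-based bucketing are illustrative stand-ins, not how the App Lifecycle Manager evaluation engine or CEL actually works.

```shell
#!/usr/bin/env bash
# Toy model (hypothetical, NOT the App Lifecycle Manager engine) of the two
# missing-attribute behaviors. Requires bash 4+ for associative arrays.

# Usage: evaluate_toy_flag [KEY=VALUE ...]
evaluate_toy_flag() {
  local -A ctx=()
  local kv
  for kv in "$@"; do
    ctx["${kv%%=*}"]="${kv#*=}"
  done

  # Targeting rule: if "region" is missing, the rule is skipped (no error)
  # and evaluation falls through to the next step.
  if [[ -n "${ctx[region]+set}" && "${ctx[region]}" == "eu" ]]; then
    echo "eu-variant"
    return 0
  fi

  # Percentage allocation: without "user_id" there is nothing to bucket on,
  # so evaluation fails with INVALID_CONTEXT.
  if [[ -z "${ctx[user_id]+set}" ]]; then
    echo "INVALID_CONTEXT"
    return 1
  fi

  # Deterministic bucket in [0, 100) derived from user_id.
  local crc bucket
  crc=$(printf '%s' "${ctx[user_id]}" | cksum | cut -d' ' -f1)
  bucket=$(( crc % 100 ))
  if (( bucket < 50 )); then echo "on"; else echo "off"; fi
}
```

In this model, evaluate_toy_flag region=eu prints eu-variant, and evaluate_toy_flag region=us prints INVALID_CONTEXT and returns 1 because no user_id is available for the split. evaluate_toy_flag user_id=alice has no region, so the rule is skipped and the split prints on or off.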
Monitor Rollouts
Monitoring a feature flag rollout involves verifying both the orchestration state (did the rollout complete?) and the application health (did the binary pick up the flag?).
Rollout orchestration state
The system monitors the success rate of operations in real time.
Automatic pausing
To prevent a "broken" configuration from reaching your entire fleet, the system automatically pauses a rollout if the failure rate exceeds the Error Budget (default 10% per location).
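The pause condition is a simple ratio test. The following sketch assumes that the failure rate is failed operations divided by total operations in one location; the real accounting may differ, and exceeds_error_budget is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical illustration of the error budget check.
# Assumption: failure rate = failed operations / total operations in a location.

# Usage: exceeds_error_budget FAILED TOTAL [BUDGET_PERCENT]
# Returns 0 (true) when FAILED/TOTAL is strictly greater than BUDGET_PERCENT%.
exceeds_error_budget() {
  local failed="$1" total="$2" budget="${3:-10}"
  if (( total <= 0 )); then
    return 1   # nothing attempted yet, so nothing to exceed
  fi
  # Integer-only form of failed/total > budget/100.
  (( failed * 100 > budget * total ))
}
```

Under this assumption, with 50 operations in a location and the default 10% budget, exceeds_error_budget 6 50 succeeds (12%), while exceeds_error_budget 5 50 does not (exactly 10%).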
Check summary stats
Use the describe command to see the aggregate failure count and determine whether the rollout is PAUSED or FAILED.
gcloud beta app-lifecycle-manager rollouts describe ROLLOUT_ID --location=global
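In a script, you may want a non-zero exit status when a rollout stops. The following sketch assumes that the describe output exposes the rollout state in a field named state. Because the exact enum spelling is not confirmed here (it may carry a prefix, like the UNIT_OPERATION_STATE_ values used elsewhere in this guide), it matches any value ending in PAUSED or FAILED. The helper name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around "rollouts describe".
# Assumption: the rollout state is exposed in a field named "state".

# Usage: check_rollout_state ROLLOUT_ID
# Prints the state. Returns 0 if the rollout is neither paused nor failed,
# 1 if it is paused or failed, and 2 if the describe call itself fails.
check_rollout_state() {
  local rollout_id="$1" state
  state=$(gcloud beta app-lifecycle-manager rollouts describe "$rollout_id" \
      --location=global \
      --format="value(state)") || return 2
  printf '%s\n' "$state"
  case "$state" in
    *PAUSED|*FAILED) return 1 ;;
    *) return 0 ;;
  esac
}
```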
Filter failed unit updates
After you confirm that failures exist, find the specific units that missed the update. Use this filter to isolate failed flag updates in a specific location:
gcloud beta app-lifecycle-manager unit-operations list \
--location=LOCATION \
--filter="state:UNIT_OPERATION_STATE_FAILED AND flag_update:*"
Diagnose the Root Cause
For each failed operation, read the state_message field, which contains the detailed error message reported by the actuation engine:
gcloud beta app-lifecycle-manager unit-operations describe OPERATION_ID --location=LOCATION
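The list and describe steps can be combined to print every failure reason in one pass. The following sketch assumes that the list output exposes each operation's resource name as name and that the describe output exposes the message as stateMessage (the camelCase form gcloud typically uses for a state_message field). The helper name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: for every failed flag update in LOCATION, print the
# operation ID and its state message, separated by a tab.
# Assumptions: fields are named "name" (list) and "stateMessage" (describe).

# Usage: print_failed_flag_update_messages LOCATION
print_failed_flag_update_messages() {
  local location="$1" name op_id
  gcloud beta app-lifecycle-manager unit-operations list \
      --location="$location" \
      --filter="state:UNIT_OPERATION_STATE_FAILED AND flag_update:*" \
      --format="value(name)" |
    while IFS= read -r name; do
      [[ -n "$name" ]] || continue
      op_id="${name##*/}"   # keep only the trailing operation ID
      printf '%s\t' "$op_id"
      # Read stdin from /dev/null so describe cannot consume the list output.
      gcloud beta app-lifecycle-manager unit-operations describe "$op_id" \
          --location="$location" \
          --format="value(stateMessage)" < /dev/null
    done
}
```

Taking only the last path segment works whether the list output contains full resource names or bare IDs.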
Units Missing Configs
Understanding Unit Pinning
Pinning freezes a unit at its current release, preventing it from being updated by any automated or manual rollouts. A pinned unit is explicitly excluded from rollout operations until its pinning period expires or it is manually unpinned.
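To check whether a unit is currently pinned, you can compare its pin expiry with the current time. The following sketch assumes that the pin is exposed as maintenance.pinnedUntilTime (inferred from the --maintenance-pinned-until-time flag used later in this guide) and that it is an RFC 3339 UTC timestamp. The helper name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical pin check.
# Assumptions: field maintenance.pinnedUntilTime, RFC 3339 UTC timestamp.

# Usage: unit_is_pinned UNIT_ID LOCATION
# Returns 0 if the pin expiry lies in the future, 1 if the unit is not pinned,
# and 2 if the describe call fails.
unit_is_pinned() {
  local unit_id="$1" location="$2" pinned_until now
  pinned_until=$(gcloud beta app-lifecycle-manager units describe "$unit_id" \
      --location="$location" \
      --format="value(maintenance.pinnedUntilTime)") || return 2
  if [[ -z "$pinned_until" ]]; then
    return 1
  fi
  # Compare YYYY-MM-DDTHH:MM:SS prefixes; for same-format UTC timestamps,
  # lexicographic order matches chronological order.
  now=$(date -u +%Y-%m-%dT%H:%M:%S)
  if [[ "${pinned_until:0:19}" > "$now" ]]; then
    printf 'pinned until %s\n' "$pinned_until"
    return 0
  fi
  return 1
}
```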
Missing Config from Pinned Units
A critical scenario occurs when a unit is pinned during a feature flag rollout. Because pinning blocks all updates, the unit skips the FlagUpdate operation that delivers its configuration.
Symptom: After the unit is unpinned and upgraded to a binary that requires feature flags, the application fails to initialize or enters an infinite retry loop, leading to failed readiness probes and continuous container restarts.
Root Cause: While the unit was pinned, it never received the UnitKind's default_flag_revisions, so it has no active configuration.
Diagnostic Commands
1. List Failed Flag Operations
To find units that missed an update, filter the unit operations for the UNIT_OPERATION_STATE_FAILED state and the flag_update type.
gcloud beta app-lifecycle-manager unit-operations list \
--location="LOCATION" \
--filter="state:UNIT_OPERATION_STATE_FAILED AND flag_update:*"
2. Filter Failed Operations for a Specific Unit
To pinpoint failed operations for a single unit, combine a unit condition with the UNIT_OPERATION_STATE_FAILED state. To limit the results to flag updates, append AND flag_update:* to the filter.
# Find all failed operations for a specific unit
gcloud beta app-lifecycle-manager unit-operations list \
--location="LOCATION" \
--filter="unit:UNIT_ID AND state:UNIT_OPERATION_STATE_FAILED"
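The filters in this section differ only in which conditions are joined with AND. The following hypothetical helper assembles them from their parts, so a script can reuse one code path for fleet-wide and per-unit queries.

```shell
#!/usr/bin/env bash
# Hypothetical helper that builds the --filter expressions used in this section.

# Usage: failed_ops_filter [UNIT_ID] [FLAGS_ONLY: true|false]
failed_ops_filter() {
  local unit_id="${1:-}" flags_only="${2:-false}"
  local filter="state:UNIT_OPERATION_STATE_FAILED"
  # Scope to one unit when a unit ID is given.
  if [[ -n "$unit_id" ]]; then
    filter="unit:${unit_id} AND ${filter}"
  fi
  # Keep only flag update operations when requested.
  if [[ "$flags_only" == "true" ]]; then
    filter+=" AND flag_update:*"
  fi
  printf '%s\n' "$filter"
}
```

For example: gcloud beta app-lifecycle-manager unit-operations list --location="LOCATION" --filter="$(failed_ops_filter UNIT_ID true)".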
3. Inspect the Failure Reason
Use the describe command to read the operation's state_message field.
gcloud beta app-lifecycle-manager unit-operations describe "OPERATION_ID" --location="LOCATION"
4. Check Unit Condition
You can also check the current state of a unit directly. If a flag rollout is currently failing or the last operation resulted in an error, the unit's status reports it as a condition of type operationError.
gcloud beta app-lifecycle-manager units describe "UNIT_ID" --location="LOCATION"
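To show only the relevant conditions, you can flatten the conditions list and filter it. The following sketch assumes that each condition has type and message fields and that the error type may appear as operationError or as an enum such as TYPE_OPERATION_ERROR, so both spellings are matched. The helper name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper that prints operationError conditions for a unit.
# Assumptions: conditions[] entries expose "type" and "message" fields.

# Usage: unit_operation_errors UNIT_ID LOCATION
# Prints matching conditions as type<TAB>message; returns 1 if none are found.
unit_operation_errors() {
  local unit_id="$1" location="$2"
  gcloud beta app-lifecycle-manager units describe "$unit_id" \
      --location="$location" \
      --flatten="conditions[]" \
      --format="value(conditions.type,conditions.message)" |
    awk -F'\t' '
      # Normalize the type: lowercase and drop underscores.
      { t = tolower($1); gsub(/_/, "", t) }
      t ~ /operationerror$/ { print; found = 1 }
      END { exit !found }'
}
```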
Manual Mitigation
Applying UnitKind Default Flag Revisions
If a unit is missing its configuration because it was pinned during one or more rollouts, you can recover it by manually resetting the unit to the current baseline defined in its UnitKind.
- Unpin the unit (if pinned).
- Retrieve default revisions from parent UnitKind.
- Create a new FlagRelease containing those revisions.
- Manually initiate a FlagUpdate unit operation to apply the recovery release.
# 1. Unpin
gcloud beta app-lifecycle-manager units update "UNIT_ID" \
--location="LOCATION" \
--maintenance-pinned-until-time=""
# 2. Retrieve default revisions
gcloud beta app-lifecycle-manager unit-kinds describe "UNIT_KIND_ID" \
--location="global" \
--format="value(defaultFlagRevisions)"
# 3. Create recovery release
gcloud beta app-lifecycle-manager flags releases create "recovery-release-1" \
--location="global" \
--unit-kind="UNIT_KIND_ID" \
--revisions="REVISIONS"
# 4. Initiate manual update
gcloud beta app-lifecycle-manager unit-operations create "recovery-op" \
--unit="UNIT_ID" \
--flag-release="recovery-release-1" \
--location="LOCATION"
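The four recovery steps can be wrapped in a single parameterized function so the retrieved revisions flow directly into the release. This is a sketch under several assumptions: the value() format joins list items with ";" (converted to "," here), --revisions accepts a comma-separated list, and the unit operation is created in the unit's own location. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper for the four recovery steps.
# Assumptions: ";"-joined list output, comma-separated --revisions,
# unit operation created in the unit's LOCATION.

# Usage: recover_pinned_unit UNIT_ID LOCATION UNIT_KIND_ID RELEASE_ID OPERATION_ID
recover_pinned_unit() {
  if (( $# != 5 )); then
    echo "usage: recover_pinned_unit UNIT_ID LOCATION UNIT_KIND_ID RELEASE_ID OPERATION_ID" >&2
    return 2
  fi
  local unit_id="$1" location="$2" unit_kind_id="$3" release_id="$4" operation_id="$5"
  local raw revisions

  # 1. Unpin the unit so it can receive updates again.
  gcloud beta app-lifecycle-manager units update "$unit_id" \
      --location="$location" \
      --maintenance-pinned-until-time="" || return 1

  # 2. Read the UnitKind baseline and normalize the list separator.
  raw=$(gcloud beta app-lifecycle-manager unit-kinds describe "$unit_kind_id" \
      --location="global" \
      --format="value(defaultFlagRevisions)") || return 1
  revisions=$(printf '%s' "$raw" | tr ';' ',')
  if [[ -z "$revisions" ]]; then
    echo "UnitKind ${unit_kind_id} has no default flag revisions" >&2
    return 1
  fi

  # 3. Package the baseline revisions into a recovery release.
  gcloud beta app-lifecycle-manager flags releases create "$release_id" \
      --location="global" \
      --unit-kind="$unit_kind_id" \
      --revisions="$revisions" || return 1

  # 4. Apply the recovery release with a manual FlagUpdate unit operation.
  gcloud beta app-lifecycle-manager unit-operations create "$operation_id" \
      --unit="$unit_id" \
      --flag-release="$release_id" \
      --location="$location"
}
```

Stopping when the UnitKind reports no default revisions avoids creating an empty recovery release that would leave the unit without a configuration.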