Symptom
During startup, the telemetry pods repeatedly enter and leave the CrashLoopBackOff state. As the pods restart, you may see periodic gaps in your metrics or graphs. Analytics data may also show discrepancies because some sections of data are missing.
Error messages
When you use kubectl to view the pod states, one or more metrics pods are in the CrashLoopBackOff state. Use the following command:
kubectl get pods -n APIGEE_NAMESPACE
Where APIGEE_NAMESPACE is the Kubernetes namespace for your Apigee hybrid components. For more information, see Create the apigee namespace.
Sample Output
NAME                                                       READY   STATUS             RESTARTS   AGE
apigee-metrics-default-telemetry-proxy-1104-hvwoo-zlmlw    0/1     CrashLoopBackOff   10         10m
apigee-metrics-adapter-apigee-telemetry-1104-7fyff-tts65   0/1     CrashLoopBackOff   10         10m
apigee-metrics-default-telemetry-proxy-1104-hvwoo-zlmlw    0/1     FailedScheduling   0          12m
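To pull only the affected pod names out of a listing like this, a small filter can help. The following is a minimal sketch, not part of the Apigee tooling: `crashloop_pods` is a hypothetical helper name, and it assumes the default `kubectl get pods` column layout, with the pod name in the first column and STATUS in the third.

```shell
# Hypothetical helper (not an Apigee or kubectl command): reads
# `kubectl get pods` output on stdin and prints the names of pods whose
# STATUS column is CrashLoopBackOff. Assumes the default column layout
# (NAME READY STATUS ...). The comparison ignores letter case.
#
# Usage: kubectl get pods -n APIGEE_NAMESPACE | crashloop_pods
crashloop_pods() {
  awk 'NR > 1 && tolower($3) == "crashloopbackoff" { print $1 }'
}
```

An empty result means that no pod in the listing reports that status in its third column.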
Common diagnosis steps
- Check the events for issues with telemetry pods with the following command:
kubectl -n apigee get event
Sample Output
LAST SEEN   TYPE     REASON             OBJECT                                                              MESSAGE
53m         Normal   SuccessfulCreate   job/apigee-cassandra-schema-val-jghunt-20250709-0820206-29251940   Created pod: apigee-cassandra-schema-val-jghunt-20250709-0820206-292519fkt7j
53m         Normal   Completed          job/apigee-cassandra-schema-val-jghunt-20250709-0820206-29251940   Job completed
43m         Normal   SuccessfulCreate   job/apigee-cassandra-schema-val-jghunt-20250709-0820206-29251950   Created pod: apigee-cassandra-schema-val-jghunt-20250709-0820206-292519l87m8
43m         Normal   Completed          job/apigee-cassandra-schema-val-jghunt-20250709-0820206-29251950   Job completed
33m         Normal   SuccessfulCreate   job/apigee-cassandra-schema-val-jghunt-20250709-0820206-29251960   Created pod: apigee-cassandra-schema-val-jghunt-20250709-0820206-29251962ncc
- You can also check the events of telemetry pods in a CrashLoopBackOff state using the following command:
kubectl -n apigee describe pod POD_NAME
Where POD_NAME is the name of the pod that is in a CrashLoopBackOff state.
Sample Output
apigee-metrics-apigee-telemetry-app-1101-qc36n-dxzrv
- You can also check the cpu status of the pods with the following command:
kubectl -n apigee get hpa | grep unknown
Sample Output
apigee-metrics-apigee-telemetry-app-1101-qc36n-dxzrv   ReplicaSet/apigee-metrics-apigee-telemetry-app-1101-qc36n-dxzrv   <unknown>/80%   2   10   2   8h
Possible causes
| Cause | Description | Troubleshooting instructions applicable for |
|---|---|---|
| metrics.app.resources.requests.cpu and metrics.app.resources.limits.cpu are missing | The cpu must be specified in the overrides.yaml file. | Apigee hybrid |
Cause
The cpu values are not set in the overrides.yaml file, so the cpu request and limit for the metrics pods are undefined.
Diagnosis
Check your overrides.yaml file to see if both cpu values are defined: metrics.app.resources.requests.cpu and metrics.app.resources.limits.cpu.
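This check can also be scripted. The sketch below is an illustration only, under stated assumptions: `check_metrics_cpu` is a hypothetical helper, and it handles only plain block-style YAML indented with spaces (no anchors, flow mappings, or multi-document files). It walks the file with awk, tracks the key path by indentation, and reports whether each of the two cpu properties has a value.

```shell
# Hypothetical helper: reports whether metrics.app.resources.requests.cpu
# and metrics.app.resources.limits.cpu have values in an overrides file.
# Assumption: block-style YAML indented with spaces; list items are ignored.
# Exit status is 0 when both values are present, 1 otherwise.
#
# Usage: check_metrics_cpu overrides.yaml
check_metrics_cpu() {
  awk '
    /^[ \t]*(#|$)/ { next }                  # skip comment-only and blank lines
    {
      stripped = $0
      sub(/^ +/, "", stripped)
      indent = length($0) - length(stripped) # number of leading spaces
      sub(/[ \t]+#.*$/, "", stripped)        # drop a trailing comment
      if (stripped !~ /^[A-Za-z0-9_.-]+:/) next
      key = stripped;   sub(/:.*/, "", key)
      value = stripped; sub(/^[^:]*:[ \t]*/, "", value)
      # Pop keys that sit at the same or a deeper indentation level.
      while (depth > 0 && ind[depth] >= indent) depth--
      depth++; ind[depth] = indent; name[depth] = key
      path = name[1]
      for (i = 2; i <= depth; i++) path = path "." name[i]
      if (value != "") found[path] = 1
    }
    END {
      n = split("requests limits", kinds, " ")
      for (i = 1; i <= n; i++) {
        p = "metrics.app.resources." kinds[i] ".cpu"
        if (p in found) print p ": set"
        else { print p ": MISSING"; missing = 1 }
      }
      exit (missing ? 1 : 0)
    }
  ' "$1"
}
```

A dedicated YAML parser is more robust than this indentation-based approach; the sketch only aims to make the two property paths concrete.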
Resolution
If the cpu settings for metrics are missing from your overrides.yaml file, provide both cpu values:
- Add the following configuration under the metrics section in your overrides.yaml file:
metrics:
  app: # The apigee-prometheus-app container in the "app" pod
    resources:
      requests:
        memory: 512Mi # Default value: 512Mi
        cpu: 500m # Default value: 500m
      limits:
        memory: 2Gi # default: 1Gi
        cpu: 500m # Default value: 500m
- Apply changes using the following command:
helm upgrade ENV_RELEASE_NAME apigee-env/ \
  --install \
  --namespace APIGEE_NAMESPACE \
  --set env=ENV_NAME \
  -f OVERRIDES_FILE
Where ENV_RELEASE_NAME is a unique name used to track installation and upgrade of the apigee-env chart. While it's typically the same as the ENV_NAME, it must be different if your environment has the same name as your environment group. For example, if both are named dev, you would use dev-env-release and dev-envgroup-release to distinguish them.
Where APIGEE_NAMESPACE is the Kubernetes namespace for your Apigee hybrid components. For more information, see Create the apigee namespace.
Where ENV_NAME is the name you used when you created the environment in the UI.
Where OVERRIDES_FILE is the overrides.yaml file that is used during upgrades or install.
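To confirm that the change took effect, one option is to repeat the hpa check from the diagnosis steps after the pods have restarted. The following is a minimal sketch rather than an official procedure: `hpa_unknown_targets` is a hypothetical helper, and it relies on `kubectl get hpa` printing `<unknown>` in the TARGETS column when it cannot compute cpu utilization.

```shell
# Hypothetical helper: reads `kubectl get hpa` output on stdin and prints
# the names of autoscalers whose line still contains <unknown>.
# The header row is skipped; the name is taken from the first column.
#
# Usage: kubectl -n apigee get hpa | hpa_unknown_targets
hpa_unknown_targets() {
  awk 'NR > 1 && /<unknown>/ { print $1 }'
}
```

Empty output suggests that every listed autoscaler now reports a cpu target. Any name that is printed is a candidate for another look at its cpu settings in the overrides file.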
Must gather diagnostic information
If the problem persists even after following the above instructions, gather the following diagnostic information and then contact Google Cloud Customer Care:
- The overrides.yaml file.
- The output from the Apigee hybrid must-gather script.