This topic discusses steps you can take to troubleshoot and fix problems with the Cassandra datastore. Cassandra is a persistent datastore that runs in the cassandra component of the hybrid runtime architecture. See also Runtime service configuration overview.
Cassandra pods are stuck in the Pending state
Symptom
When starting up, the Cassandra pods remain in the Pending state.
Error message
When you use kubectl to view the pod states, you see that one or more Cassandra pods are stuck in the Pending state. The Pending state indicates that Kubernetes is unable to schedule the pod on a node, so the pod cannot be created. For example:
```
kubectl get pods -n namespace
NAME                                     READY   STATUS      RESTARTS   AGE
adah-resources-install-4762w             0/4     Completed   0          10m
apigee-cassandra-default-0               0/1     Pending     0          10m
...
```
Possible causes
A pod stuck in the Pending state can have multiple causes. For example:
| Cause | Description | 
|---|---|
| Insufficient resources | There is not enough CPU or memory available to create the pod. | 
| Volume not created | The pod is waiting for the persistent volume to be created. | 
Diagnosis
Use kubectl to describe the pod and determine the source of the error:
```
kubectl -n namespace describe pods pod_name
```
For example:
```
kubectl -n apigee describe pods apigee-cassandra-default-0
```
The output may show one of these possible problems:
- If the problem is insufficient resources, you will see a Warning message that indicates insufficient CPU or memory.
- If the error message indicates that the pod has unbound immediate PersistentVolumeClaims (PVC), the pod's PersistentVolume has not been created.
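If you check many pods, the two cases above can be told apart automatically. The following is a minimal Python sketch, not an Apigee tool: the function names are hypothetical, it assumes Python 3.9 or later with kubectl on the PATH, and it only looks for the scheduler messages "Insufficient cpu", "Insufficient memory", and "unbound immediate PersistentVolumeClaims" in the describe output.

```python
import re
import subprocess

# Hypothetical mapping from a cause in the table above to the text the
# Kubernetes scheduler writes in a FailedScheduling event for that cause.
PENDING_CLUES = {
    "Insufficient resources": re.compile(r"Insufficient (cpu|memory)", re.IGNORECASE),
    "Volume not created": re.compile(r"unbound immediate PersistentVolumeClaims"),
}


def classify_pending(describe_output: str) -> list[str]:
    """Return every cause whose clue appears in `kubectl describe pods` output."""
    return [
        cause
        for cause, pattern in PENDING_CLUES.items()
        if pattern.search(describe_output)
    ]


def describe_pod(namespace: str, pod_name: str) -> str:
    """Run `kubectl -n <namespace> describe pods <pod_name>` and return stdout."""
    result = subprocess.run(
        ["kubectl", "-n", namespace, "describe", "pods", pod_name],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

For example, classify_pending(describe_pod("apigee", "apigee-cassandra-default-0")) returns the matching cause names, or an empty list when neither clue is present and you need to read the Events section yourself.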
Resolution
Insufficient resources
Modify the Cassandra node pool so that it has sufficient CPU and memory resources. See Resizing a node pool for details.
Persistent volume not created
If you determine that the problem is a persistent volume issue, describe the PersistentVolumeClaim (PVC) to find out why the volume is not being created:
- List the PVCs in the cluster:
  ```
  kubectl -n namespace get pvc

  NAME                                        STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
  cassandra-data-apigee-cassandra-default-0   Bound    pvc-b247faae-0a2b-11ea-867b-42010a80006e   10Gi       RWO            standard       15m
  ...
  ```
- Describe the PVC for the pod that is failing. For example, the following command describes the PVC bound to the pod apigee-cassandra-default-0:
  ```
  kubectl -n apigee describe pvc cassandra-data-apigee-cassandra-default-0

  Events:
    Type     Reason              Age                From                         Message
    ----     ------              ----               ----                         -------
    Warning  ProvisioningFailed  3m (x143 over 5h)  persistentvolume-controller  storageclass.storage.k8s.io "apigee-sc" not found
  ```
  In this example, the StorageClass named apigee-sc does not exist. To resolve this problem, create the missing StorageClass in the cluster, as explained in Change the default StorageClass.
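The PVC checks above can be sketched in Python as follows. This is an illustrative helper, not part of Apigee hybrid: the function names are hypothetical, and it assumes Python 3.9 or later with kubectl on the PATH. It reads kubectl get pvc -o json, reports every PVC whose phase is not Bound together with the StorageClass it requests, and checks whether that StorageClass exists.

```python
import json
import subprocess


def unbound_pvcs(pvc_list_json: str) -> list[tuple[str, str, str]]:
    """Return (name, phase, storage class) for each PVC whose phase is not Bound."""
    problems = []
    for pvc in json.loads(pvc_list_json).get("items", []):
        phase = pvc.get("status", {}).get("phase", "Unknown")
        if phase != "Bound":
            # A PVC without storageClassName uses the cluster's default StorageClass.
            storage_class = pvc.get("spec", {}).get("storageClassName") or "<cluster default>"
            problems.append((pvc["metadata"]["name"], phase, storage_class))
    return problems


def get_pvcs_json(namespace: str) -> str:
    """Run `kubectl -n <namespace> get pvc -o json` and return stdout."""
    return subprocess.run(
        ["kubectl", "-n", namespace, "get", "pvc", "-o", "json"],
        capture_output=True,
        text=True,
        check=True,
    ).stdout


def storage_class_exists(name: str) -> bool:
    """Return True if `kubectl get storageclass <name>` exits successfully."""
    result = subprocess.run(
        ["kubectl", "get", "storageclass", name],
        capture_output=True,
        text=True,
    )
    return result.returncode == 0
```

In the example above, unbound_pvcs(get_pvcs_json("apigee")) would list the Cassandra PVC with storage class apigee-sc, and storage_class_exists("apigee-sc") would return False.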
See also Debugging Pods.
Cassandra pods are stuck in the CrashLoopBackOff state
Symptom
When starting up, the Cassandra pods remain in the CrashLoopBackOff state.
Error message
When you use kubectl to view the pod states, you see that one or more Cassandra pods are in the CrashLoopBackOff state. This state indicates that the Cassandra container starts and then repeatedly exits, and Kubernetes is waiting before it restarts the container again. For example:
```
kubectl get pods -n namespace
NAME                                     READY   STATUS             RESTARTS   AGE
adah-resources-install-4762w             0/4     Completed          0          10m
apigee-cassandra-default-0               0/1     CrashLoopBackOff   0          10m
...
```
Possible causes
A pod stuck in the CrashLoopBackOff state can have multiple causes. For example:
| Cause | Description | 
|---|---|
| Data center differs from previous data center | This error indicates that the Cassandra pod has a persistent volume that contains data from a previous cluster, and the new pods are not able to join the old cluster. This usually happens when stale persistent volumes from the previous Cassandra cluster remain on the same Kubernetes node, for example after you delete and recreate Cassandra in the cluster. | 
| Truststore directory not found | This error indicates that the Cassandra pod is not able to create a TLS connection. This usually happens when the provided keys and certificates are invalid, missing, or have other issues. | 
Diagnosis
Check the Cassandra error log to determine the cause of the problem.
- List the pods to get the name of the Cassandra pod that is failing:
  ```
  kubectl get pods -n namespace
  ```
- Check the failing pod's log:
  ```
  kubectl logs pod_name -n namespace
  ```
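Because the two causes in the table above leave distinctive messages in the log, the log check can be sketched in Python. This is a hypothetical helper, assuming Python 3.9 or later and kubectl on the PATH. It also offers kubectl's --previous flag, which returns the log of the last terminated container and is often more useful for a pod in CrashLoopBackOff than the log of the container that is currently restarting.

```python
import subprocess

# Fragments of the two log messages described under Resolution below,
# mapped to the cause names used in the table above.
LOG_CLUES = {
    "differs from previous data center": "Data center differs from previous data center",
    "truststore.p12 (No such file or directory)": "Truststore directory not found",
}


def diagnose_cassandra_log(log_text: str) -> list[str]:
    """Return the cause names whose log fragment appears in log_text."""
    return [cause for clue, cause in LOG_CLUES.items() if clue in log_text]


def fetch_pod_log(pod_name: str, namespace: str, previous: bool = False) -> str:
    """Run `kubectl logs`; previous=True adds --previous for the last crashed container."""
    args = ["kubectl", "logs", pod_name, "-n", namespace]
    if previous:
        args.append("--previous")
    return subprocess.run(args, capture_output=True, text=True, check=True).stdout
```

For example: diagnose_cassandra_log(fetch_pod_log("apigee-cassandra-default-0", "apigee", previous=True)).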
Resolution
Look for the following clues in the pod's log:
Data center differs from previous data center
If you see this log message:
Cannot start node if snitch's data center (us-east1) differs from previous data center
- Check whether there are any stale or old PVCs in the cluster, and delete them.
- If this is a fresh install, delete all the PVCs and retry the setup. For example:
  ```
  kubectl -n namespace get pvc
  kubectl -n namespace delete pvc cassandra-data-apigee-cassandra-default-0
  ```
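On a fresh install with several Cassandra replicas, you can generate the delete commands instead of typing each one. The following Python sketch is hypothetical: it assumes the Cassandra PVC names start with cassandra-data-, as in the example above, and it reads the output of kubectl -n namespace get pvc -o json. It only builds the commands and does not run them, so you can review the list first; deleting a PVC can also delete the data on its volume.

```python
import json
import shlex


def cassandra_pvc_delete_commands(pvc_list_json: str, namespace: str) -> list[str]:
    """Build (but do not run) one delete command per Cassandra data PVC.

    Assumes Cassandra PVC names start with "cassandra-data-".
    """
    names = sorted(
        item["metadata"]["name"]
        for item in json.loads(pvc_list_json).get("items", [])
        if item["metadata"]["name"].startswith("cassandra-data-")
    )
    # shlex.quote keeps each command safe to paste into a shell.
    return [
        f"kubectl -n {shlex.quote(namespace)} delete pvc {shlex.quote(name)}"
        for name in names
    ]
```

Print the returned list, check that every entry is a PVC you intend to remove, and then run the commands.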
Truststore directory not found
If you see this log message:
Caused by: java.io.FileNotFoundException: /apigee/cassandra/ssl/truststore.p12 (No such file or directory)
If you provided a key and certificates in your overrides file, verify that they are correct and valid. For example:
```
cassandra:
  sslRootCAPath: path_to_root_ca-file
  sslCertPath: path-to-tls-cert-file
  sslKeyPath: path-to-tls-key-file
```
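Part of this verification can be done locally before you apply the overrides. The following Python sketch is an assumption-laden example, not an Apigee validation tool: it takes the three paths from the overrides file, checks that each file exists, then uses Python's standard ssl module to confirm that the root CA file loads as a CA certificate and that the certificate and key load together as a matching pair. It does not check expiry dates or hostnames, and it assumes an unencrypted private key.

```python
import os
import ssl


def check_cassandra_tls(root_ca_path: str, cert_path: str, key_path: str) -> list[str]:
    """Return a list of problems with the TLS files named in the overrides file."""
    problems = []
    for label, path in (
        ("sslRootCAPath", root_ca_path),
        ("sslCertPath", cert_path),
        ("sslKeyPath", key_path),
    ):
        if not os.path.isfile(path):
            problems.append(f"{label}: file not found: {path}")
    if problems:
        return problems

    context = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    try:
        # Fails if the file contains no PEM certificate.
        context.load_verify_locations(cafile=root_ca_path)
    except OSError as err:  # ssl.SSLError is a subclass of OSError
        problems.append(f"sslRootCAPath: cannot load CA certificate: {err}")
    try:
        # Fails if either file is malformed or the key does not match the certificate.
        context.load_cert_chain(certfile=cert_path, keyfile=key_path)
    except OSError as err:
        problems.append(f"sslCertPath/sslKeyPath: cannot load certificate and key: {err}")
    return problems
```

An empty list means the files exist and load; any entries point at the setting to fix.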
Node failure
Symptom
When starting up, the Cassandra pods remain in the Pending state. This problem can indicate an underlying node failure.
Diagnosis
- Determine which Cassandra pods are not running:
  ```
  kubectl get pods -n your_namespace

  NAME                  READY   STATUS    RESTARTS   AGE
  cassandra-default-0   0/1     Pending   0          13s
  cassandra-default-1   1/1     Running   0          8d
  cassandra-default-2   1/1     Running   0          8d
  ```
- Check the worker nodes. If one is in the NotReady state, that is the node that has failed. Nodes are not namespaced, so no -n flag is needed; -o wide adds the INTERNAL-IP column:
  ```
  kubectl get nodes -o wide

  NAME                                             STATUS     ROLES    AGE   VERSION            INTERNAL-IP
  gke-hybrid-cluster-apigee-data-178811f1-lv5j     Ready      <none>   34d   v1.21.5-gke.1302   10.138.15.198
  gke-hybrid-cluster-apigee-data-d63b8b8d-n41g     NotReady   <none>   34d   v1.21.5-gke.1302   10.138.15.200
  gke-hybrid-cluster-apigee-data-ec752c0b-b1cr     Ready      <none>   34d   v1.21.5-gke.1302   10.138.15.199
  gke-hybrid-cluster-apigee-runtime-ba502ff4-57mq  Ready      <none>   34d   v1.21.5-gke.1302   10.138.15.204
  gke-hybrid-cluster-apigee-runtime-ba502ff4-hwkb  Ready      <none>   34d   v1.21.5-gke.1302   10.138.15.203
  gke-hybrid-cluster-apigee-runtime-bfa558e0-08vw  Ready      <none>   34d   v1.21.5-gke.1302   10.138.15.201
  gke-hybrid-cluster-apigee-runtime-bfa558e0-xvsc  Ready      <none>   34d   v1.21.5-gke.1302   10.138.15.202
  gke-hybrid-cluster-apigee-runtime-d12de7df-693w  Ready      <none>   34d   v1.21.5-gke.1302   10.138.15.241
  gke-hybrid-cluster-apigee-runtime-d12de7df-fn0w  Ready      <none>   34d   v1.21.5-gke.1302   10.138.15.206
  ```
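The two checks above can be combined in a small Python sketch that reads the column output of kubectl get pods and kubectl get nodes. The helper names are hypothetical. The parser splits each row on whitespace, which is only reliable for the leading columns (NAME, READY, STATUS); newer kubectl versions can print a RESTARTS value such as "3 (2m ago)" that shifts the later columns, so the sketch reads nothing after STATUS.

```python
def parse_kubectl_table(text: str) -> list[dict[str, str]]:
    """Split default `kubectl get` column output into one dict per row."""
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        return []
    header = lines[0].split()
    return [dict(zip(header, line.split())) for line in lines[1:]]


def cassandra_pods_not_running(pods_output: str) -> list[str]:
    """Names of Cassandra pods whose STATUS is anything other than Running."""
    return [
        row["NAME"]
        for row in parse_kubectl_table(pods_output)
        if "cassandra" in row.get("NAME", "") and row.get("STATUS") != "Running"
    ]


def nodes_not_ready(nodes_output: str) -> list[str]:
    """Names of nodes whose STATUS starts with NotReady."""
    # A cordoned node shows e.g. "NotReady,SchedulingDisabled", so match the prefix.
    return [
        row["NAME"]
        for row in parse_kubectl_table(nodes_output)
        if row.get("STATUS", "").startswith("NotReady")
    ]
```

Applied to the sample outputs above, these return cassandra-default-0 and gke-hybrid-cluster-apigee-data-d63b8b8d-n41g.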
Resolution
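- Remove the dead node from the Cassandra cluster. From a Cassandra pod that is still running, run nodetool status to find the Host ID of the dead node (status DN, meaning Down/Normal), then pass that Host ID to nodetool removenode:
  ```
  kubectl exec -it running_cassandra_pod -n your_namespace -- nodetool status
  kubectl exec -it running_cassandra_pod -n your_namespace -- nodetool removenode deadnode_hostID
  ```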
- Delete the PersistentVolumeClaim that belongs to the dead node, so that the Cassandra pod does not try to start on the dead node because of the volume's node affinity:
  ```
  kubectl get pvc -n your_namespace
  kubectl delete pvc volumeClaim_name -n your_namespace
  ```
- Update the volume template and create a PersistentVolume for the newly added node. The following is an example volume template:
  ```
  apiVersion: v1
  kind: PersistentVolume
  metadata:
    name: cassandra-data-3
  spec:
    capacity:
      storage: 100Gi
    accessModes:
    - ReadWriteOnce
    persistentVolumeReclaimPolicy: Retain
    storageClassName: local-storage
    local:
      path: /apigee/data
    nodeAffinity:
      "required":
        "nodeSelectorTerms":
        - "matchExpressions":
          - "key": "kubernetes.io/hostname"
            "operator": "In"
            "values": ["gke-hybrid-cluster-apigee-data-d63b8b8d-n41g"]
  ```
- Replace the values with the new hostname/IP and apply the template:
  ```
  kubectl apply -f volume-template.yaml
  ```
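Two parts of this procedure are easy to get wrong by hand: copying the Host ID from nodetool status, and editing the hostname in the template. The following Python sketch is hypothetical and not an Apigee utility. down_host_ids picks every row whose status code starts with D and extracts its address and Host ID (a UUID). local_pv_manifest builds the same PersistentVolume as the template above as a dictionary; kubectl apply -f also accepts JSON manifests, so json.dump of the result to a file such as volume-template.json is one way to apply it. The default values simply mirror the example template.

```python
import json
import re

# Host IDs in nodetool status output are UUIDs.
HOST_ID = re.compile(r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}")
# Node rows start with a status (U/D) plus state (N/L/J/M) code, e.g. UN or DN.
NODE_CODE = re.compile(r"[UD][NLJM]")


def down_host_ids(nodetool_status: str) -> list[tuple[str, str]]:
    """Return (address, host ID) for every node nodetool reports as Down."""
    down = []
    for line in nodetool_status.splitlines():
        fields = line.split()
        if len(fields) >= 2 and NODE_CODE.fullmatch(fields[0]) and fields[0][0] == "D":
            match = HOST_ID.search(line)
            if match:
                down.append((fields[1], match.group(0)))
    return down


def local_pv_manifest(name: str, hostname: str, storage: str = "100Gi",
                      path: str = "/apigee/data",
                      storage_class: str = "local-storage") -> dict:
    """Build the example local PersistentVolume, pinned to one node's hostname."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolume",
        "metadata": {"name": name},
        "spec": {
            "capacity": {"storage": storage},
            "accessModes": ["ReadWriteOnce"],
            "persistentVolumeReclaimPolicy": "Retain",
            "storageClassName": storage_class,
            "local": {"path": path},
            "nodeAffinity": {"required": {"nodeSelectorTerms": [{"matchExpressions": [
                {"key": "kubernetes.io/hostname", "operator": "In", "values": [hostname]}
            ]}]}},
        },
    }
```

Check the Host ID this returns against the nodetool status output before you run nodetool removenode with it.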
Create a client container for debugging
This section explains how to create a client container from which you can access Cassandra debugging utilities such as cqlsh. These utilities let you query Cassandra tables, which can be useful for debugging.
Create the client container
To create the client container, follow these steps:
- The container uses the TLS certificate from the apigee-cassandra-user-setup pod. First, fetch the name of the secret that holds this certificate:
  ```
  kubectl get secrets -n apigee --field-selector type=kubernetes.io/tls | grep apigee-cassandra-user-setup | awk '{print $1}'
  ```
  This command returns the secret name, for example: apigee-cassandra-user-setup-rg-hybrid-b7d3b9c-tls.
- Open a new file and paste the following pod spec into it:
  ```
  apiVersion: v1
  kind: Pod
  metadata:
    name: cassandra-client-name   # For example: my-cassandra-client
    namespace: apigee
  spec:
    containers:
    - name: cassandra-client-name
      image: "gcr.io/apigee-release/hybrid/apigee-hybrid-cassandra-client:1.6.9"
      imagePullPolicy: Always
      command:
      - sleep
      - "3600"
      env:
      - name: CASSANDRA_SEEDS
        value: apigee-cassandra-default.apigee.svc.cluster.local
      - name: APIGEE_DML_USER
        valueFrom:
          secretKeyRef:
            key: dml.user
            name: apigee-datastore-default-creds
      - name: APIGEE_DML_PASSWORD
        valueFrom:
          secretKeyRef:
            key: dml.password
            name: apigee-datastore-default-creds
      volumeMounts:
      - mountPath: /opt/apigee/ssl
        name: tls-volume
        readOnly: true
    volumes:
    - name: tls-volume
      secret:
        defaultMode: 420
        secretName: your-secret-name   # For example: apigee-cassandra-user-setup-rg-hybrid-b7d3b9c-tls
    restartPolicy: Never
  ```
- Save the file with a .yaml extension. For example: my-spec.yaml.
- Apply the spec to your cluster:
  ```
  kubectl apply -f your-spec-file.yaml -n apigee
  ```
- Log in to the container, using the pod name you set in the spec:
  ```
  kubectl exec -n apigee cassandra-client-name -it -- bash
  ```
- Connect to the Cassandra cqlsh interface with the following command. Enter the command exactly as shown:
  ```
  cqlsh ${CASSANDRA_SEEDS} -u ${APIGEE_DML_USER} -p ${APIGEE_DML_PASSWORD} --ssl
  ```
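The secret lookup in step 1 and the placeholder edits in step 2 can also be scripted. The following Python sketch is hypothetical, not an Apigee tool: it assumes Python 3.9 or later with kubectl on the PATH, reads the secrets as JSON instead of piping through grep and awk, and substitutes the your-secret-name and cassandra-client-name placeholders in the pod spec text. Like the grep in step 1, it returns the first matching secret, so check the result if your cluster has more than one.

```python
import json
import subprocess
from typing import Optional


def find_user_setup_secret(secret_list_json: str) -> Optional[str]:
    """Return the first TLS secret whose name contains apigee-cassandra-user-setup."""
    for item in json.loads(secret_list_json).get("items", []):
        name = item["metadata"]["name"]
        if item.get("type") == "kubernetes.io/tls" and "apigee-cassandra-user-setup" in name:
            return name
    return None


def get_tls_secrets_json(namespace: str = "apigee") -> str:
    """Run the kubectl query from step 1 with JSON output instead of grep/awk."""
    return subprocess.run(
        ["kubectl", "get", "secrets", "-n", namespace,
         "--field-selector", "type=kubernetes.io/tls", "-o", "json"],
        capture_output=True,
        text=True,
        check=True,
    ).stdout


def fill_client_spec(template: str, secret_name: str, client_name: str) -> str:
    """Replace the two placeholders used in the pod spec from step 2."""
    return template.replace("your-secret-name", secret_name).replace(
        "cassandra-client-name", client_name
    )
```

For example, read the pod spec from step 2 into a string, call fill_client_spec(spec_text, find_user_setup_secret(get_tls_secrets_json()), "my-cassandra-client"), and write the result to my-spec.yaml.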
Deleting the client pod
Use this command to delete the Cassandra client pod, using the pod name you set in the spec:
```
kubectl delete pods -n apigee cassandra-client-name
```
Additional resources
See Introduction to Apigee and Apigee hybrid playbooks.