Troubleshoot Config Connector
This page describes techniques for troubleshooting Config Connector and common issues that you might encounter when using the product.
Check Config Connector status and conditions
Check the version of Config Connector
Run the following command to get the installed Config Connector version, and cross-reference the release notes to verify that the running version supports the features and resources that you want to use:
kubectl get ns cnrm-system -o jsonpath='{.metadata.annotations.cnrm\.cloud\.google\.com/version}'
Check the resource's status and events
Usually, you can determine the issue with your Config Connector resources by inspecting the state of your resources in Kubernetes. Checking a resource's status and events is particularly helpful for determining if Config Connector failed to reconcile the resource and why the reconciliation failed.
Check that Config Connector is running
To check that Config Connector is running, verify that all of its Pods are
READY:
kubectl get pod -n cnrm-system
Example output:
NAME                                            READY   STATUS    RESTARTS   AGE
cnrm-controller-manager-0                       1/1     Running   0          1h
cnrm-deletiondefender-0                         1/1     Running   0          1h
cnrm-resource-stats-recorder-77dc8cc4b6-mgpgp   1/1     Running   0          1h
cnrm-webhook-manager-58496b66f9-pqwhz           1/1     Running   0          1h
cnrm-webhook-manager-58496b66f9-wdcn4           1/1     Running   0          1h
If you have Config Connector installed in
namespaced-mode,
then you will have one controller (cnrm-controller-manager) Pod for each
namespace that is responsible for managing the Config Connector resources in
that namespace.
You can check the status of the controller Pod responsible for a specific namespace by running:
kubectl get pod -n cnrm-system \
-l cnrm.cloud.google.com/scoped-namespace=NAMESPACE \
-l cnrm.cloud.google.com/component=cnrm-controller-manager
Replace NAMESPACE with the name of the namespace.
Check the controller logs
The controller Pod logs information and errors related to the reconciliation of Config Connector resources.
You can check the controller Pod's logs by running:
kubectl logs -n cnrm-system \
-l cnrm.cloud.google.com/component=cnrm-controller-manager \
-c manager
If you have Config Connector installed in namespaced-mode, then the previous command shows the logs of all controller Pods combined. You can check the logs of the controller Pod for a specific namespace by running:
kubectl logs -n cnrm-system \
-l cnrm.cloud.google.com/scoped-namespace=NAMESPACE \
-l cnrm.cloud.google.com/component=cnrm-controller-manager \
-c manager
Replace NAMESPACE with the name of the namespace.
Read more about how to inspect and query Config Connector's logs.
Abandon and acquire the resource
In some cases, you might need to update an immutable field in a resource. Since you can't edit immutable fields, you must abandon and then acquire the resource:
- Update the YAML configuration of the Config Connector resource and set the cnrm.cloud.google.com/deletion-policy annotation to abandon, as shown in the example after this list.
- Apply the updated YAML configuration to update the Config Connector resource's deletion policy.
- Abandon the Config Connector resource.
- Update the immutable fields that need to be changed in the YAML configuration.
- Apply the updated YAML configuration to acquire the abandoned resource.
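As a reference for the first item, here is a minimal sketch of the annotation on a hypothetical SQLInstance; the kind, names, and spec fields are illustrative, not taken from your environment:

apiVersion: sql.cnrm.cloud.google.com/v1beta1
kind: SQLInstance
metadata:
  name: my-instance
  annotations:
    # Tells Config Connector to leave the underlying Google Cloud resource
    # in place when the Kubernetes object is deleted.
    cnrm.cloud.google.com/deletion-policy: abandon
spec:
  region: us-central1
  databaseVersion: POSTGRES_15
  settings:
    tier: db-custom-1-3840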
Troubleshoot by issue type
Use the following table to help troubleshoot your issue based on the type of symptom.
| Type of issue | Common issues |
|---|---|
| Reconciliation | Resource keeps updating every 5-15 mins; Resource has no status; KNV2005: syncer excessively updating resource; Resource deleted by Config Connector; Controller Pod OOMKilled |
| Deletion | Namespace deletion stuck at "Terminating"; Resource deletion stuck at "DeleteFailed" after project was deleted |
| Permissions and authentication | Compute Engine metadata not defined; Error 403: Request had insufficient authentication scopes; 403 Forbidden: The caller does not have permission; Error 403: Caller is missing IAM permission; Update error with IAMPolicy, IAMPartialPolicy and IAMPolicyMember; ServiceIdentity resource fails with IAM_SERVICE_NOT_CONFIGURED_FOR_IDENTITIES |
| Installation and upgrades | Version not supported in Config Connector add-on installations; failed calling webhook; PodSecurityPolicy prevents upgrades |
| Configuration | Cannot make changes to immutable fields; Error updating the immutable fields when there is no update; No matches for kind "Foo"; Labels are not propagated to the Google Cloud resource; Error due to special characters in resource name; Unable to remove fields from resource spec; Config Connector fails to start on Arm-based nodes |
Reconciliation
The following section lists common issues related to reconciliation of resources by Config Connector.
Resource keeps updating every 5-15 mins
Symptom
Your Config Connector resource keeps switching from an UpToDate status to
an Updating status every 5-10 minutes.
Cause
It is likely that Config Connector is detecting unintentional diffs between the resource's desired state and actual state, thereby causing Config Connector to constantly update the resource.
Resolution
First, confirm that you do not have any external systems that are constantly modifying either the Config Connector or Google Cloud resource (for example, CI/CD pipelines, custom controllers, or cron jobs).
If the behavior is not due to an external system, see if Google Cloud is changing any of the values specified in your Config Connector resource. For example, in some cases, Google Cloud changes the formatting (for example, capitalization) of field values which leads to a diff between your resource's desired state and actual state.
Get the state of the Google Cloud resource using the REST API (for example, for ContainerCluster) or the Google Cloud CLI. Then, compare that state against your Config Connector resource. Look for any fields whose values do not match, then update your Config Connector resource to match. In particular, look for any values that were reformatted by Google Cloud. For example, see GitHub issues #578 and #294.
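As an illustrative sketch, assuming a hypothetical ContainerCluster named my-cluster in zone us-central1-a and namespace my-namespace, you could dump both states and compare them:

gcloud container clusters describe my-cluster \
    --zone us-central1-a \
    --format json > live-state.json
kubectl get containercluster my-cluster \
    --namespace my-namespace \
    -o yaml > desired-state.yaml
# Manually compare the two files, looking for values that Google Cloud
# reformatted (for example, capitalization changes).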
Note that this is not a perfect method since the Config Connector and Google Cloud resource models are different, but it should let you catch most cases of unintended diffs.
If you are unable to resolve your issue, see Additional help.
Resource has no status
Symptom
Your resources don't have a status field.
Cause
It is likely that Config Connector is not running properly.
Resolution
Check that Config Connector is running.
KNV2005: syncer excessively updating resource
Symptom
You use Config Sync and are seeing KNV2005 errors for Config Connector resources, similar to the following:
KNV2005: detected excessive object updates, approximately 6 times per minute. This may indicate Config Sync is fighting with another controller over the object.
Cause
It is likely that Config Sync and Config Connector are fighting over the resource.
Config Sync and Config Connector are said to be "fighting" over a resource if they keep updating the same field(s) to different values. An update by one triggers the other to update the resource, which in turn triggers the first to update it again, and this repeats without ending.
Fighting is not a problem for most fields. Fields that are specified in Config Sync are not changed by Config Connector. Similarly, fields that are not specified in Config Sync and defaulted by Config Connector are ignored by Config Sync. Therefore, for most fields, Config Sync and Config Connector shouldn't need to update the same field.
One exception is list fields. Similar to how Config Connector may default subfields in object fields, Config Connector may also default subfields in objects inside lists. However, since list fields in Config Connector resources are atomic, the defaulting of subfields is considered as changing the value of the list entirely.
Therefore, Config Sync and Config Connector will "fight" over a resource if Config Sync sets a list field and Config Connector defaults any subfields within that list.
Resolution
To work around this issue, you have the following options:
- Update the resource manifest in the Config Sync repository to match what Config Connector is trying to set the resource to. One way to do this is to temporarily stop syncing configs, wait for Config Connector to finish reconciling the resource, and then update your resource manifest to match the resource on the Kubernetes API Server.
- Stop Config Sync from reacting to updates to the resource on the Kubernetes API Server by setting the annotation client.lifecycle.config.k8s.io/mutation to ignore. Read more about how to have Config Sync ignore object mutations.
- Stop Config Connector from updating the resource's spec entirely by setting the annotation cnrm.cloud.google.com/state-into-spec to absent on the resource, as shown in the example after this list. This annotation is not supported for all resources. To see if your resource supports the annotation, check the corresponding resource reference page. Read more about the annotation.
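A minimal sketch of the third option on a hypothetical PubSubTopic manifest (the kind and name are assumptions; check the resource reference page to confirm the annotation is supported):

apiVersion: pubsub.cnrm.cloud.google.com/v1beta1
kind: PubSubTopic
metadata:
  name: my-topic
  annotations:
    # Stops Config Connector from writing defaulted values back into spec,
    # which avoids fights over atomic list fields.
    cnrm.cloud.google.com/state-into-spec: absent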
Resource deleted by Config Connector
Symptom
A resource was deleted from your cluster, and you suspect Config Connector deleted it.
Cause
Config Connector never deletes your resources without an external cause.
For example, running kubectl delete, using config management tools like
Argo CD, or using a customized API client can cause resource deletion.
A common misconception is that Config Connector initiated the deletion of resources in your cluster. For example, you might notice delete requests from the Config Connector controller manager against certain resources in container log messages or Kubernetes cluster audit logs. These delete requests are the result of external triggers; Config Connector is reconciling deletions that were requested elsewhere.
Resolution
To determine why a resource was deleted, you need to look into the first delete request that was sent to the corresponding resource. The best way to look into this is by examining the Kubernetes cluster audit logs.
As an example, if you are using GKE, you can
use Cloud Logging to query for
GKE cluster audit logs. For example, if you want to look
for the initial delete requests for a BigQueryDataset resource named foo in
namespace bar, you would run a query like the following:
resource.type="k8s_cluster"
resource.labels.project_id="my-project-id"
resource.labels.cluster_name="my-cluster-name"
protoPayload.methodName="com.google.cloud.cnrm.bigquery.v1beta1.bigquerydatasets.delete"
protoPayload.resourceName="bigquery.cnrm.cloud.google.com/v1beta1/namespaces/bar/bigquerydatasets/foo"
Using this query, you would look for the first delete request and then check
authenticationInfo.principalEmail of that delete log message to determine the
cause of the deletion.
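A sketch of running the same query from the command line with the Google Cloud CLI, reusing the hypothetical project, cluster, namespace, and resource names from the example above:

gcloud logging read \
    'resource.type="k8s_cluster"
     resource.labels.project_id="my-project-id"
     resource.labels.cluster_name="my-cluster-name"
     protoPayload.methodName="com.google.cloud.cnrm.bigquery.v1beta1.bigquerydatasets.delete"
     protoPayload.resourceName="bigquery.cnrm.cloud.google.com/v1beta1/namespaces/bar/bigquerydatasets/foo"' \
    --project my-project-id \
    --order asc \
    --limit 1 \
    --format 'value(protoPayload.authenticationInfo.principalEmail)'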
Controller Pod OOMKilled
Symptom
You see an OOMKilled error on a Config Connector controller Pod. The Pod's status may appear as OOMKilled or Terminating.
Cause
A container or the entire Pod was terminated because it used more memory than allowed. You can verify this by running the kubectl describe command:
kubectl describe pod POD_NAME -n cnrm-system
Replace POD_NAME with the Pod you are troubleshooting.
Additionally, scrutinizing the Pod's event logs can reveal any occurrences of OOM-related events.
Resolution
To address this issue, you can use the ControllerResource custom resource to increase the memory request and the memory limit for the Pod.
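A minimal sketch of such a customization for a cluster-mode installation; the apiVersion and field names follow the ControllerResource customization schema, but treat them as assumptions and verify them against the reference for your installed version:

apiVersion: customize.core.cnrm.cloud.google.com/v1beta1
kind: ControllerResource
metadata:
  # Must match the controller whose Pods are being OOMKilled.
  name: cnrm-controller-manager
spec:
  containers:
  - name: manager
    resources:
      requests:
        memory: 512Mi
      limits:
        memory: 1Gi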
Deletion
The following section lists common issues related to user-initiated delete operations that can cause conflicts with Config Connector.
Namespace deletion stuck at "Terminating"
Symptom
Deleting a namespace is stuck at the Terminating stage.
Cause
This issue can happen if you have Config Connector installed in
namespaced-mode
and if the namespace's ConfigConnectorContext was deleted before all
Config Connector resources in that namespace are deleted. When a namespace's
ConfigConnectorContext is deleted, Config Connector is disabled for that
namespace, which prevents any remaining Config Connector resources in that
namespace from getting deleted.
Resolution
To fix this issue, you must do a forced cleanup and then manually delete the underlying Google Cloud resources afterwards.
To mitigate this issue in the future, only delete the ConfigConnectorContext
after all Config Connector resources in its namespace have been deleted from
Kubernetes. Avoid deleting entire namespaces before all Config Connector
resources in that namespace have been deleted since the
ConfigConnectorContext might get deleted first.
Resource deletion stuck at "DeleteFailed" after project was deleted
Symptom
Deleting a Config Connector resource fails with the DeleteFailed status.
Cause
This issue can happen if a Google Cloud project is deleted before the resource.
Resolution
To fix this issue, restore the project on Google Cloud to allow Config Connector to delete the remaining child resources from Kubernetes. Alternatively, you can do a forced cleanup.
To mitigate this issue in the future, only delete Google Cloud projects
after all their child Config Connector resources have been deleted from
Kubernetes. Avoid deleting entire namespaces that might contain both a
Project resource and its child Config Connector resources since the Project
resource might get deleted first.
Permissions and authentication
The following section lists common issues related to permissions and authentication.
Compute Engine metadata not defined
Symptom
Your Config Connector resource has an UpdateFailed status with a message
stating that the Compute Engine metadata is not defined, similar to
the following error:
Update call failed: error fetching live state: error reading underlying resource: summary: Error when reading or editing SpannerInstance "my-project/my-spanner-instance": Get "https://spanner.googleapis.com/v1/projects/my-project/instances/my-spanner-instance?alt=json": metadata: Compute Engine metadata "instance/service-accounts/default/token?scopes=https%!A(MISSING)%!F(MISSING)%!F(MISSING)www.googleapis.com%!F(MISSING)auth%!F(MISSING)compute%!C(MISSING)https%!A(MISSING)%!F(MISSING)%!F(MISSING)www.googleapis.com%!F(MISSING)auth%!F(MISSING)cloud-platform%!C(MISSING)https%!A(MISSING)%!F(MISSING)%!F(MISSING)www.googleapis.com%!F(MISSING)auth%!F(MISSING)cloud-identity%!C(MISSING)https%!A(MISSING)%!F(MISSING)%!F(MISSING)www.googleapis.com%!F(MISSING)auth%!F(MISSING)ndev.clouddns.readwrite%!C(MISSING)https%!A(MISSING)%!F(MISSING)%!F(MISSING)www.googleapis.com%!F(MISSING)auth%!F(MISSING)devstorage.full_control%!C(MISSING)https%!A(MISSING)%!F(MISSING)%!F(MISSING)www.googleapis.com%!F(MISSING)auth%!F(MISSING)userinfo.email%!C(MISSING)https%!A(MISSING)%!F(MISSING)%!F(MISSING)www.googleapis.com%!F(MISSING)auth%!F(MISSING)drive.readonly" not defined, detail:
Cause
It is likely that the IAM service account used by Config Connector does not exist.
Resolution
To fix the issue, ensure that the IAM service account used by Config Connector exists.
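One way to check is to describe the service account with the Google Cloud CLI; replace the placeholders with the service account email and project from your installation. The command returns an error if the service account does not exist:

gcloud iam service-accounts describe \
    SERVICE_ACCOUNT_NAME@PROJECT_ID.iam.gserviceaccount.com \
    --project PROJECT_ID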
To mitigate this issue in the future, ensure that you follow the Config Connector installation instructions.
Error 403: Request had insufficient authentication scopes
Symptom
Your Config Connector resource has an UpdateFailed status with a message
indicating a 403 error due to insufficient authentication scopes, similar
to the following error:
Update call failed: error fetching live state: error reading underlying resource: summary: Error when reading or editing SpannerInstance "my-project/my-spanner-instance": googleapi: Error 403: Request had insufficient authentication scopes.
Cause
Workload Identity Federation for GKE is likely not enabled on your GKE cluster.
To confirm that Workload Identity Federation for GKE is not enabled, complete the following steps:
- Save the following Pod configuration as wi-test.yaml:

  apiVersion: v1
  kind: Pod
  metadata:
    name: workload-identity-test
    namespace: cnrm-system
  spec:
    containers:
    - image: google/cloud-sdk:slim
      name: workload-identity-test
      command: ["sleep","infinity"]
    serviceAccountName: cnrm-controller-manager

  If you installed Config Connector using namespaced mode, serviceAccountName should be cnrm-controller-manager-NAMESPACE. Replace NAMESPACE with the namespace you used during the installation.

- Create the Pod in your GKE cluster:

  kubectl apply -f wi-test.yaml

- Open an interactive session in the Pod:

  kubectl exec -it workload-identity-test \
      --namespace cnrm-system \
      -- /bin/bash

- List your identity:

  gcloud auth list

- Verify that the identity listed matches the Google service account bound to your resources. If you see the Compute Engine default service account instead, then that means that Workload Identity Federation for GKE is not enabled on your GKE cluster and/or node pool.

- Exit the interactive session, then delete the Pod from your GKE cluster:

  kubectl delete pod workload-identity-test \
      --namespace cnrm-system
Resolution
To fix this issue, ensure that Workload Identity Federation for GKE is enabled on your cluster.
If you're still seeing the same error, ensure that you have also enabled Workload Identity Federation for GKE on the cluster's node pools.
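As a hedged sketch, Workload Identity Federation for GKE can typically be enabled on an existing cluster and its node pools with commands like the following; the cluster, node pool, project, and zone values are placeholders:

gcloud container clusters update CLUSTER_NAME \
    --zone ZONE \
    --workload-pool=PROJECT_ID.svc.id.goog
gcloud container node-pools update NODE_POOL_NAME \
    --cluster CLUSTER_NAME \
    --zone ZONE \
    --workload-metadata=GKE_METADATA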
403 Forbidden: The caller does not have permission
Symptom
Your Config Connector resource has an UpdateFailed status with a message
indicating a 403 error due to Workload Identity Federation for GKE,
similar to the following error:
Update call failed: error fetching live state: error reading underlying resource: summary: Error when reading or editing SpannerInstance "my-project/my-spanner-instance": Get "https://spanner.googleapis.com/v1/projects/my-project/instances/my-spanner-instance?alt=json": compute: Received 403 `Unable to generate access token; IAM returned 403 Forbidden: The caller does not have permission This error could be caused by a missing IAM policy binding on the target IAM service account. For more information, refer to the Workload Identity documentation: https://cloud.google.com/kubernetes-engine/docs/how-to/workload-identity#creating_a_relationship_between_ksas_and_gsas
Cause
Config Connector's Kubernetes service account is missing the appropriate IAM permissions to impersonate your IAM service account as a Workload Identity Federation for GKE user.
Resolution
To fix and mitigate the issue in the future, refer to the Config Connector installation instructions.
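For reference, the missing binding typically looks like the following; the service account, project, and Kubernetes service account values are assumptions based on a default cluster-mode installation, so adapt them to your setup:

gcloud iam service-accounts add-iam-policy-binding \
    SERVICE_ACCOUNT_NAME@PROJECT_ID.iam.gserviceaccount.com \
    --member="serviceAccount:PROJECT_ID.svc.id.goog[cnrm-system/cnrm-controller-manager]" \
    --role="roles/iam.workloadIdentityUser"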
Error 403: Caller is missing IAM permission
Symptom
Your Config Connector resource has an UpdateFailed status with a message
stating that the caller is missing an IAM permission,
similar to the following error:
Update call failed: error fetching live state: error reading underlying resource: summary: Error when reading or editing SpannerInstance "my-project/my-spanner-instance": googleapi: Error 403: Caller is missing IAM permission spanner.instances.get on resource projects/my-project/instances/my-spanner-instance., detail:
Cause
The IAM service account used by Config Connector is missing the IAM permission stated in the message that is needed to manage the Google Cloud resource.
Resolution
To fix the issue, grant the IAM service account used by Config Connector the IAM permission stated in the error message.
If you're still seeing the same error after granting your IAM
service account the appropriate IAM permissions, then check that
your resource is being created in the correct project. Check the
Config Connector resource's spec.projectRef field (or its
cnrm.cloud.google.com/project-id annotation if the resource doesn't support a
spec.projectRef field) and verify that the resource is referencing the
correct project. Note that Config Connector uses the namespace's name as the
project ID if neither the resource nor namespace specifies a target project.
Read more about how to configure the target project for project-scoped
resources.
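A quick way to check where the resource is pointing; the resource kind, name, and namespace here are hypothetical:

# Check the resource's project annotation (empty output means it is not set).
kubectl get spannerinstance my-spanner-instance \
    --namespace my-namespace \
    -o jsonpath='{.metadata.annotations.cnrm\.cloud\.google\.com/project-id}'

# If the resource supports spec.projectRef, inspect it directly.
kubectl get spannerinstance my-spanner-instance \
    --namespace my-namespace \
    -o jsonpath='{.spec.projectRef}'

# Check the namespace-level annotation, which is used as a fallback.
kubectl get namespace my-namespace \
    -o jsonpath='{.metadata.annotations.cnrm\.cloud\.google\.com/project-id}'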
If you're still seeing the same error, then check if Workload Identity Federation for GKE is enabled on your GKE cluster.
To mitigate this issue in the future, ensure that you follow the Config Connector installation instructions.
Update error with IAMPolicy, IAMPartialPolicy and IAMPolicyMember
Symptom
You see an UpdateFailed status with an error message indicating a 400 error because the service account does not exist:
Update call failed: error setting policy member: error applying changes: summary: Request `Create IAM Members roles/[MYROLE] serviceAccount:[NAME]@[PROJECT_ID].iam.gserviceaccount.com for project \"projects/[PROJECT_ID]\"` returned error: Error applying IAM policy for project \"projects/[PROJECT_ID]\": Error setting IAM policy for project \"projects/[PROJECT_ID]\": googleapi: Error 400: Service account [NAME]@[PROJECT_ID].iam.gserviceaccount.com does not exist., badRequest
Cause
If you delete an IAMServiceAccount Config Connector resource before cleaning up the IAMPolicy, IAMPartialPolicy, and IAMPolicyMember resources that depend on that service account,
Config Connector cannot locate the service account referenced in those IAM resources during reconciliation.
Resolution
To resolve this issue, check your service accounts and see if the required
service account for those IAM resources is deleted. If the
service account is deleted, clean up the related IAM
Config Connector resources, too. For IAMPolicyMember, delete the whole
resource. For IAMPolicy and IAMPartialPolicy, only remove the bindings that
involve the deleted service account. However, such cleanup doesn't remove
Google Cloud role bindings immediately. The Google Cloud role
bindings are retained for 60 days because of the retention on the deleted
service account. For more information, see the Google Cloud
IAM documentation about
Delete a service account.
To avoid this issue, you should always clean up IAMPolicy, IAMPartialPolicy,
IAMPolicyMember Config Connector resources before deleting the referenced
IAMServiceAccount.
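A sketch of that cleanup with hypothetical resource names:

# Delete an IAMPolicyMember that references the deleted service account.
kubectl delete iampolicymember my-policy-member --namespace my-namespace

# For IAMPolicy and IAMPartialPolicy, edit the resource and remove only the
# bindings that reference the deleted service account.
kubectl edit iampartialpolicy my-partial-policy --namespace my-namespace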
ServiceIdentity resource fails with IAM_SERVICE_NOT_CONFIGURED_FOR_IDENTITIES
Symptom
Your ServiceIdentity resource has an UpdateFailed status, with an error message
similar to the following:
Update call failed: error applying desired state: summary: Error creating Service Identity: googleapi: Error 400: com.google.api.tenant.error.TenantManagerException: IAM_SERVICE_NOT_CONFIGURED_FOR_IDENTITIES: ...
Cause
This error means that the specified resource does not support on-demand service identity creation.
Resolution
The ServiceIdentity resource can generate service identities only for
supported services. To verify if a service supports on-demand service identity
creation before applying your configuration, run the following command:
gcloud beta services identity create --service SERVICE_NAME.googleapis.com
Replace SERVICE_NAME with the name of the service,
for example spanner.
If the command succeeds, Config Connector can create an identity for that service. If the command fails, it means Config Connector can't create an identity for that service.
Installation and upgrades
The following section lists common issues related to installation or upgrading the version of Config Connector.
Version not supported in Config Connector add-on installations
Symptom
If you can't enable the Config Connector add-on successfully, the following
error message appears: Node version 1.15.x-gke.x is unsupported.
The error message also appears if Workload Identity Federation for GKE or GKE Monitoring are disabled.
Cause
The version of the GKE cluster doesn't meet the requirements or required features are disabled.
Resolution
To solve this error, verify that the version of the GKE cluster meets the version and feature requirements. Ensure that Workload Identity Federation for GKE and GKE Monitoring are enabled.
To get all valid versions for your clusters, run the following command:
gcloud container get-server-config --format "yaml(validMasterVersions)" \
--zone ZONE
Replace ZONE with the Compute Engine zone.
Pick a version from the list that meets the requirements.
failed calling webhook
Symptom
You cannot uninstall Config Connector and receive an error similar to the following:
error during reconciliation: error building deployment objects: error finalizing the deletion of Config Connector system components deployed by ConfigConnector controller: error waiting for CRDs to be deleted: error deleting CRD accesscontextmanageraccesslevels.accesscontextmanager.cnrm.cloud.google.com: Internal error occurred: failed calling webhook "abandon-on-uninstall.cnrm.cloud.google.com": failed to call webhook: Post "https://abandon-on-uninstall.cnrm-system.svc:443/abandon-on-uninstall?timeout=3s": service "abandon-on-uninstall" not found
Cause
This issue can occur when using the Config Connector add-on and disabling Config Connector before removing the Config Connector CRDs.
Resolution
To resolve this error, you must first manually delete the webhooks:
kubectl delete validatingwebhookconfiguration abandon-on-uninstall.cnrm.cloud.google.com --ignore-not-found --wait=true
kubectl delete validatingwebhookconfiguration validating-webhook.cnrm.cloud.google.com --ignore-not-found --wait=true
kubectl delete mutatingwebhookconfiguration mutating-webhook.cnrm.cloud.google.com --ignore-not-found --wait=true
You can then proceed to uninstall Config Connector.
PodSecurityPolicy prevents upgrades
Symptom
After
switching from the Config Connector add-on to a manual install
and upgrading Config Connector to a new version, cnrm Pods fail to update.
Cause
The use of PodSecurityPolicies can prevent cnrm Pods from updating.
To confirm that the PodSecurityPolicies are preventing your upgrade,
check the config-connector-operator's events
and look for an error similar to the following:
create Pod configconnector-operator-0 in StatefulSet configconnector-operator failed error: pods "configconnector-operator-0" is forbidden: PodSecurityPolicy: unable to admit pod: [pod.metadata.annotations[seccomp.security.alpha.kubernetes.io/pod]: Forbidden: seccomp may not be set pod.metadata.annotations[container.seccomp.security.alpha.kubernetes.io/manager]: Forbidden: seccomp may not be set]
Resolution
To resolve this issue, you must
specify the annotation on the PodSecurityPolicy
that corresponds to the annotation mentioned in the error. In the
previous example, the annotation is seccomp.security.alpha.kubernetes.io.
Configuration
The following section lists common issues relating to configuring resources.
Cannot make changes to immutable fields
Config Connector rejects updates to immutable fields at admission.
For example, updating an immutable field with kubectl apply causes the
command to fail immediately.
This means that tools which continuously re-apply resources (for example, GitOps) might find themselves getting stuck while updating a resource if they don't handle admission errors.
Since Config Connector does not allow updates to immutable fields, the only way to perform such an update is to delete and re-create the resource.
Error updating the immutable fields when there is no update
You might see the following errors in the status of the Config Connector resource shortly after you create or acquire a Google Cloud resource using Config Connector:
- Update call failed: error applying desired state: infeasible update: ({true <nil>}) would require recreation (example)
- Update call failed: cannot make changes to immutable field(s) (example)
This doesn't necessarily mean that you actually updated the resource. The Google Cloud API might have changed an immutable field that you manage in the Config Connector resource, causing a mismatch between the desired state and the live state of the immutable fields.
You can resolve the issue by updating the values of those immutable fields in the Config Connector resource to match the live state. To do so, complete the following steps:
- Update the YAML configuration of the Config Connector resource and set the cnrm.cloud.google.com/deletion-policy annotation to abandon.
- Apply the updated YAML configuration to update the Config Connector resource's deletion policy.
- Abandon the Config Connector resource.
- Print out the live state of the corresponding Google Cloud resource using the gcloud CLI.
- Find the mismatches between the gcloud CLI output and the YAML configuration of the Config Connector resource, and update those fields in the YAML configuration.
- Apply the updated YAML configuration to acquire the abandoned resource.
No matches for kind "Foo"
Symptom
You see the error No matches for kind "Foo".
Cause
Your Kubernetes cluster does not have the CRD for the Foo resource kind installed.
Resolution
Verify that the kind is a resource kind supported by Config Connector.
If the kind is supported, then that means your Config Connector installation is either out-of-date or invalid.
If you installed Config Connector using the GKE add-on, then your installation should be upgraded automatically. If you manually installed Config Connector, then you must perform a manual upgrade.
Check the GitHub repository to determine which resource kinds are supported by which Config Connector versions (for example, here are the kinds supported by Config Connector v1.44.0).
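To check whether the CRD for a given kind is installed in your cluster, you can list the CRDs registered by Config Connector; the API group shown is an example:

kubectl get crds | grep cnrm.cloud.google.com
# Or check a specific API group, for example the SQL resources:
kubectl api-resources --api-group=sql.cnrm.cloud.google.com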
Labels are not propagated to the Google Cloud resource
Symptom
Labels in your YAML are not showing up on the Google Cloud resource.
Cause
Not all Google Cloud resources support labels.
Resolution
Config Connector propagates labels found in metadata.labels to the underlying
Google Cloud resource. Check the resource's REST API documentation (for
example, here is the API documentation for
PubSubTopic) to see if the resource supports labels.
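A minimal sketch of labels on a resource kind that does support them; the names and label values are illustrative:

apiVersion: pubsub.cnrm.cloud.google.com/v1beta1
kind: PubSubTopic
metadata:
  name: my-topic
  labels:
    # Propagated to the Pub/Sub topic's labels on Google Cloud.
    environment: dev
    team: platform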
Error due to special characters in resource name
Symptom
You see an error related to invalid characters in metadata.name:
a lowercase RFC 1123 subdomain must consist of lower case alphanumeric characters, '-' or '.', and must start and end with an alphanumeric character (e.g. 'example.com', regex used for validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*')
Cause
Special characters are not valid in the Kubernetes metadata.name field.
Resolution
If you'd like to give your resource a name that is not a valid Kubernetes name, but is a valid Google Cloud resource name, you can use the resourceID field, as shown in the following example:
apiVersion: sql.cnrm.cloud.google.com/v1beta1
kind: SQLUser
metadata:
name: 'test'
spec:
instanceRef:
name: sqlinstance-sample-postgresql
host: "%"
type: CLOUD_IAM_USER
resourceID: test.example@example-project.iam
This configuration causes Config Connector to use resourceID instead of
metadata.name as the name of the resource.
Unable to remove fields from resource spec
Symptom
Removing a field from a Config Connector resource's spec does not remove it from the resource.
Cause
Removing a field from the spec for a resource managed by Config Connector does not make that field empty or revert to a default value. Instead, it causes that field to become externally-managed.
Resolution
If you want to change the value of a field to empty or default in the underlying Google Cloud resource, you must zero out the field in the Config Connector resource spec:

- For a list field, set the field to an empty list by using []. The following example shows the targetServiceAccounts field that we want to remove:

    spec:
      targetServiceAccounts:
      - external: "foo-bar@foo-project.iam.gserviceaccount.com"
      - external: "bar@foo-project.iam.gserviceaccount.com"

  To remove this field, set the value to empty:

    spec:
      targetServiceAccounts: []

- For a primitive type field, set the field to empty by using one of the following values:

  | Type | Empty value |
  |---|---|
  | string | "" |
  | bool | "false" |
  | integer | 0 |

  The following example shows the identityNamespace field that we want to remove:

    spec:
      workloadIdentityConfig:
        identityNamespace: "foo-project.svc.id.goog"

  To remove this field, set the value to empty:

    spec:
      workloadIdentityConfig:
        identityNamespace: ""

- For object type fields, you can try to set the subfields of the object type as empty or default following the guidance in the previous section and verify if it works. However, this is not guaranteed to work.
Config Connector fails to start on Arm-based nodes
If your cluster contains node pools using the Arm architecture (such as the C4A, N4A, or Tau T2A machine series), Config Connector components might fail to run. This is a known limitation because Config Connector does not support Arm-based systems.
Symptoms
If your Config Connector instance is affected by this issue, you might experience the following symptoms:
- Pods in the cnrm-system namespace remain in the Pending state.
- Pods may show a CrashLoopBackOff status with an error message in the logs similar to: exec user process caused "exec format error".
- Describing the Pod reveals scheduling failures or architecture mismatches.
Resolution
To resolve this issue, ensure that Config Connector components are scheduled on nodes with x86 architecture:
- Add an x86 node pool: If your cluster only contains Arm nodes, add at least one node pool using an x86 machine type (such as e2-standard-2) to host the Config Connector controller Pods, as shown in the example command after this list.
- Verify node taints: GKE Arm nodes are typically tainted with kubernetes.io/arch=arm64:NoSchedule to prevent x86-only workloads from being scheduled on them. Ensure that you haven't added tolerations to the Config Connector deployments that would allow them to run on these nodes.
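A hedged sketch of adding such a node pool with the Google Cloud CLI; the pool name, cluster name, and zone are placeholders:

gcloud container node-pools create x86-pool \
    --cluster CLUSTER_NAME \
    --zone ZONE \
    --machine-type e2-standard-2 \
    --num-nodes 1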
Forced cleanup
If your Config Connector resources are stuck on deletion and you simply want to get rid of them from your Kubernetes cluster, you can force their deletion by deleting their finalizers.
You can delete a resource's finalizers by editing the resource using kubectl
edit, deleting the metadata.finalizers field, and then saving the file to
preserve your changes to the Kubernetes API Server.
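Alternatively, a sketch of removing the finalizers in a single command; the resource kind, name, and namespace are hypothetical:

kubectl patch pubsubtopic my-topic \
    --namespace my-namespace \
    --type merge \
    -p '{"metadata":{"finalizers":null}}'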
Since deleting a resource's finalizers allows the resource to be deleted from the Kubernetes cluster immediately, Config Connector might not get a chance to complete the deletion of the underlying Google Cloud resource. This means that you might need to manually delete your Google Cloud resources afterwards.
Monitoring
Monitoring Config Connector and exploring its logs can help you determine the source of issues and better understand unexpected behavior.
Metrics
You can use Prometheus to collect and show metrics from Config Connector.
Logging
All Config Connector Pods output structured logs in JSON format.
The logs of the controller Pods are particularly useful for debugging issues with the reconciliation of resources.
You can query for logs for specific resources by filtering for the following fields in the log messages:
- logger: contains the resource's kind in lowercase. For example, PubSubTopic resources have a logger of pubsubtopic-controller.
- resource.namespace: contains the resource's namespace.
- resource.name: contains the resource's name.
Using Logging for advanced log querying
If you are using GKE, you can use Cloud Logging to query for logs for a specific resource with the following query:
# Filter to include only logs coming from the controller Pods
resource.type="k8s_container"
resource.labels.container_name="manager"
resource.labels.namespace_name="cnrm-system"
labels.k8s-pod/cnrm_cloud_google_com/component="cnrm-controller-manager"
# Filter to include only logs coming from a particular GKE cluster
resource.labels.cluster_name="GKE_CLUSTER_NAME"
resource.labels.location="GKE_CLUSTER_LOCATION"
# Filter to include only logs for a particular Config Connector resource
jsonPayload.logger="RESOURCE_KIND-controller"
jsonPayload.resource.namespace="RESOURCE_NAMESPACE"
jsonPayload.resource.name="RESOURCE_NAME"
Replace the following:
- GKE_CLUSTER_NAME with the name of the GKE cluster running Config Connector.
- GKE_CLUSTER_LOCATION with the location of the GKE cluster running Config Connector. For example, us-central1.
- RESOURCE_KIND with the resource's kind in lowercase. For example, pubsubtopic.
- RESOURCE_NAMESPACE with the resource's namespace.
- RESOURCE_NAME with the resource's name.
Additional help
To get additional help, you can file an issue on GitHub or contact Google Cloud Support.