Issues with the logging pipeline in Google Kubernetes Engine (GKE) can prevent your cluster logs from appearing in Cloud Logging, hindering your monitoring and debugging efforts.
Use this document to learn how to verify configurations and permissions, resolve resource and performance issues, investigate filters and application behavior, and address platform or service problems affecting your logs.
This information is important for Platform admins and operators who maintain cluster observability and for anyone who uses Cloud Logging to troubleshoot GKE operations. For more information about the common roles and example tasks that we reference in Google Cloud content, see Common GKE user roles and tasks.
For more information about how to use logs to troubleshoot your workloads and clusters, see Conduct historical analysis with Cloud Logging.
Find your solution by symptom
If you've identified a specific symptom related to your missing logs, use the following table to find troubleshooting advice:
| Category | Symptom or observation | Potential cause | Troubleshooting steps |
|---|---|---|---|
| Configuration | No logs from any cluster in the project are visible. | The Cloud Logging API is disabled for the project. | Verify the Cloud Logging API status |
| | Logs are missing from a specific cluster, or only certain log types are missing. | Cluster-level logging is disabled for the required log types. | Verify cluster logging configuration |
| | Logs are missing from nodes in a specific node pool. | The node pool's nodes lack the required access scope. | Verify node pool access scopes |
| | Permission errors (401 or 403) appear in logging agent logs. | The node's service account is missing the required permission. | Verify node service account permissions |
| Resource and performance | Logs are missing intermittently, or you see RESOURCE_EXHAUSTED errors. | The project is exceeding the Cloud Logging API write quota. | Investigate Cloud Logging API quota usage |
| | Some logs from a specific node are missing, often during high traffic or load. | The node is exceeding logging agent throughput limits, or lacks resources (CPU or memory) to process logs. | Investigate node throughput and resource usage |
| Filtering and application behavior | Specific logs that match a certain pattern are consistently missing. | A log exclusion filter in Cloud Logging is unintentionally dropping the logs. | Investigate log exclusion filters |
| | Logs from a container are significantly delayed or appear only after the container exits. | The application's output is fully buffered, often due to piping stdout. | Investigate container log buffering and delays |
| | Expected logs don't appear in search results. | Query filters in Logs Explorer might be too restrictive. | Investigate Logs Explorer queries |
| | No logs are visible from a specific application Pod, but other cluster logs are present. | The application inside the container isn't writing to stdout or stderr. | Investigate application-specific logging behavior |
| Platform and service | Logs older than a certain date don't appear. | The logs have passed their retention period and have been deleted by Cloud Logging. | Investigate log retention periods |
| | Widespread log loss or delays across projects or clusters. | Cloud Logging service issue or ingestion delay. | Investigate Cloud Logging service issues and delays |
| | Logging issues coincide with cluster version limitations. | Unsupported GKE version. | Investigate cluster version |
Use automated diagnostic tools
The following sections cover tools that can automatically inspect your cluster for common misconfigurations and help investigate complex problems.
Debug GKE logging issues with gcpdiag
If you are missing or getting incomplete logs from your GKE
cluster, use the gcpdiag tool for troubleshooting.
gcpdiag
is an open source tool. It is not an officially supported Google Cloud product.
You can use the gcpdiag tool to help you identify and fix Google Cloud project issues. For more information, see the gcpdiag project on GitHub.

The gke/logs runbook checks the following areas:
- Project-level logging: ensures that the Google Cloud project housing the GKE cluster has the Cloud Logging API enabled.
- Cluster-level logging: verifies that logging is explicitly enabled within the configuration of the GKE cluster.
- Node pool permissions: confirms that the nodes within the cluster's node pools have the Cloud Logging Write scope enabled, allowing them to send log data.
- Service account permissions: validates that the service account used by the node pools possesses the necessary IAM permissions to interact with Cloud Logging. Specifically, the roles/logging.logWriter role is typically required.
- Cloud Logging API write quotas: verifies that Cloud Logging API write quotas have not been exceeded within the specified timeframe.
Google Cloud console
- Complete and then copy the following command.
- Open the Google Cloud console and activate Cloud Shell.
- Paste the copied command.
- Run the gcpdiag command, which downloads the gcpdiag Docker image, and then performs diagnostic checks. If applicable, follow the output instructions to fix failed checks.

```
gcpdiag runbook gke/logs \
    --parameter project_id=PROJECT_ID \
    --parameter name=CLUSTER_NAME \
    --parameter location=LOCATION
```

Docker
You can
run gcpdiag using a wrapper that starts gcpdiag in a
Docker container. Docker or
Podman must be installed.
- Copy and run the following command on your local workstation.
```
curl https://gcpdiag.dev/gcpdiag.sh >gcpdiag && chmod +x gcpdiag
```

- Execute the gcpdiag command:

```
./gcpdiag runbook gke/logs \
    --parameter project_id=PROJECT_ID \
    --parameter name=CLUSTER_NAME \
    --parameter location=LOCATION
```
View available parameters for this runbook.
Replace the following:
- PROJECT_ID: the ID of the project containing the resource.
- CLUSTER_NAME: the name of the GKE cluster.
- LOCATION: the Compute Engine region or zone (for example, us-central1 or us-central1-a) for the cluster.
Useful flags:
- --universe-domain: if applicable, the Trusted Partner Sovereign Cloud domain hosting the resource.
- --parameter or -p: runbook parameters.
For a list and description of all gcpdiag tool flags, see the
gcpdiag usage instructions.
Use Gemini Cloud Assist investigations
Consider using Gemini Cloud Assist investigations to gain additional insights into your logs and resolve issues. For more information about different ways to initiate an investigation by using the Logs Explorer, see Troubleshoot issues with Gemini Cloud Assist Investigations in the Gemini documentation.
Verify logging configuration and permissions
Incorrect settings are a common reason for missing GKE logs. Use the following sections to verify your Cloud Logging configuration.
Verify the Cloud Logging API status
For logs to be collected from any cluster in your project, the Cloud Logging API must be active.
Symptoms:
No logs from any GKE resources in your project are visible in Cloud Logging.
Cause:
The Cloud Logging API is disabled for the Google Cloud project, preventing the logging agent on the nodes from sending logs.
Resolution:
To verify that the Cloud Logging API is enabled and enable it if necessary, select one of the following options:
Console
In the Google Cloud console, go to the Enabled APIs & services page.
In the Filter field, type
Cloud Logging APIand press Enter.If the API is enabled, you see it listed. If the API isn't listed, enable it:
- Click Enable APIs and services.
- In the Search field, type
Cloud Logging APIand press Enter. - Click the Cloud Logging API result.
- Click Enable.
gcloud
Check if the API is enabled:
```
gcloud services list --enabled --filter="NAME=logging.googleapis.com"
```

The output should be the following:

```
NAME: logging.googleapis.com
TITLE: Cloud Logging API
```

If the API isn't listed in the enabled services, enable it:

```
gcloud services enable logging.googleapis.com \
    --project=PROJECT_ID
```

Replace PROJECT_ID with your Google Cloud project ID.
Verify cluster logging configuration
GKE lets you configure which log types (such as SYSTEM or
WORKLOAD) are collected from a cluster.
Symptoms:
No logs are appearing in Cloud Logging from a specific GKE
cluster, or only certain types of logs (like SYSTEM) are missing.
Cause:
Cluster-level logging is disabled for the required log types. This setting is configurable only for Standard clusters; on Autopilot clusters, logging is always enabled and can't be disabled, so this isn't the cause of your issue.
Resolution:
To check and update the cluster's logging configuration, select one of the following options:
Console
In the Google Cloud console, go to the Kubernetes clusters page.
Click the name of the cluster that you want to investigate.
Click the Details tab and navigate to the Features section.
In the Logging row, review which log types, such as System or Workloads, are enabled.
If the log types that you want to collect are disabled or incorrect, click Edit Logging.
In the Components list, select the checkboxes for the log types that you want to collect and click OK. For more information about available log types, see About GKE logs.
Click Save Changes.
gcloud
To check the logging configuration, describe the cluster:
```
gcloud container clusters describe CLUSTER_NAME \
    --location=LOCATION \
    --project=PROJECT_ID \
    --format="value(name,loggingConfig.componentConfig.enableComponents)"
```

Replace the following:

- CLUSTER_NAME: the name of your cluster.
- LOCATION: the Compute Engine region or zone (for example, us-central1 or us-central1-a) for the cluster.
- PROJECT_ID: your Google Cloud project ID.
If logging is enabled, the output is similar to the following:
```
example-cluster SYSTEM_COMPONENTS;WORKLOADS
```

If the output is NONE, then logging is disabled.

If the log types that you want are disabled or incorrect, update the logging configuration:

```
gcloud container clusters update CLUSTER_NAME \
    --location=LOCATION \
    --project=PROJECT_ID \
    --logging=LOGGING_TYPE
```

Replace LOGGING_TYPE with SYSTEM, WORKLOAD, or both. To collect any logs, SYSTEM must be enabled. WORKLOAD logs can't be collected on their own. For more information about available log types, see About GKE logs.
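For example, to collect both system and workload logs, a command like the following would work (the cluster, location, and project values here are placeholders):

```
gcloud container clusters update example-cluster \
    --location=us-central1 \
    --project=my-project \
    --logging=SYSTEM,WORKLOAD
```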
Verify node pool access scopes
Nodes in a GKE cluster use OAuth access scopes to get permission to interact with Google Cloud APIs, including Cloud Logging.
Symptoms:
Logs are missing from nodes in a specific node pool.
Cause:
The nodes in the node pool don't have the necessary OAuth access scope. One of the following scopes is required for nodes to write logs to Cloud Logging:
- https://www.googleapis.com/auth/logging.write: grants permission to write logs. This is the minimum scope required.
- https://www.googleapis.com/auth/logging.admin: grants full access to the Cloud Logging API, which includes the permissions from logging.write.
- https://www.googleapis.com/auth/cloud-platform: grants full access to all enabled Google Cloud APIs, which includes the permissions from logging.write.
Resolution:
To verify the permissions and grant the required roles if missing, select one of the following options:
Console
Verify the node pool's access scopes:
In the Google Cloud console, go to the Kubernetes clusters page.
To open the cluster's details page, click the name of the cluster that you want to investigate.
Click the Nodes tab.
In the Node Pools section, click the name of the node pool that you want to investigate.
Navigate to the Security section.
Review the scopes listed in the Access scopes field. Ensure that at least one of the required scopes is present:
- Stackdriver Logging API - Write Only
- Stackdriver Logging API - Full
- Cloud Platform - Enabled
If the required scopes are missing, re-create the node pool. You can't change access scopes on an existing node pool.
If needed, create a new node pool with the required scope:
- Navigate back to the cluster details page for the cluster that you want to modify.
- Click the Nodes tab.
- Click Create user-managed node pool.
- Fill in the Node pool details section.
- In the left-hand navigation, click Security.
- In the Access scopes section, select the access scopes that you want to add:
- To add specific scopes, select Set access for each API.
- To allow full access, select Allow full access to all Cloud APIs.
- Configure any other sections as needed.
- Click Create.
Migrate your workloads to the new node pool. After you migrate workloads to the new node pool, your applications run on nodes that have the necessary scopes to send logs to Cloud Logging.
Delete the old node pool:
- Navigate back to the cluster details page and select the Nodes tab.
- In the Node Pools section, find the old node pool.
- Next to the node pool, click Delete .
- When prompted, confirm deletion by typing the node pool name and click Delete.
gcloud
Verify the node pool's access scopes:
```
gcloud container clusters describe CLUSTER_NAME \
    --location=LOCATION \
    --project=PROJECT_ID \
    --format="value(nodePools[].name,nodePools[].config.oauthScopes)"
```

Replace the following:

- CLUSTER_NAME: the name of your cluster.
- LOCATION: the Compute Engine region or zone (for example, us-central1 or us-central1-a) for the cluster.
- PROJECT_ID: your Google Cloud project ID.
Review the output for each node pool. Ensure that at least one of the required scopes (https://www.googleapis.com/auth/logging.write, https://www.googleapis.com/auth/cloud-platform, or https://www.googleapis.com/auth/logging.admin) is listed. If the required scopes are missing, re-create the node pool. You can't change access scopes on an existing node pool.

If needed, create a new node pool with the required scope:

```
gcloud container node-pools create NEW_NODE_POOL_NAME \
    --cluster=CLUSTER_NAME \
    --location=LOCATION \
    --project=PROJECT_ID \
    --scopes=https://www.googleapis.com/auth/logging.write,OTHER_SCOPES
```

Replace the following:

- NEW_NODE_POOL_NAME: a name for the new node pool.
- OTHER_SCOPES: a comma-separated list of any other scopes that your nodes require. If you don't need other scopes, omit this placeholder and the preceding comma.
Migrate your workloads to the new node pool. After you migrate workloads to the new node pool, your applications run on nodes that have the necessary scopes to send logs to Cloud Logging.
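One way to migrate is to cordon the old node pool's nodes and then drain them so that Pods reschedule onto the new node pool. A minimal sketch, assuming the old node pool is named OLD_NODE_POOL_NAME and using the standard cloud.google.com/gke-nodepool node label:

```
# Cordon each node in the old node pool so no new Pods are scheduled there.
for node in $(kubectl get nodes -l cloud.google.com/gke-nodepool=OLD_NODE_POOL_NAME -o name); do
  kubectl cordon "$node"
done

# Drain each node so existing Pods are rescheduled onto the new node pool.
for node in $(kubectl get nodes -l cloud.google.com/gke-nodepool=OLD_NODE_POOL_NAME -o name); do
  kubectl drain "$node" --ignore-daemonsets --delete-emptydir-data
done
```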
Delete the old node pool:
```
gcloud container node-pools delete OLD_NODE_POOL_NAME \
    --cluster=CLUSTER_NAME \
    --location=LOCATION \
    --project=PROJECT_ID
```
Verify node service account permissions
Nodes use a service account to authenticate with Google Cloud services, and this account needs specific IAM permissions to write logs.
Symptoms:
- Logs are missing from nodes.
- Inspecting logging agent logs (for example, Fluent Bit) might show permission-related errors, such as 401 or 403 codes when trying to write to Cloud Logging.
- You might see a Grant Critical Permissions to Node Service Account notification for the cluster in the Google Cloud console.
Cause:
The service account used by the node pool's nodes lacks the necessary
IAM permissions to write logs to Cloud Logging. Nodes require
a service account with the logging.logWriter role, which includes the
logging.logEntries.create permission.
Additionally, for GKE
versions 1.33 or later, the GKE Default Node Service
Agent
(service-PROJECT_NUMBER@gcp-sa-gkenode.iam.gserviceaccount.com) must have the Kubernetes Default Node Service
Agent (roles/container.defaultNodeServiceAgent) role at a minimum. This role
lets GKE manage node resources and operations, including logging
components.
Resolution:
To verify the permissions and grant the required roles if missing, do the following:
- Verify the node's service account permission.
- Verify that the GKE service agent has the required role.
Verify the node service account permission
The node service account is the account your node uses to authenticate and
send logs. To identify this service account and verify it has the required
roles/logging.logWriter permission, do the following:
To identify the service account used by the node pool, select one of the following options:
Console
In the Google Cloud console, go to the Kubernetes clusters page.
In the cluster list, click the name of the cluster that you want to inspect.
Depending on the cluster mode of operation, do one of the following:
For Standard clusters, do the following:
- Click the Nodes tab.
- In the Node pools table, click a node pool name. The Node pool details page opens.
- In the Security section, find the Service account field.
For Autopilot clusters, do the following:
- Go to the Details tab.
- In the Security section, find the Service account field.
If the value in the Service account field is default, your nodes use the Compute Engine default service account. If the value in this field isn't default, your nodes use a custom service account. To grant the required role to a custom service account, see Use least privilege IAM service accounts.
gcloud
Run the following command, depending on the type of cluster that you use:
Standard clusters
```
gcloud container node-pools describe NODE_POOL_NAME \
    --cluster=CLUSTER_NAME \
    --location=LOCATION \
    --project=PROJECT_ID \
    --format="value(config.serviceAccount)"
```

Replace the following:

- NODE_POOL_NAME: the name of the node pool.
- CLUSTER_NAME: the name of your Standard cluster.
- LOCATION: the Compute Engine region or zone (for example, us-central1 or us-central1-a) for the cluster.
- PROJECT_ID: your Google Cloud project ID.
If the output is default, then the node pool uses the Compute Engine default service account. If the value isn't default, your nodes use a custom service account. To grant the required role to a custom service account, see Use least privilege IAM service accounts.

Autopilot clusters
```
gcloud container clusters describe CLUSTER_NAME \
    --location=LOCATION \
    --project=PROJECT_ID \
    --format="value(nodePoolDefaults.nodeConfigDefaults.serviceAccount)"
```

Replace the following:

- CLUSTER_NAME: the name of your Autopilot cluster.
- LOCATION: the Compute Engine region or zone (for example, us-central1 or us-central1-a) for the cluster.
- PROJECT_ID: your Google Cloud project ID.
If the output is default, then the node pool uses the Compute Engine default service account. If the value isn't default, your nodes use a custom service account. To grant the required role to a custom service account, see Use least privilege IAM service accounts.

For more detailed scripts to identify missing permissions, see Identify all node service accounts with missing permissions.
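If your nodes use the Compute Engine default service account, you can derive its email address from your project number (the default account follows the PROJECT_NUMBER-compute@developer.gserviceaccount.com naming convention) and use that value wherever SERVICE_ACCOUNT_EMAIL appears later in this section:

```
# Look up the project number and construct the default service account email.
PROJECT_NUMBER=$(gcloud projects describe PROJECT_ID --format="value(projectNumber)")
SERVICE_ACCOUNT_EMAIL="${PROJECT_NUMBER}-compute@developer.gserviceaccount.com"
echo "${SERVICE_ACCOUNT_EMAIL}"
```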
GKE automatically scans for missing permissions and provides recommendations. To use recommendations to check for missing permissions, select one of the following options:
Console
- In the Kubernetes clusters page, locate the Notifications column.
- Check the Notifications column for the Grant critical permissions recommendation. This recommendation means that the NODE_SA_MISSING_PERMISSIONS check has failed.
- If the recommendation is present, click it. A details panel opens, explaining the missing permissions and providing the steps to fix it.
gcloud
List recommendations for missing service account permissions:
```
gcloud recommender recommendations list \
    --recommender=google.container.DiagnosisRecommender \
    --location LOCATION \
    --project PROJECT_ID \
    --format yaml \
    --filter="recommenderSubtype:NODE_SA_MISSING_PERMISSIONS"
```

Analyze the command output:

- If the command returns an empty list, then the recommender hasn't found any active NODE_SA_MISSING_PERMISSIONS recommendations. The service accounts it checked appear to have the required permissions.
- If the command returns one or more YAML blocks, then the recommender has identified a permission issue. Review the output for the following key fields:
  - description: provides a summary of the issue, such as specifying that your node service account is missing the roles/logging.logWriter role or that the GKE service agent is missing the roles/container.defaultNodeServiceAgent role.
  - resource: specifies the cluster that's affected.
  - content.operations: contains the recommended resolution. This section provides the exact gcloud projects add-iam-policy-binding command required to grant the specific missing role.
The recommender can take up to 24 hours to reflect recent changes.
If you want to verify the permissions immediately, to check permissions and grant the role, select one of the following options:
Console
In the Google Cloud console, go to the IAM page.
Find the service account used by the node pool.
In the Role column, check if the service account has the Logs Writer (roles/logging.logWriter) role.

If the permission is missing, add it:
- Click Edit principal
- Click Add another role.
- In the search field, enter Logs Writer.
- Select the Logs Writer checkbox and click Apply.
- Click Save.
gcloud
Check current roles for the node service account:
```
gcloud projects get-iam-policy PROJECT_ID \
    --flatten="bindings[].members" \
    --format='table(bindings.role)' \
    --filter="bindings.members:serviceAccount:SERVICE_ACCOUNT_EMAIL"
```

If it's missing, grant the logging.logWriter role:

```
gcloud projects add-iam-policy-binding PROJECT_ID \
    --member="serviceAccount:SERVICE_ACCOUNT_EMAIL" \
    --role="roles/logging.logWriter"
```
Verify the GKE service agent permissions
If logs are still missing, and your cluster runs GKE version 1.33 or later, verify that the Google-managed agent that GKE uses to manage your node components has the required permission:
To identify the service agent's email address, get your project number:
```
gcloud projects describe PROJECT_ID --format="value(projectNumber)"
```

Replace PROJECT_ID with your project ID. Note the output.

The GKE service agent's email is:

```
service-PROJECT_NUMBER@gcp-sa-gkenode.iam.gserviceaccount.com
```

To use recommendations to check for missing permissions, select one of the following options:
Console
- In the Kubernetes clusters page, find the Notifications column.
- Check the column for the Grant critical permissions recommendation.
- If the recommendation is present, click it. A details panel opens, explaining the missing permissions and providing the steps to fix it.
gcloud
List recommendations for missing service account permissions:
```
gcloud recommender recommendations list \
    --recommender=google.container.DiagnosisRecommender \
    --location LOCATION \
    --project PROJECT_ID \
    --format yaml \
    --filter="recommenderSubtype:NODE_SA_MISSING_PERMISSIONS"
```

Analyze the command output. Review the output for a description specifying that the GKE service agent (gcp-sa-gkenode) is missing the roles/container.defaultNodeServiceAgent role.
To immediately check permissions and grant the role, select one of the following options:
Console
In the Google Cloud console, go to the IAM page.
In the Filter field, type the GKE Service Agent's email address and press Enter.
In the filtered list, check if the service agent has at least the Kubernetes Default Node Service Agent (roles/container.defaultNodeServiceAgent) role.

If the role is missing, grant it:
- Click Edit principal next to the service agent.
- Click Add roles.
- In the search field, enter Kubernetes Default Node Service Agent and select the role.
- Click Save.
gcloud
Verify if the roles/container.defaultNodeServiceAgent role is bound to the service agent:

```
gcloud projects get-iam-policy PROJECT_ID \
    --flatten="bindings[].members" \
    --format='table(bindings.role)' \
    --filter="bindings.members:serviceAccount:service-PROJECT_NUMBER@gcp-sa-gkenode.iam.gserviceaccount.com"
```

In the output, look for roles/container.defaultNodeServiceAgent.

If the role is missing, grant the Kubernetes Default Node Service Agent role:

```
gcloud projects add-iam-policy-binding PROJECT_ID \
    --member="serviceAccount:service-PROJECT_NUMBER@gcp-sa-gkenode.iam.gserviceaccount.com" \
    --role="roles/container.defaultNodeServiceAgent"
```
Troubleshoot resource and performance issues
If logs are missing intermittently or are dropped from high-traffic nodes, the cause might not be a misconfiguration, but a performance issue. Use the following sections to investigate whether your project is exceeding API quotas or if high log volume is overwhelming the agents on specific nodes.
Investigate Cloud Logging API quota usage
To protect the service, Cloud Logging enforces a write quota on all projects, limiting the total volume of logs that Cloud Logging can ingest per minute.
Symptoms:
- Logs are intermittently or completely missing.
- You see RESOURCE_EXHAUSTED errors related to logging.googleapis.com in node or logging agent logs.
Cause:
The project is exceeding the Cloud Logging API write requests quota. This issue prevents the logging agent from sending logs.
Resolution:
To check the quota usage and request an increase, do the following:
In the Google Cloud console, go to the Quotas page.
In the Filter field, type Cloud Logging API and press Enter.

In the filtered list, find the quota for Log write bytes per minute per region for the region that your cluster is in.
Review the values in the Current usage percentage column. If usage is at or near the limit, you've likely exceeded the quota.
To request an increase, click Edit quota, and follow the prompts. For more information, see View and manage quotas.
To reduce usage, consider excluding logs or reducing log verbosity from applications. You can also set up alerting policies to be notified before reaching the limit.
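To check whether the logging agents on your nodes are currently hitting quota errors, one option is to scan the agent's own logs for RESOURCE_EXHAUSTED messages. A minimal sketch, assuming the default GKE logging agent DaemonSet in the kube-system namespace with the k8s-app=fluentbit-gke label (label and component names can vary by GKE version):

```
# List the logging agent Pods, then search their recent logs for quota errors.
kubectl get pods -n kube-system -l k8s-app=fluentbit-gke
kubectl logs -n kube-system -l k8s-app=fluentbit-gke --tail=200 | grep -i "RESOURCE_EXHAUSTED"
```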
Investigate node throughput and resource usage
The GKE logging agent on each node has its own throughput limit, which can be exceeded.
Symptoms:
Logs from specific nodes are intermittently missing or delayed, particularly during periods of high cluster activity or heavy node resource usage.
Cause:
The GKE logging agent has a default throughput limit
(approximately 100 KBps per node). If applications on a node generate logs
faster than this limit, the agent might drop logs, even if the project's overall
API quota isn't exceeded. You can monitor node logging throughput by using the
kubernetes.io/node/logs/input_bytes metric in
Metrics Explorer.
Logs might also be missing if the node is under heavy CPU or memory pressure, leaving insufficient resources for the agent to process logs.
Resolution:
To reduce throughput, select one of the following options:
Standard clusters
Try the following solutions:
Enable high throughput logging: this feature increases the per-node capacity. For more information, see Adjust Cloud Logging agent throughput.
Reduce log volume: analyze application logging patterns. Reduce unnecessary or excessively verbose logging.
Deploy a custom logging agent: you can deploy and manage your own customized Fluent Bit DaemonSet, but you are then responsible for its configuration and maintenance.
Check node resource usage: even if the log volume is within limits, ensure the nodes aren't under heavy CPU or memory pressure. Insufficient node resources can hinder the logging agent's ability to process and send logs. You can check metrics like kubernetes.io/node/cpu/core_usage_time and kubernetes.io/node/memory/used_bytes in Metrics Explorer.
Autopilot clusters
Try the following solutions:
Reduce log volume: analyze your application logging patterns. Reduce unnecessary or excessively verbose logging. Ensure logs are structured where possible, because these types of logs can help with efficient processing. Exclude logs that aren't essential.
Optimize application performance: because node resources are managed in Autopilot clusters, ensure your applications aren't excessively consuming CPU or memory, which could indirectly affect the performance of node components like the logging agent. Although you don't manage nodes directly, application efficiency affects overall node health.
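For a quick command-line view of node and logging agent resource usage, you can also use kubectl top, which relies on the metrics API that GKE provides by default. The k8s-app=fluentbit-gke label is an assumption that can vary by GKE version:

```
# Show CPU and memory usage per node.
kubectl top nodes

# Show CPU and memory usage of the logging agent Pods.
kubectl top pods -n kube-system -l k8s-app=fluentbit-gke
```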
Troubleshoot filtering and application issues
When your application successfully generates logs, but they still don't appear
in Cloud Logging, the issue is often caused by filtering or the application's
logging behavior. The following sections explore issues like log exclusion
filters, container-level buffering, restrictive search queries, and applications
not writing to stdout or stderr.
Investigate log exclusion filters
Cloud Logging sinks can include exclusion filters that silently drop matching logs before they're stored, so logs that your nodes successfully send might never appear in the Logs Explorer.
Symptoms:
Specific logs that match certain criteria are missing from Cloud Logging, but other logs from the same sources are present.
Cause:
Log exclusion filters are defined in your Cloud Logging sinks (often the
_Default sink). These rules silently drop logs that match
specific criteria, even if they were successfully sent by the node.
Resolution:
To review and modify exclusion filters, select one of the following options:
Console
In the Google Cloud console, go to the Logs Router page.
Identify the problematic filter:
- For each sink (besides the _Required sink, which can't have exclusion filters), click More actions and select View sink details.
- Review the queries in the Exclusion filters section. Compare the filter logic against the attributes of your missing logs (for example, resource type, labels, or keywords).
- Copy the exclusion filter query.
Go to the Logs Explorer page.
Paste the exclusion filter query into the query pane and click Run query.
Review the results. The logs displayed are what the filter would exclude. If your missing logs appear in these results, then this filter is likely the cause.
Disable or edit the filter:
- Return to the Logs Router page.
- Click More actions for the sink with the suspect filter and select Edit sink.
- Locate the Choose logs to filter out of sink section and find the exclusion filter.
- You can either click Disable to disable the filter, or modify its query to be more specific.
- Click Update sink. Changes apply to new logs.
gcloud
List all sinks in the project:
```
gcloud logging sinks list --project=PROJECT_ID
```

View each sink's exclusion filters:

```
gcloud logging sinks describe SINK_NAME --project=PROJECT_ID
```

In the output, review the exclusions section. Compare the filter logic against the attributes of your missing logs (for example, resource type, labels, or keywords).

To modify exclusions, update the sink's configuration:

Export the sink's configuration to a local file (for example, sink-config.yaml):

```
gcloud logging sinks describe SINK_NAME \
    --format=yaml > sink-config.yaml
```

Open the sink-config.yaml file in a text editor.

Find the exclusions: section and remove or modify the problematic filter.

Update the modified sink:

```
gcloud logging sinks update SINK_NAME sink-config.yaml \
    --project=PROJECT_ID
```

For more information about this command, see the gcloud logging sinks update documentation.
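As a command-line alternative to previewing the exclusion filter in the Logs Explorer, you can read recent entries that match it. A minimal sketch, assuming the exclusion filter fits on a single line (replace EXCLUSION_FILTER with the query copied from the sink):

```
# Show recent log entries that the exclusion filter would drop.
gcloud logging read 'EXCLUSION_FILTER' \
    --project=PROJECT_ID \
    --freshness=1d \
    --limit=20
```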
Investigate container log buffering and delays
Applications and operating systems often use buffering to write data in chunks instead of line-by-line, which can improve performance.
Symptoms:
- Logs from specific containers appear in Cloud Logging only after the container exits, or there's a significant delay in the logs appearing.
- Sometimes, logs are incomplete.
Cause:
This issue is often caused by log buffering. Although standard output (stdout)
is typically line-buffered when connected to a terminal, this behavior changes
when output is piped. If an application's logs or startup scripts within a
container pipe stdout to other commands (for example, my-app | grep ...),
the output might become fully buffered. As a result, logs are held until the
buffer is full or the pipe closes. This behavior can cause delays or data loss
if the container terminates unexpectedly. Application-internal buffering can
also cause delays.
Resolution:
To resolve the issue, try the following solutions:
- Avoid piping stdout: if possible, modify container entry points or application commands to write logs directly to stdout or stderr without piping through other commands like grep or sed within the container.
- Ensure line buffering:
  - If piping is unavoidable, use tools that support line buffering. For example, use grep --line-buffered.
  - For custom applications, ensure they flush logs frequently, ideally after each line, when they write to stdout. Many logging libraries have settings to control buffering.
- Test buffering behavior: deploy the following Pod manifest and observe the effects in the logs by using the kubectl logs -f buffered-pod command. Experiment by commenting and uncommenting the different command arrays in the buffered-container manifest:

```
# buffered.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: run-script
data:
  run.sh: |
    #!/bin/bash
    echo "Starting..."
    for i in $(seq 3600); do
      echo "Log ${i}"
      sleep 1
    done
    echo "Exiting."
---
apiVersion: v1
kind: Pod
metadata:
  name: buffered-pod
spec:
  containers:
  - name: buffered-container
    image: ubuntu # Or any other image with bash
    # Case 1: Direct execution - line buffered by default to TTY
    # Logs appear immediately.
    command: ['/bin/bash', '-c', '/mnt/run.sh']
    # Case 2: Piped to grep - fully buffered by default
    # Logs might be delayed or appear in chunks.
    # command: ['/bin/bash', '-c', '/mnt/run.sh | grep Log']
    # Case 3: Piped to grep with --line-buffered
    # Logs appear immediately.
    # command: ['/bin/bash', '-c', '/mnt/run.sh | grep --line-buffered Log']
    volumeMounts:
    - name: scripts
      mountPath: /mnt
  volumes:
  - name: scripts
    configMap:
      name: run-script
      defaultMode: 0777
  restartPolicy: Never
```
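To run the test, apply the manifest and then watch the Pod's logs (standard kubectl commands; the file name matches the comment in the manifest):

```
kubectl apply -f buffered.yaml
kubectl logs -f buffered-pod
```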
Investigate Logs Explorer queries
If you are confident your logs are being collected but you can't find them, your search query or time range might be the issue.
Symptoms:
Expected logs aren't appearing in search results, even though you know the application is generating them.
Cause:
Your query in the Logs Explorer might have filters (for example, on namespaces, labels, resource types, or text) that inadvertently exclude the logs that you're looking for.
Resolution:
In the Google Cloud console, go to the Logs Explorer page.
Click Pick time range. Even if you think you know when the logs occurred, try a significantly broader range to rule out timing issues.
Simplify the query:
- Clear all filters.
Try filtering by your cluster only:
resource.type="k8s_container" resource.labels.cluster_name="CLUSTER_NAME" resource.labels.location="LOCATION"Replace the following:
CLUSTER_NAME: the name of your cluster.LOCATION: the Compute Engine region or zone (for example,us-central1orus-central1-a) for the cluster.
Click Run query.
If the broad query works, re-introduce your original filters one by one:
- Resource type: make sure that you use the correct resource type. For example, are you filtering by k8s_container when you should be filtering by k8s_node?
- Labels: double-check spellings for resource.labels such as namespace_name, container_name, or custom labels.
- Severity: make sure the severity level (for example, severity=ERROR) isn't too restrictive.
- Text payload: check for spelling mistakes and overly restrictive strings in search terms. For example, use : for "contains" instead of = for an exact match (jsonPayload.message:"error" instead of jsonPayload.message="error").
Verify that your filters account for case sensitivity (text is usually case-insensitive, but labels might not be), ensure values have no hidden characters or extra spaces, and check if terms with special characters need to be enclosed in quotes.
Review the Timeline. Sudden drops when adding a filter can help you to identify the problematic part of the query.
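For example, a restrictive query that layers several of these filters might look like the following (all values are placeholders; remove one line at a time to see which filter hides your logs):

```
resource.type="k8s_container"
resource.labels.cluster_name="CLUSTER_NAME"
resource.labels.namespace_name="NAMESPACE_NAME"
resource.labels.container_name="CONTAINER_NAME"
severity>=WARNING
jsonPayload.message:"timeout"
```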
For more advice about effective logging queries, see Finding log entries quickly in the Cloud Logging documentation.
If you still can't find the logs after you refine your query, the issue might not be the query, but a problem described in other sections of this document.
Investigate application-specific logging behavior
The GKE logging agent only collects logs written to the stdout
and stderr streams.
Symptoms:
No logs for a specific Pod or container are visible in Cloud Logging, even though other logs from the cluster are present.
Cause:
The application isn't writing to stdout or stderr. It might be
misconfigured to write logs to a file inside the container, where the
logging agent can't collect them.
The application might also be mixing JSON and non-JSON text in its output. The logging agent's parser expects a consistent format (JSON or text) from a single stream. If an application configured for JSON logging outputs a plain-text line, it can break the parser, causing logs to be dropped or ingested incorrectly.
Resolution:
Determine the Pod name and namespace of the application whose logs are missing:
```
kubectl get pods -n NAMESPACE_NAME
```

Check container logs:
If the Pod has a single container, run the following command:
```
kubectl logs POD_NAME \
    -n NAMESPACE_NAME
```

Replace the following:

- POD_NAME: the name of your Pod.
- NAMESPACE_NAME: the namespace of your Pod.

If the Pod has multiple containers, specify the container name:

```
kubectl logs POD_NAME \
    -c CONTAINER_NAME \
    -n NAMESPACE_NAME
```

Replace CONTAINER_NAME with the name of the container within the Pod.

To follow logs in real time, run the following command:

```
kubectl logs -f POD_NAME \
    -c CONTAINER_NAME \
    -n NAMESPACE_NAME
```

Replace the following:

- POD_NAME: the name of your Pod.
- CONTAINER_NAME: the name of the container within the Pod.
- NAMESPACE_NAME: the namespace of your Pod.
Analyze the output:
- If the kubectl logs command has no output, or if the command output doesn't contain the expected logs, then the problem is with the application itself. The kubectl logs command reads directly from the stdout and stderr streams captured by the container runtime. If logs aren't here, GKE's logging agent can't see them. Change your application's code or configuration to stop writing to a file and instead log all messages directly to stdout (for regular logs) and stderr (for error logs).
- If you see a mix of JSON strings and plain text lines, this output indicates a mixed-format issue. Configure your application to only write valid, single-line JSON objects to stdout and stderr.
- If the kubectl logs command does show the expected logs, then the issue is likely further down the logging pipeline (for example, agent, permissions, or Cloud Logging service).
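If you suspect the application writes to a file instead of stdout, you can look inside the running container for log files. A minimal sketch; the /var/log path and the app.log file name are assumptions that depend on your application:

```
# List candidate log files inside the container (path is an example).
kubectl exec POD_NAME -c CONTAINER_NAME -n NAMESPACE_NAME -- ls -l /var/log

# Show the most recent lines of a suspected log file (file name is an example).
kubectl exec POD_NAME -c CONTAINER_NAME -n NAMESPACE_NAME -- tail -n 20 /var/log/app.log
```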
Troubleshoot platform and service issues
The following sections help you investigate issues external to your immediate configuration, such as log retention policies, Cloud Logging health, or unsupported GKE versions.
Investigate log retention periods
Logs are stored in buckets, and each bucket has a retention period that defines how long its logs are kept before being automatically deleted.
Symptoms:
Logs older than a certain date are missing.
Cause:
The logs that you're searching for are older than the retention period for the log bucket that they were routed to.
Resolution:
To identify and update the retention period, select one of the following options:
Console
Identify the bucket that your GKE logs are routed to:
In the Google Cloud console, go to the Logs Router page.
Review the Destination column, which shows where the logs are being routed.
The destination looks similar to the following:
```
logging.googleapis.com/projects/PROJECT_ID/locations/LOCATION/buckets/BUCKET_ID
```

Note the PROJECT_ID, LOCATION, and BUCKET_ID.

Logs are often routed to the _Default bucket, but might also be routed to other buckets if you have custom sinks configured.
Check the log bucket retention period:
In the Google Cloud console, go to the Logs Storage page.
Find the buckets matching the BUCKET_ID, LOCATION, and PROJECT_ID from the sink's destination.

For each relevant bucket, view the Retention period column.
If the logs that you want to view are older than the retention period, then Cloud Logging has deleted them. If you need a longer retention period, do the following:
- For the bucket whose retention period you want to extend, click More actions.
- Select Edit bucket, and update the retention period. Be aware of potential cost implications.
gcloud
Identify the bucket that your GKE logs are routed to:
```
gcloud logging sinks list --project=PROJECT_ID
```

Review the output. The destination field for each sink shows where the logs are being routed. The destination format for a log bucket is:

```
logging.googleapis.com/projects/PROJECT_ID/locations/LOCATION/buckets/BUCKET_ID
```

Note the PROJECT_ID, LOCATION, and BUCKET_ID. Logs are often routed to the _Default bucket.

Check the log bucket retention period:

```
gcloud logging buckets describe BUCKET_ID \
    --location=LOCATION \
    --project=PROJECT_ID
```

In the output, look for the retentionDays field. If the logs that you need are older than the value listed for retentionDays, then Cloud Logging has deleted them.

If you need a longer retention period, update it:

```
gcloud logging buckets update BUCKET_ID \
    --location=LOCATION \
    --retention-days=RETENTION_DAYS \
    --project=PROJECT_ID
```

Replace the following:

- BUCKET_ID: the ID of the log bucket.
- LOCATION: the location of the log bucket (for example, global).
- RETENTION_DAYS: the number of days to retain logs. Be aware of potential cost implications for increasing the retention period.
- PROJECT_ID: your Google Cloud project ID.
Investigate Cloud Logging service issues and ingestion delays
Sometimes, the logging pipeline itself might experience issues, either from a service-wide disruption or a temporary, large-scale ingestion delay.
Symptoms:
- Widespread or intermittent log loss across multiple projects or clusters.
- Logs are significantly delayed in appearing in Logs Explorer.
Cause:
- Cloud Logging service disruption: a rare, service-wide disruption can prevent log ingestion, leading to widespread delays or total log loss.
- High log volume: even without an official disruption, high log volume from your project or region can temporarily overwhelm the ingestion service, causing logs to be delayed in appearing.
Resolution:
Check the status of Google Cloud services by visiting the Google Cloud Service Health dashboard. Look for any open incidents related to Cloud Logging or GKE.
Account for potential ingestion delays. If logs aren't immediately visible, and there are no active incidents, allow some time for ingestion, especially if the log volume is high. Check again after a few minutes.
Investigate cluster version
GKE regularly releases new versions that include bug fixes and performance improvements for components, including the logging agent.
Symptoms:
Logging issues coincide with cluster version limitations.
Cause:
The cluster might be running an older or unsupported GKE version that has known logging agent issues or lacks certain logging features.
Resolution:
To resolve this issue, do the following:
Check your cluster's version:
```
gcloud container clusters describe CLUSTER_NAME \
    --location LOCATION \
    --format="value(currentMasterVersion)"
```

Replace the following:

- CLUSTER_NAME: the name of your cluster.
- LOCATION: the Compute Engine region or zone (for example, us-central1 or us-central1-a) for the cluster.
To ensure it's a supported version, compare this version against the GKE Release schedule.
If the cluster is using an unsupported version, upgrade your cluster.
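For example, to upgrade the control plane and then a node pool to a specific supported version, commands like the following should work as a sketch (TARGET_VERSION and NODE_POOL_NAME are placeholders, and the available upgrade paths depend on your release channel):

```
# Upgrade the cluster control plane to a target version.
gcloud container clusters upgrade CLUSTER_NAME \
    --location=LOCATION \
    --master \
    --cluster-version=TARGET_VERSION

# Then upgrade each node pool to match the control plane version.
gcloud container clusters upgrade CLUSTER_NAME \
    --location=LOCATION \
    --node-pool=NODE_POOL_NAME
```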
What's next
If you can't find a solution to your problem in the documentation, see Get support for further help, including advice on the following topics:
- Opening a support case by contacting Cloud Customer Care.
- Getting support from the community by asking questions on StackOverflow and using the google-kubernetes-engine tag to search for similar issues. You can also join the #kubernetes-engine Slack channel for more community support.
- Opening bugs or feature requests by using the public issue tracker.