Hotfix 6
Networking:
- Namespace termination prevents Managed Harbor Service project network policies from being ready, which blocks new Harbor instances.
- An incorrect switch model configuration causes unhealthy management aggregation switches.
Hotfix 5
Database service:
- Unable to update or delete a database cluster that has the status FailoverInProgress.
Monitoring:
- The Managed Harbor Service monitoring target takes a long time to load.
Hotfix 4
GDC console:
- Added performance improvements for the GDC console.
Object storage:
- Addressed the 50,000 metric cardinality limit to keep metrics flowing by removing events other than the total.
- Decreased the range read latency by removing unnecessary calls to StorageGRID and KMS.
Hotfix 3
Cloud DNS:
- Addressed DNS issues related to managed DNS splits, incorrect handling of PKI certificates, and the inability to delete legacy Cloud DNS records.
- Updated DNS SLOs so they rely on actual data across both org infrastructure and root admin clusters.
- Increased the alert firing time for legacy DNS alerts to reduce alert frequency.
- There are TLS scrape issues for DNS metrics.
- Added the Istio service and FRE watcher for the global DNSRegistration resource.
- Addressed resource addition problems for IAM roles related to the infrastructure operator group.
IAM:
- Increased memory and replicas for AIS, and added more granular operable parameters to adjust them per container.
- Grafana API queries return a 302 redirect, which causes usability issues.
Virtual machines:
- GPU resources on a node can't be repopulated after a kubelet restart, resulting in GPU VMs getting stuck in the Scheduling state.
Hotfix 2
Cluster:
- There's an NPC upscaling issue within a cluster when using an OS version annotation. This applies to both bare metal and VM nodes.
Database service:
- There are conflicts during specification updates related to the health check process for highly available PostgreSQL databases.
Google Distributed Cloud for bare metal:
- There are security issues with containerd related to CVE-2025-31133, CVE-2025-52565, and CVE-2025-52881.
- A cluster can become unresponsive during deletion, which is triggered by a busy perimeter when the cluster is reconciling.
- There's a recovery issue with the remote cluster watch.
- A cluster cache client can become unresponsive with GET and LIST operations.
- Node deletion causes a data leakage during cluster deprovisioning.
- A race condition occurs with a config map and secrets referenced in API server arguments.
Monitoring:
- There are issues with exposing operable parameters for the tuning stack.
- There are suboptimal CPU and memory limits related to the Cortex query stack and cache.
- Added safeguards for cardinality and label counts.
- Added monitoring diagnostic dashboards.
- Added additional monitoring SLOs.
- There are issues with the AuditLogInventory resource.
Object storage:
- There's a missing runbook for a BucketLocationConfig permission issue.
- The default replica value for the syslog-server is too low.
- Added S3 permissions to fetch various bucket metadata attributes.
- Added a claim-by-force annotation to the obj-system/allow-obs-system-ingress-traffic NetworkPolicy resource.
Operations Suite Infrastructure (OI):
- Added concurrency and retry logic to the knowledge base sync.
Operating system:
- There are unexpected preflight job deletions during the OS controller restart.
- Stale inventory machines and OSPolicy resource deletion are not handled correctly.
- Added metrics, dashboard, and events for OS policy controllers.
- The OSPolicyReconciler event can degrade over time.
- The preflight job deadline and retry are not configurable, and don't persist.
- A node target policy race condition occurs that reduces the API load.
- Added a reason for OSPolicy job creation events.
- Modified predefined roles for OS monitoring and debugging.
Resource Manager:
- Unable to create an organization when using a hotpatch version.
Ticketing system:
- Inaccurate firewall signature update steps in the runbook SECOps-P0024.
Upgrade:
- Unable to complete a policy-based Google Distributed Cloud for bare metal upgrade when using a hotpatch version.
Vertex AI:
- Enabled any-to-any language translation. Previously, languages could only be translated to English or German.
Virtual machines:
- The VM metadata server certificate isn't removed during VM deletion.
- There is an issue when cloning a VM disk.
- Added extra validation during a VM image import to check if the import process succeeds.
- The VM external access was showing the egress IP address for Cloud NAT-enabled projects.
Hotfix 1
Backup and restore:
- When a backup job fails due to a NetApp ONTAP caching issue, all subsequent backup jobs fail.
Database service:
- An instance of a highly available PostgreSQL database cluster could be incorrectly deleted.
- Specification updates for highly available PostgreSQL databases cause conflicts.
Identity and access management:
- There are conflicting IAM and DNS role names.
Vertex AI:
- Long-running Vertex AI Translation operation APIs return a 404 error.
- When deploying an Online Prediction model, there's a missing Vertex AI Prediction Developer role.