Resolving scaling issues in Cloud Service Mesh
This section explains common Cloud Service Mesh problems and how to resolve them. If you need additional assistance, see Getting support.
Scaling factors
Istiod sends configuration to each sidecar proxy over a long-lived gRPC stream. The following factors affect how well this scales:
- The size of the configuration to generate:
  - The total number of services, pods, and Istio resources.
  - For large meshes, configure the Sidecar resource to reduce the configuration size.
- The rate of change in the environment:
  - When a new service is created or the Istio configuration changes, full updates are sent to proxies.
  - Adding new endpoints is inexpensive, because only incremental updates are sent.
- The number of proxies for which configuration is generated:
  - Determined by the number of gateways and pods with a sidecar.
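As an illustration of reducing configuration size, the following is a minimal sketch of an Istio Sidecar resource. It limits the proxies in one namespace to configuration for services in their own namespace and in istio-system. The namespace name my-namespace is a hypothetical placeholder, and the right set of egress hosts depends on which services your workloads actually call.

```yaml
# Sketch only: "my-namespace" is a placeholder, not a name from this guide.
apiVersion: networking.istio.io/v1beta1
kind: Sidecar
metadata:
  name: default            # applies to all workloads in the namespace
  namespace: my-namespace
spec:
  egress:
  - hosts:
    - "./*"                # services in the same namespace
    - "istio-system/*"     # control plane and shared gateways
```

With a scope like this, Istiod generates configuration for each proxy only for the listed hosts rather than for every service in the mesh.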
 
Scaling considerations
Istiod scales well both vertically (larger resource requests) and horizontally (more replicas). Ensure that your CPU limits are not too restrictive: if Istiod reaches its CPU limit, throttling might occur, which slows configuration distribution. If you encounter performance issues, consider upgrading to the latest version of Cloud Service Mesh, because each version includes performance optimizations.
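The following sketch shows one way that vertical and horizontal scaling might be expressed, assuming an in-cluster control plane that is customized through an IstioOperator overlay. All values are illustrative placeholders rather than recommendations, and the sketch does not apply if your control plane is managed for you.

```yaml
# Sketch only: values are placeholders; size them from observed Istiod usage.
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  components:
    pilot:
      k8s:
        resources:
          requests:
            cpu: 1000m     # vertical scaling: larger requests
            memory: 4Gi
          # No CPU limit is set here, so Istiod is not throttled at a limit.
        hpaSpec:
          minReplicas: 2   # horizontal scaling: more replicas
          maxReplicas: 5
```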
For more guidance on scaling your mesh, see the Scalability best practices guide.
Unbalanced load
Large changes in cluster size might cause a temporarily unbalanced load across Istiod replicas, because the connections are long-lived. A 30-minute maximum connection age mitigates this by letting the load rebalance naturally. When a connection is closed for this reason, Envoy might log an error message such as gRPC config stream closed: 13.
To further mitigate this issue, run multiple replicas of Istiod (the default is 2) and pre-scale Istiod if you expect extreme cluster scale-ups.