Google Distributed Cloud nettest identifies connectivity issues in the
Kubernetes objects in your clusters, such as Pods, Nodes, Services, and some
external targets. nettest doesn't check connections from external targets to
Pods, Nodes, or Services. This document describes how to deploy and run
nettest with one of the manifests, nettest.yaml or nettest_rhel.yaml, in
the anthos-samples GitHub repository. Use nettest_rhel.yaml if you run
Google Distributed Cloud on Red Hat Enterprise Linux (RHEL). Use
nettest.yaml if you run Google Distributed Cloud on Ubuntu.
This document also describes how to interpret the logs generated by nettest
to identify connectivity problems with your clusters.
About nettest
The nettest diagnostic tool consists of the following Kubernetes objects. Each
object is specified in the nettest YAML manifest files.
- cloudprober: a DaemonSet and a Service responsible for collecting network connection status, such as error rate and latency.
- echoserver: a DaemonSet and a Service responsible for responding to cloudprober, providing it the metrics for network connectivity.
- nettest: a Pod containing the prometheus and nettest containers. prometheus collects metrics from cloudprober. nettest queries prometheus and displays the network test results in the log.
- nettest-engine: a ConfigMap to configure the nettest container in the nettest Pod.
The manifest also specifies the nettest namespace and a dedicated
ServiceAccount (along with ClusterRole and ClusterRoleBinding) to isolate
nettest from other cluster resources.
Run nettest
Deploy nettest by running the following command for your operating system.
When the nettest Pod starts, the test runs automatically. The test takes about
five minutes to complete.
For Ubuntu:
kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/anthos-samples/main/anthos-bm-utils/abm-nettest/nettest.yaml
For RHEL:
kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/anthos-samples/main/anthos-bm-utils/abm-nettest/nettest_rhel.yaml
Get the test results
After the test has completed, which should take around five minutes after the
nettest manifest is deployed, run the following command to see the nettest
results:
kubectl -n nettest logs nettest -c nettest
While nettest is running, it sends messages like the following to stdout:
I0413 03:33:04.879141 1 collectorui.go:130] Listening on ":8999"
I0413 03:33:04.879258 1 prometheus.go:172] Running prometheus controller
E0413 03:33:04.879628 1 prometheus.go:178] Prometheus controller: failed to
retries probers: Get "http://127.0.0.1:9090/api/v1/targets": dial tcp 127.0.0.1:9090:
connect: connection refused
If nettest runs successfully without identifying any connectivity failures,
you see the following log entry:
I0211 21:58:34.689290 1 validate_metrics.go:78] Metric validation passed!
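
If you want to script this check, the following is a minimal sketch rather than part of nettest itself. It assumes that the success entry always contains the text "Metric validation passed!" as in the preceding example. The function name nettest_passed is hypothetical; it reads log text on stdin and exits with status 0 only when the success entry is present.

```shell
# Hypothetical helper (not shipped with nettest): succeeds only if the log
# text on stdin contains the success entry shown in the preceding example.
nettest_passed() {
  grep -qF 'Metric validation passed!'
}

# Usage (requires access to the cluster):
#   kubectl -n nettest logs nettest -c nettest | nettest_passed \
#     && echo "nettest reported no connectivity failures"
```

A nonzero exit status means only that the success entry wasn't found. The test might still be running, or it might have reported failures.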
If nettest found connection issues, it writes log entries like the following:
E0211 06:40:11.948634 1 collector.go:65] Engine error: step validateMetrics failed:
"Error rate in percentage": probe from "10.200.0.3" to "172.26.115.210:80" has value 100.000000,
threshold is 1.000000
"Error rate in percentage": probe from "10.200.0.3" to "172.26.27.229:80" has value 100.000000,
threshold is 1.000000
"Error rate in percentage": probe from "192.168.3.248" to "echoserver-hostnetwork_10.200.0.2_8080"
has value 2.007046, threshold is 1.000000
Although the default threshold is one percent (1.000000), error rates up to
five percent can be ignored safely. For example, the error rate for connectivity
from IP address 192.168.3.248 to echoserver-hostnetwork_10.200.0.2_8080 in
the preceding example is approximately two percent (2.007046). This is an
example of a reported connectivity issue that you can ignore.
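
To separate ignorable entries from significant ones, you can filter the log by error rate. The following is a minimal sketch, assuming the failure entries follow the format in the preceding example (an entry can wrap onto the next line). The function name significant_failures and its optional limit argument are hypothetical. The default limit of 5 reflects the five percent guidance in this section.

```shell
# Hypothetical helper (not shipped with nettest): reads nettest log text on
# stdin and prints "source destination error-rate" for each probe whose error
# rate is above the limit (first argument, default 5 percent).
significant_failures() {
  # Join wrapped lines so that each entry can be matched as one string.
  tr '\n' ' ' | awk -v limit="${1:-5}" '
    {
      text = $0
      while (match(text, /probe from "[^"]*" to "[^"]*" +has value [0-9.]+/)) {
        entry = substr(text, RSTART, RLENGTH)
        text = substr(text, RSTART + RLENGTH)
        # Splitting on double quotes gives: f[2] = source, f[4] = destination,
        # f[5] = " has value <rate>".
        split(entry, f, "\"")
        rate = f[5]
        sub(/.*has value /, "", rate)
        if (rate + 0 > limit + 0) print f[2], f[4], rate
      }
    }'
}

# Usage (requires access to the cluster):
#   kubectl -n nettest logs nettest -c nettest | significant_failures
```

For the preceding example log, this sketch prints the two probes from 10.200.0.3 with a 100 percent error rate and omits the probe with an error rate of about two percent.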
Interpret the test results
When nettest finishes and finds a connectivity issue, you see the following
entry in the nettest Pod logs:
"Error rate in percentage": probe from {src} to {dst} has value 100.000000, threshold is 1.000000
Here, {src} and {dst} can be any of the following:
- echoserver Pod IP: the connection to or from a Pod on the node.
- Node IP: the connection to or from the node.
- Service IP (see the following text for details)
In addition, {dst} can also be:
- google.com: an external connection.
- dns: the connection to a non-hostNetwork Service through DNS, that is echoserver-non-hostnetwork.nettest.svc.cluster.local.
The details for Service IP are in JSON-formatted probe entries in the log, like the following example. The following probe example shows that 172.26.27.229:80 is the address for service-clusterip. There are two probes with this targets value, one for the Pod (pod-service-clusterip) and one for the Node (node-service-clusterip).
probe { name: "node-service-clusterip" … targets { host_names: "172.26.27.229:80" }
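
To look up which probes use a given Service IP, you can pair each probe name with its targets value. The following is a minimal sketch, assuming that probe entries appear in the log with name: "..." and host_names: "..." fields as in the preceding example, and that each name precedes its targets. The function name list_probe_targets is hypothetical.

```shell
# Hypothetical helper (not shipped with nettest): reads nettest log text on
# stdin and prints "probe-name target" for each host_names field that follows
# a probe name.
list_probe_targets() {
  # Join lines so that entries spread over several lines are handled too.
  tr '\n' ' ' | awk '
    {
      text = " " $0
      name = ""
      # Find each name or host_names field in order of appearance.
      while (match(text, /[ {](name|host_names): *"[^"]*"/)) {
        field = substr(text, RSTART, RLENGTH)
        text = substr(text, RSTART + RLENGTH)
        split(field, f, "\"")   # f[2] is the quoted value
        if (field ~ /host_names:/) {
          if (name != "") print name, f[2]
        } else {
          name = f[2]
        }
      }
    }'
}

# Usage (requires access to the cluster):
#   kubectl -n nettest logs nettest -c nettest | list_probe_targets | grep -F '172.26.27.229:80'
```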
Validate your fixes
When you have addressed all reported connectivity issues, remove the nettest Pod
and reapply the nettest manifest to rerun the tests for connectivity.
For example, to rerun nettest for Ubuntu, run the following commands:
kubectl -n nettest delete pod nettest
kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/anthos-samples/main/anthos-bm-utils/abm-nettest/nettest.yaml
Clean up nettest
When you're done testing, run the following commands to remove all nettest
resources:
kubectl delete namespace nettest
kubectl delete clusterroles nettest:nettest
kubectl delete clusterrolebindings nettest:nettest
What's next
If you need additional assistance, reach out to Cloud Customer Care. You can also see Getting support for more information about support resources, including the following:
- Requirements for opening a support case.
- Tools to help you troubleshoot, such as your environment configuration, logs, and metrics.
- Supported components.