Anthos Service Mesh and Traffic Director are now Cloud Service Mesh. For more information, see the Cloud Service Mesh overview.

Set up Multi-Cluster Mesh Failover

This page shows you how to design and implement a high-availability traffic routing strategy using Cloud Service Mesh in a multi-cluster environment. The following table describes the expected behavior:

Cluster state	Traffic behavior
Both clusters healthy	50% of traffic to Cluster A, 50% to Cluster B
Cluster A becomes unavailable	100% of traffic to Cluster B
Cluster A recovers	The 50/50 split is restored automatically

Prerequisites

This guide assumes that you have already done the following:

Created two GKE clusters in two different regions, registered them to the same fleet host project, and configured them for Cloud Service Mesh.
Set up a multi-cluster mesh on Cloud Service Mesh.
Installed and configured the Istio control plane in both clusters.
Deployed and exposed istio-ingressgateway in at least one cluster (Cluster A).
Deployed the hello-world application in both clusters with sidecar injection enabled.

This guide uses the following regions:

Cluster A: europe-west1
Cluster B: us-central1
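
Before you continue, you might want to confirm the sidecar injection prerequisite in both clusters. The following is a minimal sketch, not an official check: the kubectl context names are hypothetical, the default namespace is assumed from the manifests in this guide, and the check relies on the injected proxy container being named istio-proxy.

```shell
#!/bin/sh
# Read lines of the form "pod-name container-name..." from stdin and print
# the name of every pod that has no container called istio-proxy.
pods_without_sidecar() {
  awk '{
    found = 0
    for (i = 2; i <= NF; i++) if ($i == "istio-proxy") found = 1
    if (!found) print $1
  }'
}

# Print one line per pod in the default namespace: the pod name followed by
# its init container and container names. $1 is a kubectl context name
# (an assumption; use the context names from your own kubeconfig).
list_pod_containers() {
  kubectl --context "$1" -n default get pods \
    -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.spec.initContainers[*].name}{" "}{.spec.containers[*].name}{"\n"}{end}'
}

# Example usage (context names are placeholders):
#   for ctx in cluster-a cluster-b; do
#     list_pod_containers "$ctx" | pods_without_sidecar
#   done
```

If the loop prints nothing, every pod in the namespace carries the proxy container. Any pod name that is printed was likely created before injection was enabled and needs to be restarted.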

Set up multi-cluster mesh failover

Deploy and apply the public ingress gateway using the sample manifest from the Cloud Service Mesh repository:

cat <<EOF > istio-ingressgateway.yaml
apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: public-gateway
  namespace: default
spec:
  selector:
    istio: ingressgateway
  servers:
  - port:
      number: 80
      name: http
      protocol: HTTP
    hosts:
    - '*'
EOF

kubectl apply -f istio-ingressgateway.yaml

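This Gateway configures the ingress gateway to accept external HTTP traffic on port 80 for any host. On its own it does not route traffic anywhere; the VirtualService in the next step binds the hello-world service to it.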

Create and apply a VirtualService in Cluster A to route traffic to the hello-world service:

cat <<EOF > virtual-service.yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: hello-world
  namespace: default
spec:
  hosts:
  - '*'
  gateways:
  - public-gateway
  http:
  - route:
    - destination:
        host: hello-world.default.svc.cluster.local
EOF

kubectl apply -f virtual-service.yaml

This configuration forwards every HTTP request that arrives at public-gateway to the hello-world service.

Configure and apply a DestinationRule for locality-based failover:

cat <<EOF > destination-rule.yaml
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: hello-world
  namespace: default
spec:
  host: hello-world.default.svc.cluster.local
  trafficPolicy:
    connectionPool:
      http:
        http2MaxRequests: 100
    outlierDetection:
      consecutive5xxErrors: 1
      interval: 1s
      baseEjectionTime: 30s
      maxEjectionPercent: 100
    loadBalancer:
      localityLbSetting:
        enabled: true
        distribute:
        - from: europe-west1
          to:
            europe-west1: 50
            us-central1: 50
        - from: us-central1
          to:
            us-central1: 50
            europe-west1: 50
EOF

kubectl apply -f destination-rule.yaml

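Note the following:

localityLbSetting in the DestinationRule enables locality-aware load balancing, which provides both the even traffic split and the automatic failover.
maxEjectionPercent: 100 lets Istio eject every endpoint of the service if necessary, so all traffic can fail over when every endpoint in a locality is unhealthy.
distribute splits traffic 50/50 between the two regions. The rule that applies is the one whose from value matches the region of the source workload.
Failover is implicit: the rule has no failover block, because Istio accepts only one of distribute, failover, or failoverPriority. When one locality becomes unavailable, Istio routes 100% of traffic to the healthy region.
outlierDetection ejects an endpoint after a single 5xx response (consecutive5xxErrors: 1). Endpoints are evaluated every second (interval), and an ejection lasts 30 seconds or longer (baseEjectionTime).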
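
To see why a 50/50 distribute rule turns into a 100% shift without an explicit failover block, it can help to work through the arithmetic. The following sketch reproduces the locality-weighted formula described in the Envoy documentation, assuming Envoy's default overprovisioning factor of 1.4 and assuming that the mesh applies that formula unchanged to the weights in distribute. The function name effective_split is hypothetical, and the code is an illustration, not the proxy's implementation.

```shell
#!/bin/sh
# Arguments are triples, one per locality: WEIGHT HEALTHY_ENDPOINTS TOTAL_ENDPOINTS.
# Prints the share of traffic for each locality as a rounded percentage.
effective_split() {
  awk -v factor=1.4 'BEGIN {
    count = 0; sum = 0
    for (i = 1; i + 2 < ARGC; i += 3) {
      weight  = ARGV[i] + 0
      healthy = ARGV[i + 1] + 0
      total   = ARGV[i + 2] + 0
      # Availability: healthy fraction scaled by the factor, capped at 1.
      # A locality with no endpoints at all has availability 0.
      avail = (total > 0) ? factor * healthy / total : 0
      if (avail > 1) avail = 1
      eff[count] = weight * avail
      sum += eff[count]
      count++
    }
    # Normalize the effective weights into percentages.
    for (j = 0; j < count; j++) {
      pct = (sum > 0) ? 100 * eff[j] / sum : 0
      printf "%s%d", (j ? " " : ""), int(pct + 0.5)
    }
    printf "\n"
  }' "$@"
}

# Both regions healthy, 3 of 3 endpoints each:
#   effective_split 50 3 3 50 3 3    # prints "50 50"
# europe-west1 fully ejected or scaled to zero:
#   effective_split 50 0 3 50 3 3    # prints "0 100"
```

Under these assumptions, a locality whose endpoints are all ejected or removed has an effective weight of zero, so the healthy region receives all traffic. When the endpoints return, the configured weights apply again, which matches the table at the top of this page.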

Validate

You can now validate this behavior by:

Sending requests through the ingress gateway in Cluster A.
Scaling down the hello-world pods in europe-west1 to 0.
Observing that traffic fails over to us-central1.
Scaling the pods back up in europe-west1 and verifying that the 50/50 split resumes.
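
The validation steps above can be scripted roughly as follows. This is a sketch under several assumptions that this guide does not state: the kubectl context for Cluster A (CTX_A), the namespace of the ingress gateway Service (GATEWAY_NS), a Deployment named hello-world in the default namespace, and a hello-world response body that identifies the cluster or pod that served the request. Adjust these to match your environment.

```shell
#!/bin/sh
# Assumed names, overridable through the environment.
CTX_A="${CTX_A:-cluster-a}"      # hypothetical kubectl context for Cluster A
APP_NS="${APP_NS:-default}"      # namespace used by the manifests in this guide

# Print the external IP of the istio-ingressgateway Service in Cluster A.
# GATEWAY_NS has no default because this guide does not name the namespace.
gateway_ip() {
  kubectl --context "$CTX_A" \
    -n "${GATEWAY_NS:?set GATEWAY_NS to the gateway namespace}" \
    get service istio-ingressgateway \
    -o jsonpath='{.status.loadBalancer.ingress[0].ip}'
}

# Count identical lines read from stdin, most frequent first.
tally() {
  sort | uniq -c | sort -rn
}

# Send N requests (default 100) to URL and tally the response bodies.
sample_traffic() (
  url=$1
  n=${2:-100}
  i=0
  while [ "$i" -lt "$n" ]; do
    curl -s --max-time 5 "$url" || echo "request-failed"
    echo
    i=$((i + 1))
  done | grep -v '^$' | tally
)

# Scale the hello-world Deployment in Cluster A to the given replica count.
scale_cluster_a() {
  kubectl --context "$CTX_A" -n "$APP_NS" scale deployment hello-world --replicas="$1"
}

# Example session:
#   url="http://$(gateway_ip)/"
#   sample_traffic "$url"     # expect responses from both clusters
#   scale_cluster_a 0
#   sample_traffic "$url"     # expect responses from us-central1 only
#   scale_cluster_a 1         # use your original replica count
#   sample_traffic "$url"     # expect the split to resume
```

With 100 requests the split is only approximately even, so treat counts such as 47 and 53 as a pass. Allow a short delay after each scaling step before you sample again, because the new endpoints take time to become ready and propagate to the proxies.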