Set up Multi-Cluster Mesh Failover

This page shows you how to design and implement a high-availability traffic routing strategy using Cloud Service Mesh in a multi-cluster environment. The following table describes the expected behavior:

Cluster State                   Traffic Behavior
Both clusters healthy           50% traffic to Cluster A, 50% to Cluster B
Cluster A becomes unavailable   100% traffic to Cluster B
Cluster A recovers              Automatically restores 50/50 split

Prerequisites

As a starting point, this guide assumes that you have already:

This guide uses the following regions:

  • Cluster A: europe-west1
  • Cluster B: us-central1
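
Istio derives an endpoint's locality from the topology.kubernetes.io/region label on the node it runs on, so it can help to confirm that the nodes in each cluster report the regions listed above. The following is a minimal sketch; the kubectl context names cluster-a and cluster-b are hypothetical placeholders, not names defined by this guide.

```shell
# Print every node in the given kubectl context together with its region label.
# The context name is supplied by the caller; "cluster-a" and "cluster-b" in the
# usage lines below are hypothetical placeholders.
show_node_regions() {
  ctx="$1"
  kubectl --context "$ctx" get nodes -L topology.kubernetes.io/region
}

# Example usage (run manually):
#   show_node_regions cluster-a   # REGION column should show europe-west1
#   show_node_regions cluster-b   # REGION column should show us-central1
```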

Set up multi-cluster mesh failover

  1. Create and apply a Gateway resource named public-gateway, based on the sample manifest from the Cloud Service Mesh repository:

    cat <<EOF > istio-ingressgateway.yaml
    apiVersion: networking.istio.io/v1alpha3
    kind: Gateway
    metadata:
      name: public-gateway
      namespace: default
    spec:
      selector:
        istio: ingressgateway
      servers:
      - port:
          number: 80
          name: http
          protocol: HTTP
        hosts:
        - '*'
    EOF
    
    kubectl apply -f istio-ingressgateway.yaml
    

    This Gateway configures the ingress gateway to accept HTTP traffic on port 80 for any host. It is the external entry point for the hello-world service.

  2. Create and apply a VirtualService in Cluster A to route traffic to the hello-world service:

    cat <<EOF > virtual-service.yaml
    apiVersion: networking.istio.io/v1beta1
    kind: VirtualService
    metadata:
      name: hello-world
      namespace: default
    spec:
      hosts:
      - '*'
      gateways:
      - public-gateway
      http:
      - route:
        - destination:
            host: hello-world.default.svc.cluster.local
    EOF
    
    kubectl apply -f virtual-service.yaml
    

    This configuration forwards HTTP requests from the gateway to the service.

  3. Configure and apply a DestinationRule for locality-based failover:

    cat <<EOF > destination-rule.yaml
    apiVersion: networking.istio.io/v1alpha3
    kind: DestinationRule
    metadata:
      name: hello-world
      namespace: default
    spec:
      host: hello-world.default.svc.cluster.local
      trafficPolicy:
        connectionPool:
          http:
            http2MaxRequests: 100
        outlierDetection:
          consecutive5xxErrors: 1
          interval: 1s
          baseEjectionTime: 30s
          maxEjectionPercent: 100
        loadBalancer:
          localityLbSetting:
            enabled: true
            distribute:
            - from: europe-west1
              to:
                europe-west1: 50
                us-central1: 50
            - from: us-central1
              to:
                us-central1: 50
                europe-west1: 50
    EOF
    
    kubectl apply -f destination-rule.yaml
    

Note the following:

  • The localityLbSetting in the DestinationRule enables the even traffic split and automatic failover.
  • maxEjectionPercent: 100 lets Istio eject every endpoint in a locality, so all traffic can fail over when every endpoint there is unhealthy.
  • distribute sets an even 50/50 split between the two clusters, keyed on the region of the source cluster.
  • No failover block is configured. Failover is handled implicitly: when one locality becomes unavailable, Istio routes 100% of traffic to the healthy region.
  • outlierDetection ejects an endpoint after a single 5xx error (consecutive5xxErrors: 1), evaluated every second (interval: 1s), for a base ejection time of 30 seconds (baseEjectionTime: 30s).

Validate

You can now validate this behavior by:

  1. Sending requests through the ingress gateway in Cluster A.
  2. Scaling the hello-world pods in europe-west1 down to 0.
  3. Observing that all traffic fails over to us-central1.
  4. Scaling the pods in europe-west1 back up and verifying that the 50/50 split resumes.
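
The validation steps above could be scripted roughly as follows. This is a sketch under several assumptions that the guide does not state: the workload is a Deployment named hello-world in the default namespace, the kubectl context for Cluster A is called cluster-a, GATEWAY_URL holds the address of the ingress gateway in Cluster A, and each response body identifies the cluster or pod that served it.

```shell
# Send N requests to a URL and tally the distinct response bodies.
# Assumption: the response body identifies the serving cluster or pod.
tally_responses() {
  url="$1"
  count="${2:-20}"
  i=0
  while [ "$i" -lt "$count" ]; do
    # Record a marker instead of an empty line if the request itself fails.
    body=$(curl -s --max-time 5 "$url") || body="(request failed)"
    printf '%s\n' "$body"
    i=$((i+1))
  done | sort | uniq -c
}

# Scale the hello-world workload in one cluster.
# Assumption: a Deployment named hello-world exists in the default namespace.
scale_hello_world() {
  ctx="$1"
  replicas="$2"
  kubectl --context "$ctx" scale deployment hello-world -n default --replicas="$replicas"
}

# Example sequence (run manually; names are hypothetical placeholders):
#   tally_responses "$GATEWAY_URL" 20     # expect a roughly even split
#   scale_hello_world cluster-a 0         # take europe-west1 out of service
#   tally_responses "$GATEWAY_URL" 20     # expect only us-central1 responses
#   scale_hello_world cluster-a 1         # restore europe-west1
#   tally_responses "$GATEWAY_URL" 20     # expect the split to resume
```

Because the split is weighted rather than strictly alternating, a small sample will rarely be exactly 50/50. A larger request count gives a clearer picture.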