This page describes how to configure bundled load balancing with MetalLB for Google Distributed Cloud. MetalLB load balancers run either on a dedicated pool of worker nodes or on the same nodes as the control plane.
See Overview of load balancers for examples of load-balancing topologies available in Google Distributed Cloud.
Requirements
- All load balancer nodes must be in the same Layer 2 subnet.
- All VIPs must be in the load balancer node subnet and be routable through the subnet's gateway.
- The gateway of the load balancer subnet must listen to gratuitous ARP messages and forward ARP packets to the load balancer nodes.
Configuration fields
Edit the cluster.spec.loadBalancer section of the cluster configuration file
to configure bundled load balancing. For information about cluster
configuration files and examples of valid configurations, see the cluster
configuration documentation for your cluster type.
loadBalancer.mode
This value must be bundled to enable bundled load balancing.
loadBalancer.ports.controlPlaneLBPort
This value specifies the destination port to be used for traffic sent to the Kubernetes control plane (the Kubernetes API servers).
loadBalancer.vips.controlPlaneVIP
This value specifies the destination IP address to be used for traffic sent to
the Kubernetes control plane (the Kubernetes API servers). This IP address must
be in the same Layer 2 subnet as the nodes in the cluster. Don't list this
address in the address pools section of the configuration
file.
loadBalancer.vips.ingressVIP
This value specifies the IP address to be used for Services behind the load balancer for ingress traffic. This field is not allowed in admin cluster configuration files. This address must be listed in the address pools section of the configuration.
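As a rough illustration of how the fields described so far fit together, the following fragment is a minimal sketch of the loadBalancer section for a cluster that isn't an admin cluster. The port number and IP addresses are placeholder assumptions chosen for illustration, not recommended values; substitute values from your own network.

```yaml
# Sketch only: fragment of cluster.spec. All values are illustrative assumptions.
loadBalancer:
  mode: bundled                # required for bundled load balancing
  ports:
    controlPlaneLBPort: 443    # destination port for Kubernetes API server traffic
  vips:
    # Must be in the same Layer 2 subnet as the cluster nodes and
    # must NOT be listed in any address pool.
    controlPlaneVIP: 10.200.0.71
    # Not allowed in admin cluster configuration files.
    # Must be listed in one of the address pools.
    ingressVIP: 10.200.0.72
```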
loadBalancer.addressPools
This section of the configuration contains one or more address pools. Each
address pool specifies a list of IP address ranges. When you create a
Service of type LoadBalancer,
the external IP addresses for the Service are chosen from these ranges.
Address pools are specified in the following format:
- name: POOL_NAME
  avoidBuggyIPs: BOOLEAN
  manualAssign: BOOLEAN
  addresses:
  - IP_RANGE
  - IP_RANGE2
- name: The name of the address pool, POOL_NAME, for your own organizational purposes. This field is immutable.
- avoidBuggyIPs: (Optional) true or false. If true, the pool omits IP addresses ending in .0 and .255. Some network hardware drops traffic to these special addresses. You can omit this field; its default value is false. This field is mutable.
- manualAssign: (Optional) true or false. If true, addresses in this pool are not automatically assigned to Kubernetes Services; an IP address in this pool is used only when a Service specifies it explicitly. You can omit this field; its default value is false. This field is mutable.
- addresses: A list of one or more nonoverlapping IP address ranges. Specify each IP_RANGE in either CIDR notation (like 198.51.100.0/24) or range notation (like 198.51.100.0-198.51.100.10, with no spaces around the dash). This field is immutable.
The IP address ranges in the addresses list must not overlap and must be in
the same subnet as the nodes running load balancers.
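To make the format concrete, here is a hedged sketch of an addressPools section with two pools. The pool names and address ranges are assumptions for illustration only; the sketch assumes that the load balancer nodes are in 10.200.0.0/24 and that the ingress VIP is 10.200.0.72.

```yaml
# Sketch only: fragment of cluster.spec. Names and ranges are illustrative assumptions.
loadBalancer:
  addressPools:
  - name: pool1
    avoidBuggyIPs: true            # skip addresses ending in .0 and .255
    addresses:
    - 10.200.0.72-10.200.0.90      # range notation; contains the assumed ingress VIP
  - name: reserved-pool
    manualAssign: true             # used only when a Service requests an address explicitly
    addresses:
    - 10.200.0.96/28               # CIDR notation; covers 10.200.0.96 through 10.200.0.111
```

The two ranges don't overlap, and neither contains the control plane VIP, which must stay out of the address pools.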
loadBalancer.nodePoolSpec
This section of the configuration specifies a list of nodes to run load balancers on. Load balancer nodes can run regular workloads by default; there is no special taint on those nodes. Although nodes in the load balancer node pool can run workloads, they're separate from the nodes in the worker node pools. You can't include a given cluster node in more than one node pool. Overlapping node IP addresses between node pools block cluster creation and other cluster operations.
If you want to prevent workloads from running on a node in the load balancer node pool, add the following taint to the node:
node-role.kubernetes.io/load-balancer:NoSchedule
Google Distributed Cloud adds tolerations for this taint to the pods that are required for load balancing.
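For orientation, the following sketch shows how this taint appears on a Node object after it is applied (for example, with kubectl taint), together with a matching toleration for a workload that you still want to allow on the tainted nodes. The node name, Pod name, and image are hypothetical.

```yaml
# Sketch only: node name, Pod name, and image are hypothetical.
# How the taint appears on a load balancer node after it is applied.
apiVersion: v1
kind: Node
metadata:
  name: lb-node-1
spec:
  taints:
  - key: node-role.kubernetes.io/load-balancer
    effect: NoSchedule
---
# A workload that should remain schedulable on tainted load balancer
# nodes needs a matching toleration.
apiVersion: v1
kind: Pod
metadata:
  name: my-app
spec:
  tolerations:
  - key: node-role.kubernetes.io/load-balancer
    operator: Exists
    effect: NoSchedule
  containers:
  - name: my-app
    image: registry.example.com/my-app:1.0
```

A toleration only permits scheduling on the tainted nodes; it doesn't force the Pod to run there.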
The following example shows a load balancing node pool with two nodes. The first
node has a standard IP address nodePoolSpec.nodes.address (1.2.3.4) and a
Kubernetes IP address nodePoolSpec.nodes.k8sIP (10.0.0.32). When you specify
the optional k8sIP address for a node, it's dedicated to handling data traffic
for the node, such as requests and responses for the Kubernetes API, the
kubelet, and workloads. In this case, the standard IP address
nodePoolSpec.nodes.address is used for SSH connections to the node for
administrative cluster operations. If you don't specify a k8sIP address, the
standard node IP address handles all traffic for the node.
nodePoolSpec:
  nodes:
  - address: 1.2.3.4
    k8sIP: 10.0.0.32
  - address: 10.0.0.33
By default, all nodes in the load balancer node pool must be in the same Layer 2
subnet as the load balancer VIPs configured in the
loadBalancer.addressPools section of the configuration file.
However, if you specify a Kubernetes IP address k8sIP for a node, only that
address needs to be in the same Layer 2 subnet as the other load balancer VIPs.
If nodePoolSpec isn't set, the bundled load balancers run on the control
plane nodes. We recommend you run load balancers on separate node pools if
possible.
Control plane load balancing
The control plane load balancer serves the control plane virtual IP address (VIP). Google Distributed Cloud runs Keepalived and HAProxy as Kubernetes static pods on the load balancer nodes to announce the control plane VIP. Keepalived uses the Virtual Router Redundancy Protocol (VRRP) on the load balancer nodes for high availability.
Data plane load balancing
The data plane load balancer is for all Kubernetes Services of type
LoadBalancer.
Google Distributed Cloud uses MetalLB
running in Layer 2 mode for data plane load balancing. Data plane load balancing
can only be configured through Google Distributed Cloud; don't modify the MetalLB
ConfigMap directly. You can use all MetalLB features, including
IP address sharing across Services.
See the MetalLB documentation for feature information.
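As one hedged example of IP address sharing, upstream MetalLB lets Services share an address when they carry the same metallb.universe.tf/allow-shared-ip annotation value, request the same IP address, and use different ports. The sketch below assumes that 10.200.0.100 belongs to one of your address pools; the Service names, sharing key, and selector are hypothetical. Confirm the annotation against the MetalLB documentation for the version bundled with your cluster.

```yaml
# Sketch only: names, sharing key, selector, and IP address are illustrative assumptions.
apiVersion: v1
kind: Service
metadata:
  name: dns-tcp
  annotations:
    metallb.universe.tf/allow-shared-ip: "dns-shared"   # same key on both Services
spec:
  type: LoadBalancer
  loadBalancerIP: 10.200.0.100    # same address requested by both Services
  selector:
    app: dns
  ports:
  - name: dns-tcp
    protocol: TCP
    port: 53
    targetPort: 53
---
apiVersion: v1
kind: Service
metadata:
  name: dns-udp
  annotations:
    metallb.universe.tf/allow-shared-ip: "dns-shared"
spec:
  type: LoadBalancer
  loadBalancerIP: 10.200.0.100
  selector:
    app: dns
  ports:
  - name: dns-udp
    protocol: UDP
    port: 53
    targetPort: 53
```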
MetalLB runs a speaker Pod on each node by using a DaemonSet, and uses memberlist for high availability. MetalLB designates a dedicated load balancer node for each Kubernetes Service, rather than one for the entire cluster. This way, traffic is distributed across load balancer nodes if there are multiple Services.
The data plane load balancers can run on either the control plane nodes or on a subset of worker nodes. Bundling data plane load balancers on the control plane nodes increases utilization of the control plane nodes. It also increases the risk of overloading the control plane and increases the risk profile of confidential information on the control plane, such as SSH keys.
Load balancer separation
Prior to release 1.32, when you configure Layer 2 load balancing with MetalLB, the control plane load balancers and the data plane load balancers run on the same nodes. Depending on your configuration, the load balancers all run on the control plane nodes or they all run in the load balancer node pool.
The following diagram shows the default bundled load balancer configuration with both control plane and data plane load balancers running on control plane nodes or both running in the load balancer node pool:
With version 1.32 and higher clusters, you can configure the control plane load balancers to run on the control plane nodes and the data plane load balancers to run in the load balancer node pool. You can specify this separation of load balancers when you create a new cluster at version 1.32 or higher, or you can update an existing cluster at version 1.32 or higher to migrate the data plane load balancers from the control plane nodes to the load balancer node pool.
The cluster configuration for separated load balancers should look similar to the following example:
apiVersion: baremetal.cluster.gke.io/v1
kind: Cluster
metadata:
  name: hybrid-ha-lb
  namespace: cluster-hybrid-ha-lb
spec:
  type: hybrid
  profile: default
  anthosBareMetalVersion: 1.33
  gkeConnect:
    projectID: project-fleet
  controlPlane:
    loadBalancer:
      mode: bundled
    nodePoolSpec:
      nodes:
      - address: 10.200.0.2
      - address: 10.200.0.3
      - address: 10.200.0.4
  clusterNetwork:
    pods:
      cidrBlocks:
      - 192.168.0.0/16
    services:
      cidrBlocks:
      - 10.96.0.0/20
  ...
  loadBalancer:
    mode: bundled
    ...
    nodePoolSpec:
      nodes:
      - address: 10.200.0.5
      - address: 10.200.0.6
      - address: 10.200.0.7
  clusterOperations:
  ...
Separate load balancers when creating a cluster
If you're creating a new version 1.32 or higher cluster, you can configure the load balancers to run the control plane load balancers on the control plane nodes and the data plane load balancers on the load balancer node pool.
The following diagram shows the control plane and data plane load balancers separated onto different nodes:
To separate the load balancers when you create a cluster, use the following steps:
- In the cluster configuration file, specify a load balancer node pool with loadBalancer.nodePoolSpec as described in the loadBalancer.nodePoolSpec section of this document.
- Add controlPlane.loadBalancer.mode to the cluster configuration file and set the mode value to bundled.
- Finish configuring your cluster and run bmctl create cluster to create the cluster.
Migrate data plane load balancers off of the control plane
If you have an existing version 1.32 or higher cluster where neither
controlPlane.loadBalancer.mode nor loadBalancer.nodePoolSpec is set, both
the control plane load balancer and the data plane load balancer run in the
control plane node pool. You can update the cluster to migrate the data plane
load balancer to a load balancer node pool.
The following diagram shows the control plane and data plane load balancers separated after the data plane load balancer has been migrated off of the control plane nodes:
To migrate the data plane load balancer to a load balancer node pool when you update a cluster, use the following steps:
- In the cluster configuration file, specify a load balancer node pool with loadBalancer.nodePoolSpec as described in the loadBalancer.nodePoolSpec section of this document.
- Add controlPlane.loadBalancer.mode to the cluster configuration file and set the mode value to bundled.
- Update the cluster by running the following command:
    bmctl update cluster -c CLUSTER_NAME --kubeconfig=ADMIN_KUBECONFIG
  Replace the following:
  - CLUSTER_NAME: the name of the cluster you're updating.
  - ADMIN_KUBECONFIG: the path of the admin cluster kubeconfig file.
Preserving client source IP address
The LoadBalancer Service created with the bundled Layer 2 load balancing
solution uses the default Cluster setting for the external traffic policy.
This setting, spec.externalTrafficPolicy: Cluster, routes external traffic to
cluster-wide endpoints, but it also obscures the client source IP address.
Google Distributed Cloud supports two methods for preserving the client source IP address:
- Set the forwarding mode for load balancing to Direct Server Return (DSR). For more information about DSR forwarding mode, including instructions to enable it, see Configure load balancing forwarding mode.
- Set the external traffic policy to Local for the LoadBalancer Service and configure related Services and Ingress accordingly. The following sections describe how to configure your cluster to use this method.
LoadBalancer Services
When you use externalTrafficPolicy: Local in your LoadBalancer Services, schedule
your application Pods to run on the load balancer nodes. To do so, add the
following nodeSelector to your application Pods:
apiVersion: v1
kind: Pod
...
spec:
  nodeSelector:
    baremetal.cluster.gke.io/lbnode: "true"
...
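The Service side of this setup might look like the following minimal sketch. The Service name, selector label, and ports are hypothetical; only the type and externalTrafficPolicy fields come from this document.

```yaml
# Sketch only: name, selector, and ports are hypothetical.
apiVersion: v1
kind: Service
metadata:
  name: my-app-lb
spec:
  type: LoadBalancer
  externalTrafficPolicy: Local   # preserve the client source IP address
  selector:
    app: my-app                  # must match Pods scheduled on load balancer nodes
  ports:
  - protocol: TCP
    port: 80
    targetPort: 8080
```

With the Local policy, traffic is delivered only to Pods on the node that receives it, which is why the application Pods need the nodeSelector shown earlier.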
NodePort Services
Kubernetes does source network address translation (SNAT) for NodePort
Services. To retain the client source IP addresses, set
service.spec.externalTrafficPolicy to Local. Kubernetes then no longer performs
SNAT, but you must make sure that Pods for the Service are running on the node
whose IP address you use to reach the Service.
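A minimal sketch of such a NodePort Service follows. The Service name, selector label, and port numbers are hypothetical values chosen for illustration.

```yaml
# Sketch only: name, selector, and port numbers are hypothetical.
apiVersion: v1
kind: Service
metadata:
  name: my-app-nodeport
spec:
  type: NodePort
  externalTrafficPolicy: Local   # no SNAT; client source IP address is retained
  selector:
    app: my-app
  ports:
  - protocol: TCP
    port: 80
    targetPort: 8080
    nodePort: 30080              # must fall within the cluster's NodePort range
```

Clients should then connect to the IP address of a node that actually runs a my-app Pod, because nodes without a local endpoint don't forward the traffic elsewhere under this policy.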
Ingress
If your applications are HTTP services, you can achieve client IP visibility by configuring ingress components:
- Open the istio-ingress Service for editing:
    kubectl edit service -n gke-system istio-ingress
- Add externalTrafficPolicy: Local to the spec, save and exit the editor.
    apiVersion: v1
    kind: Service
    ...
    spec:
    ...
      externalTrafficPolicy: Local
- Open the istio-ingress Deployment for editing:
    kubectl edit deployment -n gke-system istio-ingress
- Add the following nodeSelector to the Deployment, save and exit the editor.
    apiVersion: apps/v1
    kind: Deployment
    ...
    spec:
      ...
      template:
        ...
        spec:
          ...
          nodeSelector:
            baremetal.cluster.gke.io/lbnode: "true"
    ...
Now, all of your services behind Ingress see an X-Forwarded-For header with the
client IP, like the following example:
X-Forwarded-For: 21.0.104.4