This page describes the specialized Network Function Kubernetes operator that Google Distributed Cloud ships with. This operator implements a set of CustomResourceDefinitions (CRDs) that allow Distributed Cloud to execute high-performance workloads.
The Network Function operator and SR-IOV functionality are not available on Distributed Cloud Servers.
The Network Function operator lets you do the following:
- Poll for existing network devices on a node.
- Query the IP address and physical link state for each network device on a node.
- Provision additional network interfaces on a node.
- Configure low-level system features on the node's physical machine required to support high-performance workloads.
- Use single-root input/output virtualization (SR-IOV) on PCI Express network interfaces to virtualize them into multiple virtual interfaces. You can then configure your Distributed Cloud workloads to use those virtual network interfaces.
Distributed Cloud's support for SR-IOV is based on the following open source projects:
Prerequisites
The Network Function operator fetches network configuration from the Distributed Cloud Edge Network API.
To allow this, you must grant the Network Function operator service account the Edge Network Viewer role
(roles/edgenetwork.viewer) using the following command:
gcloud projects add-iam-policy-binding PROJECT_ID \
    --role roles/edgenetwork.viewer \
    --member "serviceAccount:PROJECT_ID.svc.id.goog[nf-operator/nf-angautomator-sa]"
Replace PROJECT_ID with the ID of the target Google Cloud project.
Network Function operator resources
The Distributed Cloud Network Function operator implements the following Kubernetes CRDs:
- Network. Defines a virtual network that Pods can use to communicate with internal and external resources. You must create the corresponding VLAN using the Distributed Cloud Edge Network API before specifying it in this resource. For instructions, see Create a network.
- NetworkInterfaceState. Enables the discovery of network interface states and querying a network interface for link state and IP address.
- NodeSystemConfigUpdate. Enables the configuration of low-level system features such as kernel options and Kubelet flags.
- SriovNetworkNodePolicy. Selects a group of SR-IOV virtualized network interfaces and instantiates the group as a Kubernetes resource. You can use this resource in a NetworkAttachmentDefinition resource.
- SriovNetworkNodeState. Lets you query the provisioning state of the SriovNetworkNodePolicy resource on a Distributed Cloud node.
- NetworkAttachmentDefinition. Lets you attach Distributed Cloud Pods to one or more logical or physical networks on your Distributed Cloud node. You must create the corresponding VLAN using the Distributed Cloud Edge Network API before specifying it in this resource. For instructions, see Create a network.
The Network Function operator also lets you define secondary network interfaces that do not use SR-IOV virtual functions.
Network resource
  
The Network resource defines a virtual network within the
Distributed Cloud rack that Pods within your
Distributed Cloud cluster can use to communicate with
internal and external resources.
The Network resource provides the following configurable parameters for the
network interface exposed as writable fields:
- spec.type: specifies the network transport layer for this network. The only valid value is L2. You must also specify a nodeInterfaceMatcher.interfaceName value.
- spec.nodeInterfaceMatcher.interfaceName: the name of the physical network interface on the target Distributed Cloud node to use with this network.
- spec.gateway4: the IP address of the network gateway for this network.
- spec.l2NetworkConfig.prefixLength4: specifies the prefix length of the IPv4 CIDR range for this network.
The following example illustrates the structure of the resource:
apiVersion: networking.gke.io/v1
kind: Network
metadata:
  name: vlan200-network
  annotations:
    networking.gke.io/gdce-vlan-id: "200"
    networking.gke.io/gdce-vlan-mtu: "1500"
spec:
  type: L2
  nodeInterfaceMatcher:
    interfaceName: gdcenet0.200
  gateway4: 10.53.0.1
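Before you apply a Network resource, it can help to check which subnet the gateway4 and spec.l2NetworkConfig.prefixLength4 values imply, and whether a planned Pod address fits in it. The following Python sketch uses only the standard library ipaddress module. The helper names are hypothetical and are not part of the Network Function operator, and the prefix length of 24 in the usage lines is an assumption, because the example above does not set prefixLength4.

```python
import ipaddress


def network_subnet(gateway4: str, prefix_length4: int) -> ipaddress.IPv4Network:
    """Return the IPv4 subnet implied by gateway4 and prefixLength4.

    strict=False masks off the host bits of the gateway address so that,
    for example, 10.53.0.1 with prefix 24 yields 10.53.0.0/24.
    """
    return ipaddress.IPv4Network(f"{gateway4}/{prefix_length4}", strict=False)


def address_fits(address: str, gateway4: str, prefix_length4: int) -> bool:
    """True if address is a usable host in the subnet and is not the gateway itself."""
    subnet = network_subnet(gateway4, prefix_length4)
    ip = ipaddress.IPv4Address(address)
    reserved = {subnet.network_address, subnet.broadcast_address,
                ipaddress.IPv4Address(gateway4)}
    return ip in subnet and ip not in reserved


if __name__ == "__main__":
    # gateway4 comes from the vlan200-network example; prefix length 24 is assumed.
    print(network_subnet("10.53.0.1", 24))              # 10.53.0.0/24
    print(address_fits("10.53.0.20", "10.53.0.1", 24))  # True
    print(address_fits("10.54.0.20", "10.53.0.1", 24))  # False
```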
NetworkInterfaceState resource
The NetworkInterfaceState resource is a read-only resource that lets you
discover physical network interfaces on the node and collect runtime statistics
on the network traffic flowing through those interfaces.
Distributed Cloud creates a NetworkInterfaceState resource
for each node in a cluster.
The default configuration of Distributed Cloud machines
includes a bonded network interface named gdcenet0 on the Rack Select Network
Daughter Card (rNDC). This interface bonds the eno1np0 and eno2np1
network interfaces, each of which is connected to a different
Distributed Cloud top-of-rack (ToR) switch.
The NetworkInterfaceState resource provides the following categories of
network interface information exposed as read-only status fields.
General information:
- status.interfaces.ifname: the name of the target network interface.
- status.lastReportTime: the time and date of the last status report for the target interface.
IP address configuration information:
- status.interfaces.interfaceinfo.address: the IP address assigned to the target interface.
- status.interfaces.interfaceinfo.dns: the IP address of the DNS server assigned to the target interface.
- status.interfaces.interfaceinfo.gateway: the IP address of the network gateway serving the target interface.
- status.interfaces.interfaceinfo.prefixlen: the length of the IP prefix.
Hardware information:
- status.interfaces.linkinfo.broadcast: the broadcast MAC address of the target interface.
- status.interfaces.linkinfo.businfo: the PCIe device path in bus:slot.function format.
- status.interfaces.linkinfo.flags: the interface flags, for example BROADCAST.
- status.interfaces.linkinfo.macAddress: the unicast MAC address of the target interface.
- status.interfaces.linkinfo.mtu: the MTU value for the target interface.
Reception statistics:
- status.interfaces.statistics.rx.bytes: the total bytes received by the target interface.
- status.interfaces.statistics.rx.dropped: the total packets dropped by the target interface.
- status.interfaces.statistics.rx.errors: the total packet receive errors for the target interface.
- status.interfaces.statistics.rx.multicast: the total multicast packets received by the target interface.
- status.interfaces.statistics.rx.overErrors: the total packet receive over errors for the target interface.
- status.interfaces.statistics.rx.packets: the total packets received by the target interface.
Transmission statistics:
- status.interfaces.statistics.tx.bytes: the total bytes transmitted by the target interface.
- status.interfaces.statistics.tx.carrierErrors: the total carrier errors encountered by the target interface.
- status.interfaces.statistics.tx.collisions: the total packet collisions encountered by the target interface.
- status.interfaces.statistics.tx.dropped: the total packets dropped by the target interface.
- status.interfaces.statistics.tx.errors: the total transmission errors for the target interface.
- status.interfaces.statistics.tx.packets: the total packets transmitted by the target interface.
The following example illustrates the structure of the resource:
apiVersion: networking.gke.io/v1
kind: NetworkInterfaceState
metadata:
  name: MyNode1
nodeName: MyNode1
status:
  interfaces:
  - ifname: eno1np0
    linkinfo:
      businfo: 0000:1a:00.0
      flags: up|broadcast|multicast
      macAddress: ba:16:03:9e:9c:87
      mtu: 9000
    statistics:
      rx:
        bytes: 1098522811
        errors: 2
        multicast: 190926
        packets: 4988200
      tx:
        bytes: 62157709961
        packets: 169847139
  - ifname: eno2np1
    linkinfo:
      businfo: 0000:1a:00.1
      flags: up|broadcast|multicast
      macAddress: ba:16:03:9e:9c:87
      mtu: 9000
    statistics:
      rx:
        bytes: 33061895405
        multicast: 110203
        packets: 110447356
      tx:
        bytes: 2370516278
        packets: 11324730
  - ifname: enp95s0f0np0
    interfaceinfo:
    - address: fe80::63f:72ff:fec4:2bf4
      prefixlen: 64
    linkinfo:
      businfo: 0000:5f:00.0
      flags: up|broadcast|multicast
      macAddress: 04:3f:72:c4:2b:f4
      mtu: 9000
    statistics:
      rx:
        bytes: 37858381
        multicast: 205645
        packets: 205645
      tx:
        bytes: 1207334
        packets: 6542
  - ifname: enp95s0f1np1
    interfaceinfo:
    - address: fe80::63f:72ff:fec4:2bf5
      prefixlen: 64
    linkinfo:
      businfo: 0000:5f:00.1
      flags: up|broadcast|multicast
      macAddress: 04:3f:72:c4:2b:f5
      mtu: 9000
    statistics:
      rx:
        bytes: 37852406
        multicast: 205607
        packets: 205607
      tx:
        bytes: 1207872
        packets: 6545
  - ifname: enp134s0f0np0
    interfaceinfo:
    - address: fe80::63f:72ff:fec4:2b6c
      prefixlen: 64
    linkinfo:
      businfo: 0000:86:00.0
      flags: up|broadcast|multicast
      macAddress: 04:3f:72:c4:2b:6c
      mtu: 9000
    statistics:
      rx:
        bytes: 37988773
        multicast: 205584
        packets: 205584
      tx:
        bytes: 1212385
        packets: 6546
  - ifname: enp134s0f1np1
    interfaceinfo:
    - address: fe80::63f:72ff:fec4:2b6d
      prefixlen: 64
    linkinfo:
      businfo: 0000:86:00.1
      flags: up|broadcast|multicast
      macAddress: 04:3f:72:c4:2b:6d
      mtu: 9000
    statistics:
      rx:
        bytes: 37980702
        multicast: 205548
        packets: 205548
      tx:
        bytes: 1212297
        packets: 6548
  - ifname: gdcenet0
    interfaceinfo:
    - address: 208.117.254.36
      prefixlen: 28
    - address: fe80::b816:3ff:fe9e:9c87
      prefixlen: 64
    linkinfo:
      flags: up|broadcast|multicast
      macAddress: ba:16:03:9e:9c:87
      mtu: 9000
    statistics:
      rx:
        bytes: 34160422968
        errors: 2
        multicast: 301129
        packets: 115435591
      tx:
        bytes: 64528301111
        packets: 181171964
  # ... remaining interfaces omitted
  lastReportTime: "2022-03-30T07:35:44Z"
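If you export a NetworkInterfaceState resource as JSON (for example, by using kubectl with the -o json output option), a short script can flag interfaces whose error or drop counters are nonzero. The following Python sketch is hypothetical: the function name is illustrative, and it assumes that counters missing from the status are zero, because the example omits counters such as rx.dropped for most interfaces.

```python
import json

# Counters to inspect; names follow the status.interfaces.statistics fields above.
ERROR_COUNTERS = {
    "rx": ("dropped", "errors", "overErrors"),
    "tx": ("carrierErrors", "collisions", "dropped", "errors"),
}


def interfaces_with_errors(resource: dict) -> dict:
    """Return {ifname: {"rx.errors": n, ...}} for interfaces with nonzero error counters.

    Counters missing from the status are treated as 0 (an assumption).
    """
    report = {}
    for iface in resource.get("status", {}).get("interfaces", []):
        stats = iface.get("statistics", {})
        nonzero = {}
        for direction, counters in ERROR_COUNTERS.items():
            for counter in counters:
                value = stats.get(direction, {}).get(counter, 0)
                if value:
                    nonzero[f"{direction}.{counter}"] = value
        if nonzero:
            report[iface["ifname"]] = nonzero
    return report


if __name__ == "__main__":
    # A trimmed-down version of the example resource.
    sample = json.loads("""
    {"status": {"interfaces": [
      {"ifname": "eno1np0", "statistics": {"rx": {"errors": 2, "packets": 4988200}}},
      {"ifname": "eno2np1", "statistics": {"rx": {"packets": 110447356}}}
    ]}}
    """)
    print(interfaces_with_errors(sample))  # {'eno1np0': {'rx.errors': 2}}
```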
NodeSystemConfigUpdate resource
The NodeSystemConfigUpdate resource lets you make changes to the node's
operating system configuration as well as modify Kubelet flags. Changes other
than sysctl changes require a node reboot.
When instantiating this resource, you must specify the target nodes in
the nodeSelector field. You must include all key-value pairs for each
target node in the nodeSelector field. When you specify more than one target
node in this field, the target nodes are updated one node at a time. This field
supersedes the nodeName field.
CAUTION: The nodeName field has been deprecated. Using it immediately reboots
the target nodes, including local control plane nodes, which can halt critical
workloads.
The NodeSystemConfigUpdate resource provides the following configuration fields
specific to Distributed Cloud:
- spec.containerRuntimeDNSConfig.ip: specifies a list of IP addresses for private image registries.
- spec.containerRuntimeDNSConfig: specifies a list of custom DNS entries used by the Container Runtime Environment on each Distributed Cloud node. Each entry consists of the following fields:
  - ip: specifies the target IPv4 address.
  - domain: specifies the corresponding domain.
  - interface: specifies the network egress interface through which the IP address specified in the ip field is reachable. You can specify an interface defined through a CustomNetworkInterfaceConfig resource, a Network resource (by annotation), or a NetworkAttachmentDefinition resource (by annotation). This is a preview-level feature.
- spec.kubeletConfig.cpuManagerPolicy: specifies the Kubernetes CPUManager policy. Valid values are None and Static.
- spec.kubeletConfig.topologyManagerPolicy: specifies the Kubernetes TopologyManager policy. Valid values are None, BestEffort, Restricted, and SingleNumaMode.
- spec.osConfig.hugePagesConfig: specifies the huge page configuration. Valid page sizes are 2 MB and 1 GB (the TWO_MB and ONE_GB keys in the example resource). The number of huge pages requested is evenly distributed across both NUMA nodes in the system. For example, if you allocate 16 huge pages at 1 GB each, then each NUMA node receives a pre-allocation of 8 GB.
- spec.osConfig.isolatedCpusPerSocket: specifies the number of isolated CPUs per socket. Required if cpuManagerPolicy is set to Static.
- spec.osConfig.cpuIsolationPolicy: specifies the CPU isolation policy. The Default policy only isolates systemd tasks from CPUs reserved for workloads. The Kernel policy marks the CPUs as isolcpus and sets the rcu_nocb, nohz_full, and rcu_nocb_poll flags on each CPU.
- spec.sysctls.nodeLevel: specifies the sysctls parameters that you can configure globally on a node by using the Network Function operator. The configurable parameters are as follows:
  - fs.inotify.max_user_instances
  - fs.inotify.max_user_watches
  - kernel.sched_rt_runtime_us
  - kernel.core_pattern
  - net.ipv4.tcp_wmem
  - net.ipv4.tcp_rmem
  - net.ipv4.tcp_slow_start_after_idle
  - net.ipv4.udp_rmem_min
  - net.ipv4.udp_wmem_min
  - net.core.rmem_max
  - net.core.wmem_max
  - net.core.rmem_default
  - net.core.wmem_default
  - net.netfilter.nf_conntrack_tcp_timeout_unacknowledged
  - net.netfilter.nf_conntrack_tcp_timeout_max_retrans
  - net.sctp.auth_enable
  - net.sctp.sctp_mem
  - net.ipv4.udp_mem
  - net.ipv4.tcp_mem
  - vm.max_map_count
  You can also scope both safe and unsafe sysctls parameters to a specific Pod or namespace by using the tuning Container Network Interface (CNI) plugin.
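To see how a hugePagesConfig request translates into per-NUMA-node allocations, the following Python sketch reproduces the arithmetic described for spec.osConfig.hugePagesConfig. It assumes a machine with two NUMA nodes and uses the TWO_MB and ONE_GB keys from the NodeSystemConfigUpdate example. Rejecting counts that do not divide evenly is a choice made by this sketch, not documented operator behavior, and the function name is hypothetical.

```python
# Size of one huge page in GiB, keyed as in the hugePagesConfig example.
PAGE_SIZE_GIB = {"TWO_MB": 2 / 1024, "ONE_GB": 1}


def per_numa_allocation(huge_pages: dict, numa_nodes: int = 2) -> dict:
    """Split each requested huge page count evenly across NUMA nodes.

    Returns {size_key: (pages_per_numa_node, gib_per_numa_node)}.
    This sketch rejects counts that do not divide evenly; how the operator
    handles such counts is not described on this page.
    """
    result = {}
    for size, count in huge_pages.items():
        if count % numa_nodes:
            raise ValueError(
                f"{count} {size} pages cannot be split evenly across {numa_nodes} NUMA nodes")
        pages = count // numa_nodes
        result[size] = (pages, pages * PAGE_SIZE_GIB[size])
    return result


if __name__ == "__main__":
    # The hugePagesConfig values from the NodeSystemConfigUpdate example.
    print(per_numa_allocation({"TWO_MB": 0, "ONE_GB": 16}))
    # {'TWO_MB': (0, 0.0), 'ONE_GB': (8, 8)}
```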
The NodeSystemConfigUpdate resource provides the following read-only general
status fields:
- status.lastReportTime: the most recent time that status was reported for the target interface.
- status.conditions.lastTransitionTime: the most recent time that the condition of the interface has changed.
- status.conditions.observedGeneration: denotes the .metadata.generation value on which the condition was based.
- status.conditions.message: an informative message describing the change of the interface's condition.
- status.conditions.reason: a programmatic identifier denoting the reason for the last change of the interface's condition.
- status.conditions.status: the status descriptor of the condition. Valid values are True, False, and Unknown.
- status.conditions.type: the condition type in camelCase.
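The following Python sketch encodes a few of the rules stated in this section as a pre-flight check on a NodeSystemConfigUpdate spec parsed into a dictionary: the spec must include a nodeSelector, isolatedCpusPerSocket is required when cpuManagerPolicy is Static, and only the listed sysctls can be set under nodeLevel. The check that nodeSelector values are strings reflects the general Kubernetes convention that label values are strings. The function is hypothetical and is not a substitute for the operator's own validation; the usage lines check the spec from the example resource that follows.

```python
# Node-level sysctls that this page lists as configurable (duplicates removed).
ALLOWED_NODE_SYSCTLS = frozenset({
    "fs.inotify.max_user_instances",
    "fs.inotify.max_user_watches",
    "kernel.sched_rt_runtime_us",
    "kernel.core_pattern",
    "net.ipv4.tcp_wmem",
    "net.ipv4.tcp_rmem",
    "net.ipv4.tcp_mem",
    "net.ipv4.tcp_slow_start_after_idle",
    "net.ipv4.udp_rmem_min",
    "net.ipv4.udp_wmem_min",
    "net.ipv4.udp_mem",
    "net.core.rmem_max",
    "net.core.wmem_max",
    "net.core.rmem_default",
    "net.core.wmem_default",
    "net.netfilter.nf_conntrack_tcp_timeout_unacknowledged",
    "net.netfilter.nf_conntrack_tcp_timeout_max_retrans",
    "net.sctp.auth_enable",
    "net.sctp.sctp_mem",
    "vm.max_map_count",
})


def check_spec(spec: dict) -> list:
    """Return a list of problems found in a NodeSystemConfigUpdate spec (empty if none)."""
    problems = []
    node_selector = spec.get("nodeSelector", {})
    if not node_selector:
        problems.append("nodeSelector must select the target nodes")
    for key, value in node_selector.items():
        if not isinstance(value, str):
            problems.append(f"nodeSelector value for {key} should be a string, got {value!r}")
    kubelet = spec.get("kubeletConfig", {})
    os_config = spec.get("osConfig", {})
    if kubelet.get("cpuManagerPolicy") == "Static" and not os_config.get("isolatedCpusPerSocket"):
        problems.append("isolatedCpusPerSocket is required when cpuManagerPolicy is Static")
    for name in spec.get("sysctls", {}).get("nodeLevel", {}):
        if name not in ALLOWED_NODE_SYSCTLS:
            problems.append(f"{name} is not a configurable node-level sysctl")
    return problems


if __name__ == "__main__":
    # The spec from the example resource, with the selector value quoted.
    example = {
        "nodeSelector": {
            "baremetal.cluster.gke.io/node-pool": "node-pool-1",
            "networking.gke.io/worker-network-sriov.capable": "true",
        },
        "sysctls": {"nodeLevel": {"net.ipv4.udp_mem": "12348035 16464042 24696060"}},
        "kubeletConfig": {"topologyManagerPolicy": "BestEffort", "cpuManagerPolicy": "Static"},
        "osConfig": {
            "hugePagesConfig": {"TWO_MB": 0, "ONE_GB": 16},
            "isolatedCpusPerSocket": {"0": 10, "1": 10},
        },
    }
    print(check_spec(example))  # []
```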
The following example illustrates the structure of the resource:
apiVersion: networking.gke.io/v1
kind: NodeSystemConfigUpdate
metadata:
  name: node-pool-1-config
  namespace: default
spec:
  nodeSelector:
    baremetal.cluster.gke.io/node-pool: node-pool-1
    networking.gke.io/worker-network-sriov.capable: "true"
  sysctls:
    nodeLevel:
      "net.ipv4.udp_mem" : "12348035 16464042 24696060"
  kubeletConfig:
    topologyManagerPolicy: BestEffort
    cpuManagerPolicy: Static
  osConfig:
    hugePagesConfig:
      "TWO_MB": 0
      "ONE_GB": 16
    isolatedCpusPerSocket:
      "0": 10
      "1": 10
SriovNetworkNodePolicy resource
The SriovNetworkNodePolicy resource lets you allocate a group of SR-IOV
virtual functions (VFs) on a Distributed Cloud physical
machine and instantiate that group as a Kubernetes resource. You can then use
this resource in a NetworkAttachmentDefinition resource.
You can select each target VF by its PCIe vendor and device ID, its PCIe device addresses, or by its Linux enumerated device name. The SR-IOV Network Operator configures each physical network interface to provision the target VFs. This includes updating the network interface firmware, configuring the Linux kernel driver, and rebooting the Distributed Cloud machine, if necessary.
To discover the network interfaces available on your node, you can look up the
NetworkInterfaceState resources
on that node in the nf-operator namespace.
The following example illustrates the structure of the resource:
apiVersion: sriovnetwork.k8s.cni.cncf.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: mlnx6-p2-sriov-en2
  namespace: sriov-network-operator
spec:
  deviceType: netdevice
  isRdma: true
  mtu: 9000
  nicSelector:
    pfNames:
    - enp134s0f1np1
  nodeSelector:
    edgecontainer.googleapis.com/network-sriov.capable: "true"
  numVfs: 31
  priority: 99
  resourceName: mlnx6_p2_sriov_en2
The preceding example creates a maximum of 31 VFs from the second port on the
network interface named enp134s0f1np1 with an MTU value of 9000 (the maximum
allowed value). Use the node selector label
edgecontainer.googleapis.com/network-sriov.capable, which is present on all
Distributed Cloud nodes capable of SR-IOV.
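Because the physical functions named in nicSelector.pfNames must exist on the node, you can cross-check a policy against the interface names reported by the node's NetworkInterfaceState resource. The following Python sketch is hypothetical and checks only the constraints mentioned on this page: an MTU of at most 9000, the SR-IOV capable node selector label, and pfNames that appear among the node's interfaces.

```python
SRIOV_CAPABLE_LABEL = "edgecontainer.googleapis.com/network-sriov.capable"
MAX_MTU = 9000  # maximum allowed MTU, per the paragraph above


def check_policy(policy_spec: dict, node_ifnames: set) -> list:
    """Return problems found in a SriovNetworkNodePolicy spec (empty if none).

    node_ifnames holds the status.interfaces[].ifname values from the node's
    NetworkInterfaceState resource.
    """
    problems = []
    mtu = policy_spec.get("mtu")
    if mtu is not None and mtu > MAX_MTU:
        problems.append(f"mtu {mtu} exceeds the maximum of {MAX_MTU}")
    if policy_spec.get("nodeSelector", {}).get(SRIOV_CAPABLE_LABEL) != "true":
        problems.append(f'nodeSelector should include {SRIOV_CAPABLE_LABEL}: "true"')
    for pf_name in policy_spec.get("nicSelector", {}).get("pfNames", []):
        if pf_name not in node_ifnames:
            problems.append(f"pfName {pf_name} is not an interface on this node")
    return problems


if __name__ == "__main__":
    policy = {
        "deviceType": "netdevice",
        "isRdma": True,
        "mtu": 9000,
        "nicSelector": {"pfNames": ["enp134s0f1np1"]},
        "nodeSelector": {SRIOV_CAPABLE_LABEL: "true"},
        "numVfs": 31,
        "priority": 99,
        "resourceName": "mlnx6_p2_sriov_en2",
    }
    # Interface names taken from the NetworkInterfaceState example.
    ifnames = {"eno1np0", "eno2np1", "enp95s0f0np0", "enp95s0f1np1",
               "enp134s0f0np0", "enp134s0f1np1", "gdcenet0"}
    print(check_policy(policy, ifnames))  # []
```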
For information about using this resource, see
SriovNetworkNodeState.
SriovNetworkNodeState resource
The SriovNetworkNodeState read-only resource lets you query the provisioning
state of the SriovNetworkNodePolicy resource on a
Distributed Cloud node. It returns the complete
configuration of the SriovNetworkNodePolicy resource on the node
as well as a list of active VFs on the node. The status.syncStatus field
indicates whether all SriovNetworkNodePolicy resources defined for the node
have been properly applied.
The following example illustrates the structure of the resource:
apiVersion: sriovnetwork.k8s.cni.cncf.io/v1
kind: SriovNetworkNodeState
metadata:
  name: MyNode1
  namespace: sriov-network-operator
spec:
  dpConfigVersion: "1969684"
  interfaces:
  - mtu: 9000
    name: enp134s0f1np1
    numVfs: 31
    pciAddress: 0000:86:00.1
    vfGroups:
    - deviceType: netdevice
      mtu: 9000
      policyName: mlnx6-p2-sriov-en2
      resourceName: mlnx6_p2_sriov_en2
      vfRange: 0-30
status:
  interfaces:
  - deviceID: "1015"
    driver: mlx5_core
    linkSpeed: 25000 Mb/s
    linkType: ETH
    mac: ba:16:03:9e:9c:87
    mtu: 9000
    name: eno1np0
    pciAddress: 0000:1a:00.0
    vendor: 15b3
  - deviceID: "1015"
    driver: mlx5_core
    linkSpeed: 25000 Mb/s
    linkType: ETH
    mac: ba:16:03:9e:9c:87
    mtu: 9000
    name: eno2np1
    pciAddress: 0000:1a:00.1
    vendor: 15b3
  - Vfs:
    - deviceID: 101e
      driver: mlx5_core
      mac: c2:80:29:b5:63:55
      mtu: 9000
      name: enp134s0f1v0
      pciAddress: 0000:86:04.1
      vendor: 15b3
      vfID: 0
    - deviceID: 101e
      driver: mlx5_core
      mac: 7e:36:0c:82:d4:20
      mtu: 9000
      name: enp134s0f1v1
      pciAddress: 0000:86:04.2
      vendor: 15b3
      vfID: 1
    # ... 29 other VFs omitted
  syncStatus: Succeeded
For information about using this resource, see
SriovNetworkNodeState.
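The vfRange field in each vfGroups entry is an inclusive range of VF indexes, so 0-30 in the example corresponds to the 31 VFs requested by numVfs. The following Python sketch treats a SriovNetworkNodeState resource, parsed into a dictionary, as ready when status.syncStatus is Succeeded and the VF groups on each interface fit within numVfs. This readiness rule and the function names are assumptions made for illustration, not a documented operator check.

```python
def vf_count(vf_range: str) -> int:
    """Number of VFs in an inclusive range such as "0-30" (or a single index such as "7")."""
    parts = vf_range.split("-")
    first, last = int(parts[0]), int(parts[-1])
    return last - first + 1


def state_is_ready(state: dict) -> bool:
    """True if syncStatus is Succeeded and each interface's VF groups fit within numVfs."""
    if state.get("status", {}).get("syncStatus") != "Succeeded":
        return False
    for iface in state.get("spec", {}).get("interfaces", []):
        assigned = sum(vf_count(group["vfRange"]) for group in iface.get("vfGroups", []))
        if assigned > iface.get("numVfs", 0):
            return False
    return True


if __name__ == "__main__":
    # Values taken from the SriovNetworkNodeState example.
    example = {
        "spec": {"interfaces": [{
            "name": "enp134s0f1np1",
            "numVfs": 31,
            "vfGroups": [{"policyName": "mlnx6-p2-sriov-en2", "vfRange": "0-30"}],
        }]},
        "status": {"syncStatus": "Succeeded"},
    }
    print(state_is_ready(example))  # True
```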
NetworkAttachmentDefinition resource
The NetworkAttachmentDefinition resource lets you attach
Distributed Cloud Pods to one or more logical or physical
networks on your Distributed Cloud node. It leverages
the Multus-CNI framework and the
SRIOV-CNI plugin.
Use an annotation to reference the resourceName value of the appropriate
SriovNetworkNodePolicy resource. When you create this annotation, do the
following:
- Use the key k8s.v1.cni.cncf.io/resourceName.
- Use the prefix gke.io/ in its value, followed by the spec.resourceName value of the target SriovNetworkNodePolicy resource (for example, gke.io/mlnx6_p2_sriov_en2).
The following example illustrates the structure of the resource:
apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: sriov-net1
  namespace: mynamespace
  annotations:
    k8s.v1.cni.cncf.io/resourceName: gke.io/mlnx6_p2_sriov_en2
spec:
  config: '{
  "type": "sriov",
  "cniVersion": "0.3.1",
  "name": "sriov-network",
  "ipam": {
    "type": "host-local",
    "subnet": "10.56.217.0/24",
    "routes": [{
      "dst": "0.0.0.0/0"
    }],
    "gateway": "10.56.217.1"
  }
}'
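If you generate NetworkAttachmentDefinition manifests from a script, keeping the CNI configuration as a dictionary and serializing it with json.dumps avoids quoting mistakes in the spec.config string. The following Python sketch builds a manifest equivalent to the example above and checks that the gateway lies inside the subnet. The function name and parameters are hypothetical; kubectl apply accepts the printed JSON as well as YAML.

```python
import ipaddress
import json


def sriov_nad(name: str, namespace: str, resource_name: str,
              subnet: str, gateway: str, cni_name: str = "sriov-network") -> dict:
    """Build an SR-IOV NetworkAttachmentDefinition manifest as a dictionary.

    resource_name is the spec.resourceName of the target SriovNetworkNodePolicy;
    it is referenced with the gke.io/ prefix in the resourceName annotation.
    """
    if ipaddress.ip_address(gateway) not in ipaddress.ip_network(subnet):
        raise ValueError(f"gateway {gateway} is outside subnet {subnet}")
    config = {
        "type": "sriov",
        "cniVersion": "0.3.1",
        "name": cni_name,
        "ipam": {
            "type": "host-local",
            "subnet": subnet,
            "routes": [{"dst": "0.0.0.0/0"}],
            "gateway": gateway,
        },
    }
    return {
        "apiVersion": "k8s.cni.cncf.io/v1",
        "kind": "NetworkAttachmentDefinition",
        "metadata": {
            "name": name,
            "namespace": namespace,
            "annotations": {
                "k8s.v1.cni.cncf.io/resourceName": f"gke.io/{resource_name}",
            },
        },
        # spec.config is a JSON string, not a nested object.
        "spec": {"config": json.dumps(config)},
    }


if __name__ == "__main__":
    manifest = sriov_nad("sriov-net1", "mynamespace", "mlnx6_p2_sriov_en2",
                         "10.56.217.0/24", "10.56.217.1")
    print(json.dumps(manifest, indent=2))
```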
Upgrade NetworkAttachmentDefinition resources to Distributed Cloud 1.4.0
Distributed Cloud version 1.4.0 replaces the bond0
interface with a new interface named gdcenet0. The gdcenet0 interface lets
you use the host management network interface card (NIC) in each
Distributed Cloud machine in your rack for your workloads
while keeping the Distributed Cloud management and control
plane network traffic completely separated. To take advantage of this
functionality, complete the steps in this section to reconfigure your
NetworkAttachmentDefinition resources, and then follow the instructions in 
Configure Distributed Cloud networking
to provision the appropriate networks and subnetworks.
For each Distributed Cloud cluster on which you have
deployed one or more NetworkAttachmentDefinition resources,
the following migration rules apply:
- For each new NetworkAttachmentDefinition resource, use gdcenet0 instead of bond0 as the value of the master field. If you apply a resource that uses bond0 or an empty value for this field, Distributed Cloud replaces the value with gdcenet0, and then stores and applies the resource to the cluster.
- For each existing NetworkAttachmentDefinition resource, replace bond0 with gdcenet0 as the value of the master field, and then reapply the resource to the cluster to restore full network connectivity to the affected Pods.
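The migration rules above can be applied to an existing spec.config string with a few lines of Python. This hypothetical sketch rewrites the master field only when it is present and set to bond0 or an empty string; configurations without a master field, such as the SR-IOV example earlier on this page, keep their content. It does not handle VLAN subinterfaces of bond0, because this page does not describe how those are migrated.

```python
import json


def migrate_master(config_json: str) -> str:
    """Apply the 1.4.0 rule to a NetworkAttachmentDefinition spec.config string.

    A master value of bond0 or an empty string becomes gdcenet0. Configs
    without a master field (for example, SR-IOV configs) keep their content,
    although the JSON is re-serialized.
    """
    config = json.loads(config_json)
    if "master" in config and config["master"] in ("", "bond0"):
        config["master"] = "gdcenet0"
    return json.dumps(config)


if __name__ == "__main__":
    old = '{"type": "macvlan", "master": "bond0"}'
    print(migrate_master(old))  # {"type": "macvlan", "master": "gdcenet0"}
```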
For information about using this resource, see
NetworkAttachmentDefinition.
Configure a secondary interface on a Pod using SR-IOV VFs
After you configure a SriovNetworkNodePolicy resource and a corresponding
NetworkAttachmentDefinition resource, you can configure a secondary network
interface on a Distributed Cloud Pod by using SR-IOV virtual
functions.
To do so, add an annotation to your Distributed Cloud Pod definition as follows:
- Key: k8s.v1.cni.cncf.io/networks
- Value: a comma-separated list of NetworkAttachmentDefinition resources in the form nameSpace/NetworkAttachmentDefinition1,nameSpace/NetworkAttachmentDefinition2,...
The following example illustrates this annotation:
apiVersion: v1
kind: Pod
metadata:
  name: sriovpod
  annotations:
    k8s.v1.cni.cncf.io/networks: mynamespace/sriov-net1
spec:
  containers:
  - name: sleeppodsriov
    command: ["sh", "-c", "trap : TERM INT; sleep infinity & wait"]
    image: alpine
    securityContext:
      capabilities:
        add:
          - NET_ADMIN
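When a Pod attaches to several networks, the annotation value is a comma-separated list. The following Python sketch builds that value from (namespace, name) pairs. It is hypothetical; following the Multus convention, a pair with no namespace is emitted as a bare name, which Multus resolves in the Pod's own namespace.

```python
def networks_annotation(attachments) -> str:
    """Build the k8s.v1.cni.cncf.io/networks value from (namespace, name) pairs.

    A pair with an empty namespace is emitted as the bare name.
    """
    return ",".join(f"{namespace}/{name}" if namespace else name
                    for namespace, name in attachments)


def pod_metadata(pod_name: str, attachments) -> dict:
    """Return Pod metadata carrying the secondary-network annotation."""
    return {
        "name": pod_name,
        "annotations": {"k8s.v1.cni.cncf.io/networks": networks_annotation(attachments)},
    }


if __name__ == "__main__":
    print(pod_metadata("sriovpod", [("mynamespace", "sriov-net1")]))
    # {'name': 'sriovpod', 'annotations': {'k8s.v1.cni.cncf.io/networks': 'mynamespace/sriov-net1'}}
```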
Configure a secondary interface on a Pod using the MacVLAN driver
Distributed Cloud also supports creating a secondary network
interface on a Pod by using the MacVLAN driver. Only the gdcenet0 interface
supports this configuration and only on Pods that run containerized workloads.
To configure an interface to use the MacVLAN driver:
- Configure a NetworkAttachmentDefinition resource as shown in the following example:
  apiVersion: "k8s.cni.cncf.io/v1"
  kind: NetworkAttachmentDefinition
  metadata:
    name: macvlan-b400-1
    annotations:
      networking.gke.io/gdce-vlan-id: "400"
  spec:
    config: '{
      "type": "macvlan",
      "master": "gdcenet0.400",
      "ipam": {
        "type": "static",
        "addresses": [
          {
            "address": "192.168.100.20/27",
            "gateway": "192.168.100.1"
          }
        ]
        ...
      }
    }'
- Add an annotation to your Distributed Cloud Pod definition as follows:
  apiVersion: v1
  kind: Pod
  metadata:
    name: macvlan-testpod1
    annotations:
      k8s.v1.cni.cncf.io/networks: macvlan-b400-1
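The following Python sketch builds a MacVLAN NetworkAttachmentDefinition like the one in the first step. It assumes that the master interface is the gdcenet0 VLAN subinterface named gdcenet0.<VLAN ID>, a pattern inferred from the examples on this page, and it omits the fields elided with ... in the example. The function name is hypothetical.

```python
import ipaddress
import json


def macvlan_nad(name: str, vlan_id: int, address_cidr: str, gateway: str) -> dict:
    """Build a MacVLAN NetworkAttachmentDefinition on a gdcenet0 VLAN subinterface.

    Assumption: the master interface is named gdcenet0.<vlan_id>, as in the
    examples on this page.
    """
    interface = ipaddress.ip_interface(address_cidr)  # for example 192.168.100.20/27
    if ipaddress.ip_address(gateway) not in interface.network:
        raise ValueError(f"gateway {gateway} is outside {interface.network}")
    config = {
        "type": "macvlan",
        "master": f"gdcenet0.{vlan_id}",
        "ipam": {
            "type": "static",
            "addresses": [{"address": address_cidr, "gateway": gateway}],
        },
    }
    return {
        "apiVersion": "k8s.cni.cncf.io/v1",
        "kind": "NetworkAttachmentDefinition",
        "metadata": {
            "name": name,
            # Annotation values must be strings, so the VLAN ID is converted.
            "annotations": {"networking.gke.io/gdce-vlan-id": str(vlan_id)},
        },
        "spec": {"config": json.dumps(config)},
    }


if __name__ == "__main__":
    nad = macvlan_nad("macvlan-b400-1", 400, "192.168.100.20/27", "192.168.100.1")
    print(json.loads(nad["spec"]["config"])["master"])  # gdcenet0.400
```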
Configure a secondary interface on a Pod using Distributed Cloud multi-networking
Distributed Cloud supports creating a secondary network interface on a Pod by using its multi-network feature. To do so, complete the following steps:
- Configure a Network resource. For example:
  apiVersion: networking.gke.io/v1
  kind: Network
  metadata:
    name: vlan200-network
  spec:
    type: L2
    nodeInterfaceMatcher:
      interfaceName: vlan200-interface
    gateway4: 10.53.0.1
- Add an annotation to your Distributed Cloud Pod definition as follows:
  apiVersion: v1
  kind: Pod
  metadata:
    name: my-pod
    annotations:
      networking.gke.io/interfaces: '[{"interfaceName":"eth1","network":"vlan200-network"}]'
      networking.gke.io/default-interface: eth1
  ...
What's next