Deploy workloads

This page describes the steps to deploy workloads on your Google Distributed Cloud connected hardware and the limitations that you must adhere to when configuring your workloads.

Before you complete these steps, you must meet the Distributed Cloud connected installation requirements and order the Distributed Cloud hardware.

When the Google Distributed Cloud connected hardware arrives at your chosen destination, it is pre-configured with hardware, Google Cloud, and some network settings that you specified when you ordered Distributed Cloud connected.

Google installers complete the physical installation, and your system administrator connects Distributed Cloud connected to your local network.

After the hardware is connected to your local network, it communicates with Google Cloud to download software updates and connect with your Google Cloud project. You are then ready to provision node pools and deploy workloads on Distributed Cloud connected.

Deployment overview

To deploy a workload on your Distributed Cloud connected hardware, complete the following steps:

  1. Optional: Enable the Distributed Cloud Edge Network API.

  2. Optional: Initialize the network configuration of your Distributed Cloud connected zone.

  3. Optional: Configure Distributed Cloud networking.

  4. Create a Distributed Cloud connected cluster.

  5. Optional: Enable support for customer-managed encryption keys (CMEK) for local storage if you want to integrate with Cloud Key Management Service to enable support for CMEK for your workload data. For information about how Distributed Cloud connected encrypts workload data, see Local storage security.

  6. Create a node pool. In this step, you assign nodes to a node pool and optionally configure the node pool to use Cloud KMS to wrap and unwrap the Linux Unified Key Setup (LUKS) passphrase for encrypting workload data.

  7. Obtain credentials for the cluster so that you can test it.

  8. Grant users access to the cluster by assigning them the Edge Container Viewer role (roles/edgecontainer.viewer) or the Edge Container Admin role (roles/edgecontainer.admin) on the project.

  9. Assign users granular role-based access to cluster resources by using RoleBinding and ClusterRoleBinding resources, as shown in the example after this list.

  10. Optional: Enable VM Runtime on Google Distributed Cloud support to run workloads on virtual machines on Distributed Cloud connected.

  11. Optional: Enable GPU support to run GPU-based workloads on Distributed Cloud connected.
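
As an example for step 9, the following ClusterRoleBinding grants a user read-only access to resources across the cluster. This is a minimal sketch: the binding name and user email are placeholders, and you can substitute any Kubernetes role that matches your access policy.

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  # Placeholder name for this example binding.
  name: example-viewer-binding
subjects:
# The principal to grant access to; replace with the user's email address.
- kind: User
  name: user@example.com
  apiGroup: rbac.authorization.k8s.io
roleRef:
  # Built-in ClusterRole that grants read-only access to most resources.
  kind: ClusterRole
  name: view
  apiGroup: rbac.authorization.k8s.io

Apply the file with kubectl apply -f, using the credentials you obtained in step 7.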

Deploy the NGINX load balancer as a service

The following example illustrates how to deploy the NGINX server and expose it as a service on a Distributed Cloud connected cluster:

  1. Create a YAML file named nginx-deployment.yaml with the following contents:

     apiVersion: apps/v1
     kind: Deployment
     metadata:
       name: nginx
       labels:
         app: nginx
     spec:
       replicas: 1
       selector:
         matchLabels:
           app: nginx
       template:
         metadata:
           labels:
             app: nginx
         spec:
           containers:
           - name: nginx
             image: nginx:latest
             ports:
             - containerPort: 80
  2. Apply the YAML file to the cluster using the following command:

    kubectl apply -f nginx-deployment.yaml
    
  3. Create a YAML file named nginx-service.yaml with the following contents:

     apiVersion: v1
     kind: Service
     metadata:
       name: nginx-service
     spec:
       type: LoadBalancer
       selector:
         app: nginx
       ports:
       - protocol: TCP
         port: 8080
         targetPort: 80
  4. Apply the YAML file to the cluster using the following command:

     kubectl apply -f nginx-service.yaml
    
  5. Obtain the external IP address assigned to the service by the MetalLB load balancer using the following command:

    kubectl get services
    

    The command returns output similar to the following:

    NAME            TYPE           CLUSTER-IP     EXTERNAL-IP     PORT(S)          AGE
    nginx-service   LoadBalancer   10.51.195.25   10.100.68.104   8080:31966/TCP   11d
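
     To verify that the service responds, send a request to the external IP address and port shown in the output. For example, with the values above:

     curl http://10.100.68.104:8080

     If the deployment is healthy, the command returns the default NGINX welcome page.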
    

Configure the NodeSystemConfigUpdate resources

Configure a NodeSystemConfigUpdate network function operator resource for each node in the cluster as follows.

  1. List the nodes running in the target cluster's node pool using the following command:

    kubectl get nodes | grep -v master
    

    The command returns output similar to the following:

    NAME                                 STATUS   ROLES       AGE     VERSION
    pool-example-node-1-01-b2d82cc7      Ready    <none>      2d      v1.22.8-gke.200
    pool-example-node-1-02-52ddvfc9      Ready    <none>      2d      v1.22.8-gke.200
    

    Record the returned node names and derive their short names. For example, for the pool-example-node-1-01-b2d82cc7 node, its short name is node101.

  2. For each node you've recorded in the previous step, create a dedicated NodeSystemConfigUpdate resource file with the following contents:

     apiVersion: networking.gke.io/v1
     kind: NodeSystemConfigUpdate
     metadata:
       name: nodesystemconfigupdate-NODE_SHORT_NAME
       namespace: nf-operator
     spec:
       kubeletConfig:
         cpuManagerPolicy: Static
         topologyManagerPolicy: SingleNumaNode
       nodeName: NODE_NAME
       osConfig:
         hugePagesConfig:
           ONE_GB: 2
           TWO_MB: 0
         isolatedCpusPerSocket:
           "0": 40
           "1": 40
       sysctls:
         nodeLevel:
           net.core.rmem_max: "8388608"
           net.core.wmem_max: "8388608"

    Replace the following:

    • NODE_NAME: the full name of the target node. For example, pool-example-node-1-01-b2d82cc7.
    • NODE_SHORT_NAME: the short name of the target node derived from its full name. For example, node101.

    Name each file node-system-config-update-NODE_SHORT_NAME.yaml.

  3. Apply each of the NodeSystemConfigUpdate resource files to the cluster using the following command:

    kubectl apply -f node-system-config-update-NODE_SHORT_NAME.yaml
    

    Replace NODE_SHORT_NAME with the short name of the corresponding target node.

    When you apply the resources to the cluster, each affected node reboots, which can take up to 30 minutes.

  4. Monitor the status of the affected nodes until all of them have successfully rebooted using the following command:

     kubectl get nodes | grep -v master

     The status of each node transitions from NotReady to Ready as its reboot completes.
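
     Optionally, you can block until every node reports Ready again. The following command is a sketch that assumes you want to wait on all nodes in the cluster with a timeout that matches the 30-minute reboot window:

     kubectl wait --for=condition=Ready nodes --all --timeout=30m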

Configure a Pod for image caching

You can configure a Pod running on a Distributed Cloud connected cluster to cache its image. The Pod begins using the cached image after it's been pulled from the repository for the first time. If the node hosting the Pod runs out of storage, new images are not cached and the existing image cache is purged to ensure your workloads continue to run uninterrupted.

Your Pod configuration must meet the following prerequisites:

  • You must set the gdce.baremetal.cluster.gke.io/cache-image: true label on the Pod.
  • If you're using a private image repository, your ImagePullSecret resource must be of type kubernetes.io/dockerconfigjson.
  • You must set the Pod's pull policy to IfNotPresent to ensure the cached copy of the target image is always used. If a cached copy is not available locally, the image is then pulled from the repository.

The following example illustrates a Pod configuration with caching enabled:

apiVersion: v1
kind: Pod
metadata:
  name: cached-image-pod
  labels:
    gdce.baremetal.cluster.gke.io/cache-image: "true"
spec:
  containers:
    - name: my-container
      image: your-private-image-repo/your-image:tag
      imagePullPolicy: IfNotPresent
  imagePullSecrets:
    - name: my-image-secret  # If using a private registry

The next example illustrates a Deployment configuration with caching enabled:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: cached-image-deployment
spec:
  template:
    metadata:
      labels:
        gdce.baremetal.cluster.gke.io/cache-image: "true"
    spec:
      containers:
        - name: my-container
          image: your-private-image-repo/your-image:tag
          imagePullPolicy: IfNotPresent
      imagePullSecrets:
        - name: my-image-secret  # If using a private registry

Limitations for Distributed Cloud workloads

When you configure your Distributed Cloud connected workloads, you must adhere to the limitations described in this section. These limitations are enforced by Distributed Cloud connected on all the workloads that you deploy on your Distributed Cloud connected hardware.

Linux workload limitations

Distributed Cloud connected supports only the following Linux capabilities for workloads:

  • AUDIT_READ
  • AUDIT_WRITE
  • CHOWN
  • DAC_OVERRIDE
  • FOWNER
  • FSETID
  • IPC_LOCK
  • IPC_OWNER
  • KILL
  • MKNOD
  • NET_ADMIN
  • NET_BIND_SERVICE
  • NET_RAW
  • SETFCAP
  • SETGID
  • SETPCAP
  • SETUID
  • SYS_CHROOT
  • SYS_NICE
  • SYS_PACCT
  • SYS_PTRACE
  • SYS_RESOURCE
  • SYS_TIME

Namespace restrictions

Distributed Cloud connected does not support the following namespaces:

  • hostPID
  • hostIPC
  • hostNetwork

Resource type restrictions

Distributed Cloud connected does not support the CertificateSigningRequest resource type, which allows a client to ask for an X.509 certificate to be issued, based on a signing request.

Security context restrictions

Distributed Cloud connected does not support the privileged mode security context.

Pod binding restrictions

Distributed Cloud connected does not support binding Pods to host ports in the HostNetwork namespace. Additionally, the HostNetwork namespace is not available.

hostPath volume restrictions

Distributed Cloud connected only allows the following hostPath volumes with read/write access:

  • /dev/hugepages
  • /dev/infiniband
  • /dev/vfio
  • /dev/char
  • /sys/devices

PersistentVolumeClaim resource type restrictions

Distributed Cloud connected only allows the following PersistentVolumeClaim resource types:

  • csi
  • nfs
  • local

Volume type restrictions

Distributed Cloud connected only allows the following volume types:

  • configMap
  • csi
  • downwardAPI
  • emptyDir
  • hostPath
  • nfs
  • persistentVolumeClaim
  • projected
  • secret

Pod toleration restrictions

Distributed Cloud connected does not allow user-created Pods on control plane nodes. Specifically, Distributed Cloud connected does not allow scheduling Pods that have the following toleration keys:

  • ""
  • node-role.kubernetes.io/master
  • node-role.kubernetes.io/control-plane

Impersonation restrictions

Distributed Cloud connected does not support user or group impersonation.

Management namespace restrictions

Distributed Cloud connected does not allow access to the following namespaces:

  • ai-system
  • ai-speech-system
  • ai-ocr-system
  • ai-translation-system
  • anthos-identity-service
  • cert-manager
  • dataproc-system
  • dataproc-PROJECT_ID
  • dns-system
  • g-istio-system
  • gke-connect
  • gke-managed-metrics-server
  • gke-operators
  • g-ospf-servicecontrol-system
  • g-ospf-system
  • g-pspf-system
  • gke-system
  • gpc-backup-system
  • iam-system
  • kube-node-lease
  • kube-public
  • kube-system with the exception of deleting ippools.whereabouts.cni.cncf.io
  • metallb-system with the exception of editing configMap resources to set load-balancing IP address ranges
  • nf-operator
  • oclcm-system
  • prediction
  • rm-system
  • robinio
  • saas-system
  • vm-system

PROJECT_ID denotes the ID of the target Google Cloud project.

Avoid using any namespace whose name has the g- prefix. Such namespaces are typically reserved for use by Distributed Cloud connected.

Webhook restrictions

Distributed Cloud connected restricts webhooks as follows:

  • Any mutating webhook that you create automatically excludes the kube-system namespace.
  • Mutating webhooks are disabled for the following resource types:
    • nodes
    • persistentvolumes
    • certificatesigningrequests
    • tokenreviews

Pod priority restrictions

Distributed Cloud connected requires that you set the priority of your workload Pods to a value lower than 500000000.
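
For example, you can define a PriorityClass with a value below this limit and reference it from your Pods. The following is a minimal sketch; the class name, value, and description are placeholders that you can adjust to your scheduling needs:

apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  # Placeholder name for this example priority class.
  name: workload-high-priority
# Must be lower than 500000000 on Distributed Cloud connected.
value: 100000
globalDefault: false
description: "Example priority class for latency-sensitive workloads."

Reference the class from your workload by setting the Pod's spec.priorityClassName field to workload-high-priority.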

Configure the runtime class for a Pod

Distributed Cloud connected lets you specify the runtime class for a Pod in its configuration by using the runtimeClassName field. This setting overrides the default runtime class specified at the cluster level. The available runtime classes are runc and gvisor. For example:

apiVersion: v1
kind: Pod
metadata:
  name: my-pod
spec:
  runtimeClassName: gvisor
  containers:
  - name: my-pod
    image: my-pod-image
  restartPolicy: OnFailure

If you omit the runtimeClassName field from your Pod configuration, the Pod uses the runtime class specified at the cluster level. The default cluster-level runtime class is runc unless you configure a default runtime class by using the --default-container-runtime parameter as described in Create and manage clusters.

If you change the runtime class at either Pod or cluster level, you must restart the affected Pods for the change to take effect.
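
For example, if the affected Pods are managed by a Deployment, one way to restart them is to roll the Deployment. The Deployment name in the following command is a placeholder:

kubectl rollout restart deployment DEPLOYMENT_NAME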

gvisor runtime class

Specifying the gvisor runtime class switches the Pod to the Open Container Initiative (OCI) secure runtime based on gVisor. gVisor is a sandboxing solution that introduces strong isolation between the workload and its host.

What's next