Use the kubectl-apply module to apply Kubernetes manifests to your
Google Kubernetes Engine clusters.
This module lets you embed manifests directly as string content or reference them from remote locations, files, templates, or entire directories. By using this module, you streamline the deployment of commonly used infrastructure components and performance optimizations. Supported components include the Kueue scheduler, the Jobset API, the NVIDIA GPU Operator, the NCCL gIB plugin, and ASAPD-Lite (for optimizing specific machine types, such as the A4X Max machine type).
For the complete list of inputs and outputs for this module, see the
kubectl-apply
module
page in the Cluster Toolkit GitHub repository.
Before you begin
Before you begin, verify that you meet the following requirements:
- You have installed and configured Cluster Toolkit. For installation instructions, see Set up Cluster Toolkit.
- You have an existing cluster blueprint. You can use and modify an existing blueprint or create one from scratch. For a working example of a blueprint configured for the kubectl-apply module, see the examples/a3-megagpu-8g.yaml file. For more information about creating and customizing blueprints, see Cluster blueprint.
- To view a complete list of blueprints that support the kubectl-apply module, go to the Cluster blueprint catalog page, click the Select scheduler menu, and then select GKE.
Required roles
To get the permissions that
you need to apply Kubernetes manifests to the cluster,
ask your administrator to grant you the
Kubernetes Engine Admin (roles/container.admin) IAM role on your project.
For more information about granting roles, see Manage access to projects, folders, and organizations.
You might also be able to get the required permissions through custom roles or other predefined roles.
Configure manifests
You can specify manifests in your blueprint by using several methods:
- Raw string: specify the manifest body directly by using the content: manifest_body format.
- Remote URL: a single URL to a manifest file, such as https://github.com/kubernetes-sigs/jobset/releases/download/v0.6.0/manifests.yaml.
- Single local file: a local YAML manifest file, such as ./manifest.yaml.
- Template file: a template file with the .tftpl extension to generate a manifest, such as ./template.yaml.tftpl. You can pass variables to format the template file by using the template_vars field.
- Directory: a directory that contains multiple YAML files or template files, such as ./manifests/. You can pass variables to format the template files by using the template_vars field.
Example configurations
The following sections provide examples that demonstrate how to configure the kubectl-apply module.
Apply manifests
The following example demonstrates how to apply manifests by using different source methods:
- id: existing-gke-cluster
  source: modules/scheduler/pre-existing-gke-cluster
  settings:
    project_id: $(vars.project_id)
    cluster_name: my-gke-cluster
    region: us-central1

- id: kubectl-apply
  source: modules/management/kubectl-apply
  use: [existing-gke-cluster]
  settings:
    apply_manifests:
    - content: |
        apiVersion: v1
        kind: Namespace
        metadata:
          name: my-namespace
    - source: "https://github.com/kubernetes-sigs/jobset/releases/download/v0.6.0/manifests.yaml"
    - source: $(ghpc_stage("manifests/configmap1.yaml"))
    - source: $(ghpc_stage("manifests/configmap2.yaml.tftpl"))
      template_vars: {name: "dev-config", public: "false"}
    - source: $(ghpc_stage("manifests"))/
      template_vars: {name: "dev-config", public: "false"}
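For reference, a .tftpl file uses Terraform template placeholder syntax. The following is a hypothetical manifests/configmap2.yaml.tftpl that the preceding template_vars values could render; the file name and field values are illustrative, not part of the module:

```yaml
# Hypothetical contents of manifests/configmap2.yaml.tftpl.
# ${name} and ${public} are placeholders that are filled from
# the template_vars field in the blueprint.
apiVersion: v1
kind: ConfigMap
metadata:
  name: ${name}
data:
  public: "${public}"
```

With template_vars: {name: "dev-config", public: "false"}, this template would render a ConfigMap named dev-config.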
Deploy workload components
You can deploy workload components and performance optimizations by setting the install: true field for the component in the settings block. The module supports the installation of the following components:
Kueue scheduler: manages workload queuing and resource allocation.

- id: workload_component_install
  source: modules/management/kubectl-apply
  use: [gke_cluster]
  settings:
    kueue:
      install: true
      version: v0.10.0

Jobset API: manages groups of related jobs as a single unit.

...
settings:
  jobset:
    install: true

NVIDIA GPU Operator: automates the setup and management of the software needed to provision GPUs.

...
settings:
  nvidia_gpu_operator:
    install: true

NCCL gIB plugin: improves multi-GPU and multi-node communication speed.

...
settings:
  nccl_gib_plugin:
    install: true

ASAPD-Lite: provides storage and data caching optimization for high-performance machine types, such as a4x-max.

...
settings:
  asapd_lite:
    install: true
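You can also enable several components in a single kubectl-apply module block. The following sketch assumes an existing cluster module with the ID gke_cluster and combines settings that appear individually above:

```yaml
# Sketch: enable multiple workload components in one module block.
# Assumes a cluster module with the ID gke_cluster exists in the blueprint.
- id: workload_component_install
  source: modules/management/kubectl-apply
  use: [gke_cluster]
  settings:
    kueue:
      install: true
      version: v0.10.0
    jobset:
      install: true
    nvidia_gpu_operator:
      install: true
```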
The config_path field in the kueue block also accepts a template file. If
you provide a template file, then you must provide variables for the template by
using the config_template_vars field. The following example demonstrates how
to pass variables:
- id: workload_component_install
source: modules/management/kubectl-apply
use: [gke_cluster]
settings:
kueue:
install: true
config_path: $(ghpc_stage("manifests/user-provided-kueue-config.yaml.tftpl"))
config_template_vars: {name: "dev-config", public: "false"}
jobset:
install: true
You can specify a particular Kueue version that you want to use by using the
version field. We recommend that you use the v0.10.0 version. To view all
available versions, see the supported Kueue versions
documentation.
- id: workload_component_install
source: modules/management/kubectl-apply
use: [gke_cluster]
settings:
kueue:
install: true
version: v0.10.0
config_path: $(ghpc_stage("manifests/user-provided-kueue-config.yaml.tftpl"))
config_template_vars: {name: "dev-config", public: "false"}
jobset:
install: true
Helm release naming
The kubectl-apply module generates deterministic names for the Helm releases
that are used to manage your manifests. This naming convention helps ensure consistent release names across
redeployments.
The module determines the Helm release name for each manifest by using the following precedence hierarchy:
- Explicit name: the value of the name attribute in the apply_manifests list, if specified.
- File basename: the basename of the manifest file. For example, kueue-manifest is the basename for the kueue-manifest.yaml file.
- Fallback: a generated name in the format MODULE_ID-raw-HASH. Cluster Toolkit replaces MODULE_ID with the ID of the module and replaces HASH with a hash value.
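To make the precedence concrete, the following sketch shows hypothetical apply_manifests entries and the release name that each rule would produce. The module ID, file names, and hash value are illustrative, not real output:

```yaml
- id: my-kubectl-apply
  source: modules/management/kubectl-apply
  settings:
    apply_manifests:
    # Explicit name: the release is named "custom-kueue".
    - source: ./manifests/kueue-manifest.yaml
      name: custom-kueue
    # File basename: the release is named "kueue-manifest".
    - source: ./manifests/kueue-manifest.yaml
    # Fallback for inline content: the release gets a generated name
    # in the MODULE_ID-raw-HASH format, such as "my-kubectl-apply-raw-a1b2c3".
    - content: |
        apiVersion: v1
        kind: Namespace
        metadata:
          name: demo
```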
Set a custom name for the Helm release
To specify a custom name for a Helm release, add the name attribute to the
apply_manifests entry in your cluster blueprint:
- id: my-kubectl-apply
source: modules/management/kubectl-apply
settings:
apply_manifests:
- source: modules/management/kubectl-apply/manifests/kueue-v0.12.2.yaml
name: custom-kueue-release
Troubleshoot common errors
This section lists errors that you might encounter when you apply manifests
directly from remote http:// or https:// URLs by using the kubectl-apply
module. For more information about these methods, see
Configure manifests.
For production environments, we recommend that you source manifests from local paths or a version-controlled Git repository, because the URL method introduces additional complexity.
If you use the URL method, then you can troubleshoot the configuration by using the solutions in this section.
Race conditions when you apply manifests in parallel
The following issue occurs when you apply a manifest with custom resources, such
as a ClusterQueue resource, at the same time as the manifest that defines the
CustomResourceDefinition (CRD) object for those resources. Because the module
applies manifests from the apply_manifests list in parallel, there is no
guarantee that the cluster creates the CRD object before the resource that uses
the CRD object. This race condition might result in errors like the following:
Error: resource [kueue.x-k8s.io/v1beta1/ClusterQueue] isn't valid for cluster
To resolve this issue, use a two-stage deployment process to manually enforce the correct order of operations. Complete the following steps:
1. Initial deployment: in your blueprint, include only the manifests that contain the CRD resources in the apply_manifests list. Your first deployment might look like the following example:

   settings:
     apply_manifests:
     # This manifest contains the CRDs for Kueue
     - source: "https://github.com/kubernetes-sigs/kueue/releases/download/v0.11.4/manifests.yaml"

2. Run the first deployment: to deploy the CRD resources, use the gcluster deploy command or the terraform apply command.

3. Second deployment: after the first deployment succeeds, add the manifests that contain your custom resources to the apply_manifests list. Your settings for the second deployment might look like the following example:

   settings:
     apply_manifests:
     # The CRD manifest is still present
     - source: "https://github.com/kubernetes-sigs/kueue/releases/download/v0.11.4/manifests.yaml"
     # Now, add your configuration manifest
     - source: "https://gist.githubusercontent.com/YourUser/..." # Your configuration URL

4. Run the deployment command again: because the CRD objects now exist in the cluster, the second deployment succeeds.
Terraform template files from remote URLs
The following issue occurs when you attempt to render a template file with the
.tftpl extension that you source from a remote URL. The kubectl-apply module
can't render template files from remote sources.
To resolve this issue, render the template into a YAML file locally, host the rendered file at a URL, and provide the URL of the rendered file in your blueprint.
What's next
- For the complete list of inputs and outputs for this module, see the kubectl-apply module page in the Cluster Toolkit GitHub repository.