This tutorial shows you how to orchestrate a distributed training environment for reinforcement learning on Google Kubernetes Engine (GKE). You use Ray and the verl (Volcano Engine Reinforcement Learning) framework to set up a distributed training environment to fine-tune a Qwen2.5-32B-Instruct model.
This tutorial focuses on the Group Relative Policy Optimization (GRPO) training pipeline on GKE with Ray and verl. GRPO is a reinforcement learning algorithm designed to improve a model's reasoning ability. This memory-efficient algorithm simplifies the reinforcement learning (RL) process by eliminating the Critic, or value model, and using a relative group-based calculation instead.
This tutorial is a good starting point if you need to set up a distributed training environment where data, model weights, and the training engine are decoupled for efficiency.
Background
The following sections provide a brief overview of the concepts used in this tutorial.
Reinforcement learning
RL teaches models through experience, exploration, and feedback rather than static imitation. While pre-training teaches a model what to say, RL—specifically Reinforcement Learning from Human Feedback (RLHF)—teaches it how to be helpful, safe, and logical. RL serves as the bridge between a base model and a fine-tuned model for a specialized use case.
For more information, see What is reinforcement learning?
Volcano Engine Reinforcement Learning (verl)
verl is a high-performance framework designed to handle the complex memory and compute patterns of LLM-based RL.
For more information, see verl.
Group Relative Policy Optimization (GRPO)
GRPO, an algorithm popularized by DeepSeek, offers a memory-efficient alternative to Proximal Policy Optimization (PPO) for LLM alignment by removing the Critic model. Instead of a Critic network, GRPO generates a group of responses for the same prompt and uses the average reward of that group as the baseline.
For more information, see GRPO.
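As a sketch of the core idea: for one prompt, GRPO samples a group of G responses, scores each with the reward function, and normalizes each reward by the group's own statistics to produce the advantage. This is the formulation from the DeepSeekMath paper that introduced GRPO:

$$A_i = \frac{r_i - \operatorname{mean}(\{r_1, r_2, \dots, r_G\})}{\operatorname{std}(\{r_1, r_2, \dots, r_G\})}$$

Because the baseline comes from the group itself, no separate value network needs to be trained or held in GPU memory.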
Objectives
This tutorial shows you how to set up reinforcement learning on GKE with verl by completing the following steps:
- Set up a GKE cluster with B200 or H200 GPUs.
- Configure KubeRay to manage a distributed Ray cluster.
- Use Cloud Storage FUSE to mount a Cloud Storage bucket across all nodes.
- Run a GRPO training job using verl to align the Qwen2.5-32B-Instruct model with the GSM8K dataset.
Before you begin
- Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
- Install the Google Cloud CLI.
- If you're using an external identity provider (IdP), you must first sign in to the gcloud CLI with your federated identity.
- To initialize the gcloud CLI, run the following command:

  ```shell
  gcloud init
  ```

- Create or select a Google Cloud project.

  Roles required to select or create a project:

  - Select a project: Selecting a project doesn't require a specific IAM role—you can select any project that you've been granted a role on.
  - Create a project: To create a project, you need the Project Creator role (roles/resourcemanager.projectCreator), which contains the resourcemanager.projects.create permission. Learn how to grant roles.

  Create a Google Cloud project:

  ```shell
  gcloud projects create PROJECT_ID
  ```

  Replace PROJECT_ID with a name for the Google Cloud project you are creating.

  Select the Google Cloud project that you created:

  ```shell
  gcloud config set project PROJECT_ID
  ```

  Replace PROJECT_ID with your Google Cloud project name.

- Verify that billing is enabled for your Google Cloud project.
- Enable the required APIs:

  Roles required to enable APIs: To enable APIs, you need the Service Usage Admin IAM role (roles/serviceusage.serviceUsageAdmin), which contains the serviceusage.services.enable permission. Learn how to grant roles.

  ```shell
  gcloud services enable container.googleapis.com storage.googleapis.com compute.googleapis.com
  ```
- Grant roles to your user account. Run the following command once for each of the following IAM roles: roles/container.admin, roles/iam.serviceAccountAdmin, roles/storage.admin

  ```shell
  gcloud projects add-iam-policy-binding PROJECT_ID --member="user:USER_IDENTIFIER" --role=ROLE
  ```

  Replace the following:

  - PROJECT_ID: Your project ID.
  - USER_IDENTIFIER: The identifier for your user account. For example, myemail@example.com.
  - ROLE: The IAM role that you grant to your user account.
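If you prefer to script the three grants, a loop like the following works; this sketch only prints each command so you can review it before running it (remove the echo to apply the bindings; the PROJECT_ID and USER_IDENTIFIER values are placeholders):

```shell
# Print (not run) one add-iam-policy-binding command per required role.
PROJECT_ID="my-project"                  # placeholder
USER_IDENTIFIER="myemail@example.com"    # placeholder
for ROLE in roles/container.admin roles/iam.serviceAccountAdmin roles/storage.admin; do
  echo gcloud projects add-iam-policy-binding "${PROJECT_ID}" \
    --member="user:${USER_IDENTIFIER}" --role="${ROLE}"
done
```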
- Create a Hugging Face account, if you don't already have one.
- Ensure that you have a Hugging Face token.
- Ensure your project has sufficient quota for B200 and H200 GPUs. To learn more, see Plan GPU quota and GPU quota.
Prepare your environment
In this tutorial, you use Cloud Shell.
Go to the Google Cloud console.
At the top of the Google Cloud console window, click the Activate Cloud Shell button.
Set the following environment variables:
```shell
export PROJECT_ID=$(gcloud config get project)
export PROJECT_NUMBER=$(gcloud projects describe ${PROJECT_ID} --format="value(projectNumber)")
export GPU_TYPE=GPU_TYPE
export CONTROL_PLANE_LOCATION=CONTROL_PLANE_LOCATION
export NODE_LOCATION=NODE_LOCATION
export CLUSTER_NAME=CLUSTER_NAME
export KSA_NAME=CLUSTER_NAME
export GS_BUCKET=BUCKET_NAME-${PROJECT_ID}
export NAMESPACE=default
export HF_TOKEN=YOUR_HUGGING_FACE_TOKEN
export MACHINE_TYPE=MACHINE_TYPE
export GKE_VERSION=GKE_VERSION
```

Replace the following values:

- CONTROL_PLANE_LOCATION: the Compute Engine region for the GKE cluster control plane.
- GPU_TYPE: the accelerator that you reserved in the Compute Engine capacity reservation. Must be one of the following values:
  - nvidia-b200: NVIDIA B200 (180 GB)
  - nvidia-h200-141gb: NVIDIA H200 (141 GB)
- NODE_LOCATION: the zone for the GKE nodes. Select a zone where NVIDIA B200 or H200 GPUs are available.
- CLUSTER_NAME: the name of your GKE cluster.
- BUCKET_NAME: the base name for your Cloud Storage bucket. You don't need to specify the gs:// prefix.
- YOUR_HUGGING_FACE_TOKEN: your Hugging Face token for model access.
- MACHINE_TYPE: the type of machine to use. Valid options are c2-standard-8 or c2-standard-16.
- GKE_VERSION: the version of GKE to use:
  - For NVIDIA B200 (180 GB) GPUs, use 1.32.2-gke.1422000 or later.
  - For NVIDIA H200 (141 GB) GPUs, use 1.31.4-gke.1183000 or later.
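If you script the setup, you can derive the minimum GKE version from the accelerator you chose. A small sketch using the versions listed above (the function name is illustrative, not part of the tutorial's scripts):

```shell
# Map the chosen accelerator to the minimum GKE version it requires
# (versions from the list above; any later version also works).
gke_min_version() {
  case "$1" in
    nvidia-b200)       echo "1.32.2-gke.1422000" ;;
    nvidia-h200-141gb) echo "1.31.4-gke.1183000" ;;
    *) echo "unsupported GPU type: $1" >&2; return 1 ;;
  esac
}

gke_min_version nvidia-b200   # -> 1.32.2-gke.1422000
```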
Create the following environment variables for the network:

```shell
export GVNIC_NETWORK_PREFIX="GVNIC-NAME"
export RDMA_NETWORK_PREFIX="RDMA-NAME"
```

Replace the following values:

- GVNIC-NAME: the prefix for the gVNIC network name. You can use any prefix you want.
- RDMA-NAME: the prefix for the remote direct memory access (RDMA) network name. You can use any prefix you want.
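The prefixes only seed the resource names; the commands in the next section append fixed suffixes. A quick sketch of the names that result from example prefixes:

```shell
# Show the resource names that later commands derive from the prefixes.
GVNIC_NETWORK_PREFIX="demo-gvnic"   # example value
RDMA_NETWORK_PREFIX="demo-rdma"     # example value

echo "${GVNIC_NETWORK_PREFIX}-net"   # gVNIC VPC network
echo "${GVNIC_NETWORK_PREFIX}-sub"   # gVNIC subnet
echo "${RDMA_NETWORK_PREFIX}-net"    # RDMA VPC network
echo "${RDMA_NETWORK_PREFIX}-sub-0"  # first of eight RDMA subnets
```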
Set up infrastructure
In this section, you create an RDMA network and a GKE cluster.
Create RDMA network and subnets
Create a VPC network for the gVNIC interface:
```shell
gcloud compute networks create ${GVNIC_NETWORK_PREFIX}-net \
    --subnet-mode=custom \
    --project=${PROJECT_ID}

gcloud compute networks subnets create ${GVNIC_NETWORK_PREFIX}-sub \
    --network=${GVNIC_NETWORK_PREFIX}-net \
    --region=${CONTROL_PLANE_LOCATION} \
    --range=192.168.0.0/24

gcloud compute firewall-rules create ${GVNIC_NETWORK_PREFIX}-internal \
    --network=${GVNIC_NETWORK_PREFIX}-net \
    --action=ALLOW \
    --rules=tcp:0-65535,udp:0-65535,icmp \
    --source-ranges=192.168.0.0/16
```

Create a VPC network and subnets for RDMA, with eight subnets for the eight GPUs:

```shell
gcloud beta compute networks create ${RDMA_NETWORK_PREFIX}-net \
    --network-profile=${CONTROL_PLANE_LOCATION}-vpc-roce \
    --subnet-mode=custom

for N in $(seq 0 7); do
  gcloud compute networks subnets create ${RDMA_NETWORK_PREFIX}-sub-$N \
      --network=${RDMA_NETWORK_PREFIX}-net \
      --region=${CONTROL_PLANE_LOCATION} \
      --range=192.168.$((N+1)).0/24 &
done
wait
```

Clone the sample repository:

```shell
git clone https://github.com/GoogleCloudPlatform/kubernetes-engine-samples.git
cd kubernetes-engine-samples
```

Navigate to the working directory:

```shell
cd ai-ml/verl-on-gke
```
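The RDMA subnet loop above assigns one /24 range per GPU NIC. You can preview the ranges without calling gcloud:

```shell
# Print the CIDR range each RDMA subnet gets (one /24 per GPU NIC).
for N in $(seq 0 7); do
  echo "sub-$N: 192.168.$((N+1)).0/24"
done
```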
Create the GKE cluster
You can run verl on a GKE Autopilot or Standard cluster. We recommend that you use an Autopilot cluster for a fully managed Kubernetes experience. To choose the GKE mode of operation that's the best fit for your workloads, see Choose a GKE mode of operation.
Autopilot
Create an Autopilot cluster:

```shell
gcloud container clusters create-auto ${CLUSTER_NAME} \
    --location=${CONTROL_PLANE_LOCATION} \
    --enable-multi-networking \
    --enable-ray-operator
```

Get credentials for your cluster:

```shell
gcloud container clusters get-credentials ${CLUSTER_NAME} \
    --location=${CONTROL_PLANE_LOCATION}
```

Install the NCCL RDMA installer for Autopilot:

```shell
kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/container-engine-accelerators/refs/heads/master/gpudirect-rdma/nccl-rdma-installer-autopilot.yaml
```
Standard
Create a Standard cluster:
```shell
gcloud container clusters create ${CLUSTER_NAME} \
    --location=${CONTROL_PLANE_LOCATION} \
    --enable-dataplane-v2 \
    --enable-ip-alias \
    --enable-multi-networking \
    --addons=RayOperator,GcsFuseCsiDriver \
    --machine-type=${MACHINE_TYPE} \
    --num-nodes=1 \
    --min-nodes=1 \
    --max-nodes=5 \
    --enable-autoscaling
```

Get credentials for your cluster:

```shell
gcloud container clusters get-credentials ${CLUSTER_NAME} --location=${CONTROL_PLANE_LOCATION}
```

Create the GPU node pool (using Spot instances for cost efficiency):

```shell
gcloud container node-pools create gpu-pool \
    --cluster=${CLUSTER_NAME} \
    --location=${CONTROL_PLANE_LOCATION} \
    --node-locations=${NODE_LOCATION} \
    --machine-type=${MACHINE_TYPE} \
    --accelerator=type=${GPU_TYPE},count=8,gpu-driver-version=DEFAULT \
    --spot \
    --enable-autoscaling \
    --num-nodes=0 \
    --total-max-nodes=10 \
    --additional-node-network=network=${GVNIC_NETWORK_PREFIX}-net,subnetwork=${GVNIC_NETWORK_PREFIX}-sub \
    --additional-node-network=network=${RDMA_NETWORK_PREFIX}-net,subnetwork=${RDMA_NETWORK_PREFIX}-sub-0 \
    --additional-node-network=network=${RDMA_NETWORK_PREFIX}-net,subnetwork=${RDMA_NETWORK_PREFIX}-sub-1 \
    --additional-node-network=network=${RDMA_NETWORK_PREFIX}-net,subnetwork=${RDMA_NETWORK_PREFIX}-sub-2 \
    --additional-node-network=network=${RDMA_NETWORK_PREFIX}-net,subnetwork=${RDMA_NETWORK_PREFIX}-sub-3 \
    --additional-node-network=network=${RDMA_NETWORK_PREFIX}-net,subnetwork=${RDMA_NETWORK_PREFIX}-sub-4 \
    --additional-node-network=network=${RDMA_NETWORK_PREFIX}-net,subnetwork=${RDMA_NETWORK_PREFIX}-sub-5 \
    --additional-node-network=network=${RDMA_NETWORK_PREFIX}-net,subnetwork=${RDMA_NETWORK_PREFIX}-sub-6 \
    --additional-node-network=network=${RDMA_NETWORK_PREFIX}-net,subnetwork=${RDMA_NETWORK_PREFIX}-sub-7
```

Install the NCCL RDMA installer used for Standard clusters:

```shell
kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/container-engine-accelerators/refs/heads/master/gpudirect-rdma/nccl-rdma-installer.yaml
```
Configure network mappings
Inspect the network-mapping.yaml manifest.

Apply the manifest:

```shell
kubectl apply -f network-mapping.yaml
```
Prepare data and storage
Create a Cloud Storage bucket:

```shell
gcloud storage buckets create gs://${GS_BUCKET} \
    --location=${CONTROL_PLANE_LOCATION} \
    --enable-hierarchical-namespace \
    --uniform-bucket-level-access
```

Create a Kubernetes ServiceAccount (KSA) and bind it to the bucket:

```shell
kubectl create serviceaccount ${KSA_NAME} --namespace ${NAMESPACE}

gcloud storage buckets add-iam-policy-binding gs://${GS_BUCKET} \
    --member "principal://iam.googleapis.com/projects/${PROJECT_NUMBER}/locations/global/workloadIdentityPools/${PROJECT_ID}.svc.id.goog/subject/ns/${NAMESPACE}/sa/${KSA_NAME}" \
    --role "roles/storage.objectUser"
```

Create the Secret for Hugging Face:

```shell
kubectl create secret generic hf-secret --from-literal=hf_api_token=${HF_TOKEN}
```

Inspect the gcsfuse-storage.yaml manifest.

Apply the manifest:

```shell
kubectl apply -f gcsfuse-storage.yaml
```
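The Workload Identity Federation member string in the IAM binding above is assembled from your project, namespace, and ServiceAccount values. This sketch shows the expansion with example values substituted for the environment variables:

```shell
# Build the Workload Identity Federation principal used in the IAM binding,
# with example values standing in for the environment variables.
PROJECT_NUMBER="123456789012"   # example
PROJECT_ID="my-project"         # example
NAMESPACE="default"
KSA_NAME="my-ksa"               # example

MEMBER="principal://iam.googleapis.com/projects/${PROJECT_NUMBER}/locations/global/workloadIdentityPools/${PROJECT_ID}.svc.id.goog/subject/ns/${NAMESPACE}/sa/${KSA_NAME}"
echo "${MEMBER}"
```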
Prepare model and data
You can run these commands locally or on a GKE Pod to populate the bucket.
Clone the verl repository:
git clone https://github.com/volcengine/verl.gitDownload the Qwen2.5-32B-Instruct model using the Hugging Face CLI:
huggingface-cli download Qwen/Qwen2.5-32B-Instruct --local-dir Qwen2.5-32B-InstructPreprocess the GSM8K dataset:
python examples/data_preprocess/gsm8k.py --local_save_dir ~/data/gsm8kUpload the model, data, and the verl code to your Cloud Storage bucket:
gcloud storage cp --recursive verl gs://${GS_BUCKET}/verl gcloud storage cp --recursive Qwen2.5-32B-Instruct gs://${GS_BUCKET}/Qwen2.5-32B-Instruct gcloud storage cp --recursive ~/data/gsm8k/* ${GS_BUCKET}
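Assuming the gcsfuse-storage.yaml manifest mounts the bucket at /data in the Ray Pods (which is what the /data/... paths in the training command suggest), the uploads become the paths the job reads. A sketch of the mapping with an example bucket name:

```shell
# Map bucket object prefixes to the paths visible under the /data mount.
GS_BUCKET="my-bucket-my-project"   # example value
for PREFIX in verl Qwen2.5-32B-Instruct gsm8k; do
  echo "gs://${GS_BUCKET}/${PREFIX} -> /data/${PREFIX}"
done
```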
Deploy RayCluster custom resource
Deploy a RayCluster custom resource, which typically consists of one head Pod and multiple worker Pods.
Autopilot
Deploy the RayCluster. Save the following to ray-cluster-auto.yaml:

Apply the RayCluster:

```shell
kubectl apply -f ray-cluster-auto.yaml
```
Standard
Deploy the RayCluster. Save the following to ray-cluster.yaml:

Apply the RayCluster:

```shell
kubectl apply -f ray-cluster.yaml
```
Launch the GRPO Job
Set up port forwarding to the Ray dashboard on the head node:

```shell
kubectl port-forward svc/b200-ray-cluster-head-svc 8265:8265
```

Inspect the runtime-env.yaml manifest. If you use H200 GPUs, change NCCL_TUNER_CONFIG_PATH to /usr/local/gib/configs/tuner_config_a3u.txtpb. This file is used by the Ray client. You don't need to apply this manifest to the cluster.

Submit the Job using ray job submit:

```shell
ray job submit \
    --address "http://localhost:8265" \
    --runtime-env runtime-env.yaml \
    -- \
    bash -c "
    cd /data/verl &&
    PYTHONUNBUFFERED=1 python3 -m verl.trainer.main_ppo \
      data.train_files=/data/gsm8k/train.parquet \
      data.val_files=/data/gsm8k/test.parquet \
      data.train_batch_size=256 \
      data.max_prompt_length=512 \
      data.max_response_length=512 \
      actor_rollout_ref.model.path=/data/Qwen2.5-32B-Instruct \
      actor_rollout_ref.actor.optim.lr=1e-5 \
      actor_rollout_ref.actor.ppo_mini_batch_size=256 \
      actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=64 \
      actor_rollout_ref.rollout.name=vllm \
      actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=8 \
      actor_rollout_ref.rollout.tensor_model_parallel_size=8 \
      actor_rollout_ref.rollout.gpu_memory_utilization=0.6 \
      actor_rollout_ref.ref.log_prob_micro_batch_size_per_gpu=4 \
      actor_rollout_ref.actor.strategy=fsdp2 \
      algorithm.kl_ctrl.kl_coef=0.001 \
      trainer.logger=console \
      trainer.val_before_train=False \
      trainer.n_gpus_per_node=8 \
      trainer.nnodes=2 \
      trainer.save_freq=10 \
      trainer.test_freq=10 \
      algorithm.adv_estimator=grpo \
      actor_rollout_ref.rollout.n=8 \
      trainer.total_epochs=2" 2>&1 | tee verl_demo.log
```

Monitor the logs in the Ray Dashboard or in the output. Look for critic/score/mean to increase, which indicates that the model is learning.
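To track the score from the tee'd log file, you can filter for the metric. This sketch runs against a sample log line; the exact log format is an assumption and may differ between verl versions:

```shell
# Extract critic/score/mean values from training log lines.
# Sample lines standing in for real verl output (format is an assumption).
cat > /tmp/verl_demo.log <<'EOF'
step:10 - critic/score/mean:0.312 - critic/score/max:1.000
step:20 - critic/score/mean:0.458 - critic/score/max:1.000
EOF

grep -o 'critic/score/mean:[0-9.]*' /tmp/verl_demo.log | cut -d: -f2
```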
Clean up
To avoid incurring charges, delete the resources:

```shell
kubectl delete raycluster b200-ray-cluster
gcloud container clusters delete ${CLUSTER_NAME} --location=${CONTROL_PLANE_LOCATION}
gcloud storage rm -r gs://${GS_BUCKET}
```