This page walks you through the end-to-end workflow for reinforcement learning fine-tuning of Gemini models: creating a tuning job, checking its status, retrieving the tuned-model endpoint, and running inference against it.
Before you begin, see About reinforcement learning fine-tuning for an overview of the feature, supported models, and supported regions.
Create a reinforcement learning fine-tuning job
To create a reinforcement learning fine-tuning job, send a POST
request to the tuningJobs.create endpoint. For the full request schema and
all configurable fields, see the
Reinforcement learning fine-tuning job
page.
The examples on this page use us-central1 as the tuning region. The
resulting tuned model is served from the us multi-region endpoint. For the
full list of supported tuning and serving regions, see the
Supported models and regions
section.
curl -X POST \
-H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
-H "Content-Type: application/json" \
"https://us-central1-aiplatform.googleapis.com/v1beta1/projects/PROJECT_ID/locations/us-central1/tuningJobs" \
-d \
$'{
"tunedModelDisplayName": "my-rl-tuned-model",
"baseModel": "gemini-3.5-flash",
"reinforcementTuningSpec": {
"trainingDatasetUri": "gs://path/to/your/training_dataset.jsonl",
"validationDatasetUri": "gs://path/to/your/eval_dataset.jsonl",
"hyperParameters": {
"epochCount":15,
"learningRateMultiplier":1.0,
"samplesPerPrompt":16,
"adapterSize":"ADAPTER_SIZE_SIXTEEN",
"maxOutputTokens":32768,
"batchSize":32,
"evaluateInterval":5,
"checkpointInterval":5,
"thinkingLevel":"HIGH"
},
"singleRewardConfig": {
"rewardName": "your_reward_function_name",
"parseResponseConfig": {"parseType":"IDENTITY"},
"cloudRunRewardScorer": {
"cloudRunUri":"https://your.cloud.run.uri"
}
}
}
}'
Replace the following:
- PROJECT_ID: Your Google Cloud project ID.
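The request URL repeats the tuning region twice: once in the host prefix and once in the locations segment. The following sketch shows one way to keep the two in sync and to pull the job ID out of a tuning job resource name. Both function names are hypothetical helpers, not part of any Google Cloud tool, and the sketch assumes the resource name has the projects/.../tuningJobs/TUNING_JOB_ID shape shown in the example response later on this page.

```shell
# Hypothetical helper: build the tuningJobs collection URL.
# $1 = project ID, $2 = tuning region (for example, us-central1).
# The region is used for both the host prefix and the locations/ segment.
tuning_jobs_url() {
  printf 'https://%s-aiplatform.googleapis.com/v1beta1/projects/%s/locations/%s/tuningJobs\n' \
    "$2" "$1" "$2"
}

# Hypothetical helper: return the last path segment of a tuning job
# resource name, which is the tuning job ID.
tuning_job_id() {
  printf '%s\n' "${1##*/}"
}
```

For example, `tuning_jobs_url my-project us-central1` prints the same URL used in the request above with PROJECT_ID set to my-project.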
Check the reinforcement learning fine-tuning job status
You can monitor the progress, performance, and quality of a running reinforcement learning fine-tuning job in the Google Cloud console on the Agent Platform > Model > Tuning page. Each tuning job has a dedicated monitoring view that surfaces the underlying tuning status with charts for training and evaluation rewards, generation length, and other tuning metrics. For the full list of emitted metrics and how to interpret them, see the Metrics and monitoring page.
Training time
Training time is affected by the following factors:
- Training and validation dataset size. For details, see the Tuning dataset page.
- Hyperparameters, including samplesPerPrompt, batch size, epoch count, and learning rate multiplier. For details, see the Hyperparameters page.
Depending on your setup, a Gemini reinforcement learning fine-tuning job can run for hours to days.
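Because a job can run for a long time, you might prefer to poll its state from a script instead of watching the console. The following is a minimal sketch, not an official tool. It assumes that jq is installed, that PROJECT_ID is set in the environment, and that the job ran in us-central1. It reads the state field with the same GetTuningJob request described in the next section. Only JOB_STATE_SUCCEEDED is named on this page. JOB_STATE_FAILED and JOB_STATE_CANCELLED come from the general Vertex AI job state enum and are assumed to apply here. Both function names are hypothetical.

```shell
# Hypothetical helper: exit 0 if the given job state is terminal, 1 otherwise.
is_terminal_state() {
  case "$1" in
    JOB_STATE_SUCCEEDED|JOB_STATE_FAILED|JOB_STATE_CANCELLED) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical helper: poll a tuning job until it reaches a terminal state.
# $1 = tuning job ID, $2 = seconds between polls (defaults to 600).
# Assumes jq is installed and PROJECT_ID is set in the environment.
# Exits 0 only if the final state is JOB_STATE_SUCCEEDED.
wait_for_tuning_job() {
  job_id="$1"
  interval="${2:-600}"
  url="https://us-central1-aiplatform.googleapis.com/v1beta1/projects/${PROJECT_ID}/locations/us-central1/tuningJobs/${job_id}"
  while :; do
    # Fetch the job and keep only the "state" field.
    state="$(curl -s \
      -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
      "$url" | jq -r '.state')"
    echo "$(date) state=${state}"
    if is_terminal_state "$state"; then
      break
    fi
    sleep "$interval"
  done
  [ "$state" = "JOB_STATE_SUCCEEDED" ]
}
```

A call such as `wait_for_tuning_job TUNING_JOB_ID 900` checks every 15 minutes. The sketch does not handle request errors: if the request fails or the response has no state field, the loop keeps polling, so consider adding a retry limit before relying on it.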
Get the tuned-model endpoint
After the tuning job reaches JOB_STATE_SUCCEEDED, retrieve the deployed
tuned-model endpoint by issuing a GetTuningJob request and reading the
tunedModel.endpoint field of the response.
curl -X GET \
-H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
"https://us-central1-aiplatform.googleapis.com/v1beta1/projects/PROJECT_ID/locations/us-central1/tuningJobs/TUNING_JOB_ID"
Replace the following:
- PROJECT_ID: Your Google Cloud project ID.
- TUNING_JOB_ID: The ID of the tuning job.
Example response (abbreviated):
{
"name": "projects/{PROJECT_ID}/locations/us-central1/tuningJobs/{TUNING_JOB_ID}",
"tunedModelDisplayName": "my-rl-tuned-model",
"state": "JOB_STATE_SUCCEEDED",
"tunedModel": {
"model": "projects/{PROJECT_ID}/locations/us-central1/models/{MODEL_ID}",
"endpoint": "projects/{PROJECT_ID}/locations/us/endpoints/{ENDPOINT_ID}"
},
"reinforcementTuningSpec": { ... }
}
After the tuning job succeeds, the response shows the endpoint for the
last checkpoint.
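The endpoint field holds a full resource name, while the inference request needs the serving location and the endpoint ID separately. The following sketch splits the resource name with shell parameter expansion. The function names are hypothetical, and the sketch assumes the projects/.../locations/.../endpoints/... shape shown in the example response.

```shell
# Hypothetical helper: return the endpoint ID, which is the last path
# segment of the endpoint resource name.
endpoint_id() {
  printf '%s\n' "${1##*/}"
}

# Hypothetical helper: return the serving location, which is the path
# segment that follows "locations/".
endpoint_location() {
  rest="${1#*/locations/}"    # drop everything through "/locations/"
  printf '%s\n' "${rest%%/*}" # keep the text before the next "/"
}
```

If jq is available, you can read the resource name from a saved response with `jq -r '.tunedModel.endpoint' response.json`, where response.json is a hypothetical file that holds the GetTuningJob output, and then pass the result to either function.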
Run inference on the tuned-model endpoint
The tuned model serves predictions through the standard generateContent API
on the returned endpoint. Because the tuning job ran in us-central1, the
tuned model is served from the us multi-region endpoint.
curl -X POST \
-H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
-H "Content-Type: application/json" \
"https://aiplatform.us.rep.googleapis.com/v1beta1/projects/PROJECT_ID/locations/us/endpoints/ENDPOINT_ID:generateContent" \
-d \
$'{
"contents": [
{
"role": "user",
"parts": [
{ "text": "Why is the sky blue?" }
]
}
]
}'
Replace the following:
- PROJECT_ID: Your Google Cloud project ID.
- ENDPOINT_ID: The endpoint ID returned in the
tunedModel.endpoint field of the GetTuningJob response.
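Instead of copying PROJECT_ID and ENDPOINT_ID into the URL by hand, you can build the request URL directly from the tunedModel.endpoint resource name. The following sketch uses a hypothetical function name. It covers only the us multi-region host used on this page and rejects any other resource name rather than guessing the host.

```shell
# Hypothetical helper: build the generateContent URL from an endpoint
# resource name of the form projects/P/locations/us/endpoints/E.
# Only the us multi-region host from this page is handled. Any other
# resource name is rejected with a non-zero exit status.
generate_content_url() {
  case "$1" in
    projects/*/locations/us/endpoints/*)
      printf 'https://aiplatform.us.rep.googleapis.com/v1beta1/%s:generateContent\n' "$1"
      ;;
    *)
      echo "unsupported endpoint resource name: $1" >&2
      return 1
      ;;
  esac
}
```

You can then pass the output of `generate_content_url` as the URL argument of the curl request above.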
What's next
- Learn more about reinforcement learning fine-tuning jobs.