Continuous tuning lets you resume tuning from a previously tuned model or model checkpoint by adding more epochs or training examples. Starting from a tuned model or checkpoint instead of the original base model makes tuning experiments more efficient.
You can use continuous tuning for the following purposes:
- To tune with more data if an existing tuned model is underfitting.
- To boost performance or keep the model up to date with new data.
- To further customize an existing tuned model.
Supported continuous tuning patterns
The following continuous tuning patterns are supported:
- Supervised fine-tuning → Reinforcement learning fine-tuning
- Reinforcement learning fine-tuning → Reinforcement learning fine-tuning
Configure continuous tuning
To configure a continuous tuning job, include a preTunedModel block in the
request body that points to the previously tuned model (and, optionally, a
specific checkpoint). The rest of the request follows the same schema as a
new reinforcement learning fine-tuning job.
{
  "description": string,
  "tunedModelDisplayName": string,
  "reinforcementTuningSpec": {
    "trainingDatasetUri": "TRAINING_DATASET",
    "validationDatasetUri": "VALIDATION_DATASET",
    "hyperParameters": "HYPER_PARAMETERS",
    "singleRewardConfig": "REWARD_CONFIG"
  },
  "preTunedModel": {
    "tunedModelName": "projects/PROJECT_ID/locations/LOCATION_ID/models/PRETUNED_MODEL_ID",
    "checkpointId": "CHECKPOINT_ID"
  }
}
The placeholders in the body are:
- TRAINING_DATASET: Cloud Storage URI of the training dataset JSONL file.
- VALIDATION_DATASET: Cloud Storage URI of the validation dataset JSONL file. Optional.
- HYPER_PARAMETERS: Hyperparameter configuration for the continuous tuning job. For details, see the Hyperparameters page.
- REWARD_CONFIG: Reward configuration. For details, see the Reward functions page.
- PROJECT_ID: Your Google Cloud project ID.
- LOCATION_ID: The Google Cloud location (region) of the previously tuned model.
- PRETUNED_MODEL_ID: The model ID of the previously tuned model to continue from.
- CHECKPOINT_ID: The ID of a specific checkpoint of the previously tuned model. Optional. If omitted, the latest checkpoint of the pretuned model is used.
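As an illustration, the request body can be assembled programmatically. The following Python sketch only builds and prints the JSON; it does not send it anywhere. The function name build_continuous_tuning_request and its parameter names are hypothetical, the example values are made up, and the empty objects passed for the hyperparameters and reward configuration are stand-ins for whatever the Hyperparameters and Reward functions pages specify. Treating description and tunedModelDisplayName as optional is also an assumption.

```python
import json
from typing import Any, Optional


def build_continuous_tuning_request(
    *,
    project_id: str,
    location_id: str,
    pretuned_model_id: str,
    training_dataset_uri: str,
    hyper_parameters: Any,
    reward_config: Any,
    validation_dataset_uri: Optional[str] = None,
    checkpoint_id: Optional[str] = None,
    description: Optional[str] = None,
    tuned_model_display_name: Optional[str] = None,
) -> dict:
    """Hypothetical helper: assemble a continuous tuning request body.

    Fields documented as optional are left out of the body when not provided.
    """
    spec = {
        "trainingDatasetUri": training_dataset_uri,
        "hyperParameters": hyper_parameters,
        "singleRewardConfig": reward_config,
    }
    if validation_dataset_uri is not None:
        spec["validationDatasetUri"] = validation_dataset_uri

    # Full resource name of the previously tuned model to continue from.
    pre_tuned_model = {
        "tunedModelName": (
            f"projects/{project_id}/locations/{location_id}"
            f"/models/{pretuned_model_id}"
        )
    }
    # Leaving out checkpointId means the latest checkpoint is used.
    if checkpoint_id is not None:
        pre_tuned_model["checkpointId"] = checkpoint_id

    body = {
        "reinforcementTuningSpec": spec,
        "preTunedModel": pre_tuned_model,
    }
    # Assumption: these two top-level fields may be omitted.
    if description is not None:
        body["description"] = description
    if tuned_model_display_name is not None:
        body["tunedModelDisplayName"] = tuned_model_display_name
    return body


# Example with made-up values; the empty dicts are placeholders only.
example_body = build_continuous_tuning_request(
    project_id="my-project",
    location_id="us-central1",
    pretuned_model_id="1234567890",
    training_dataset_uri="gs://my-bucket/train.jsonl",
    hyper_parameters={},
    reward_config={},
    tuned_model_display_name="my-continued-model",
)
print(json.dumps(example_body, indent=2))
```

Because checkpoint_id is not passed in the example, the printed body has no checkpointId field, so the job would continue from the latest checkpoint of the pretuned model.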
What's next
- Prepare a tuning dataset for the continuous tuning job.
- Configure hyperparameters and reward functions for the continuous tuning job.
- Monitor job status and metrics while continuous tuning runs.