The Vertex AI SDK includes several classes that you use when you
train your model. Most of the training classes are used to create, train, and
return your model. Use the
HyperparameterTuningJob class to tune the
training job's hyperparameters. Use the
PipelineJob class to manage your machine learning (ML)
workflow so you can automate and monitor your ML systems.
The following topics provide a high-level description of each training-related class in the Vertex AI SDK.
AutoML training classes for structured data
The Vertex AI SDK includes the following classes that are used to train a structured AutoML model.
AutoMLForecastingTrainingJob
The AutoMLForecastingTrainingJob
class uses the AutoML training method to train and run a forecasting model.
The AutoML training method is a good choice for most forecasting use cases. If
your use case doesn't benefit from the Seq2seq+ or the
Temporal Fusion Transformer training method that the
SequenceToSequencePlusForecastingTrainingJob
and
TemporalFusionTransformerForecastingTrainingJob
classes offer, respectively, then AutoML is likely the best training method for your forecasting
predictions.
For sample code that shows you how to use
AutoMLForecastingTrainingJob, see the Create a training pipeline forecasting sample on GitHub.
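As a rough illustration of what such a job might look like, the following sketch separates the run arguments from the SDK call. The column names, display names, and budget are hypothetical placeholders, and the SDK is imported lazily so that the argument helper can be inspected without the SDK installed.

```python
def forecasting_run_kwargs(target_column, time_column, series_id_column, forecast_horizon):
    """Collect arguments for AutoMLForecastingTrainingJob.run.

    All column names are hypothetical; replace them with columns from your dataset.
    """
    return {
        "target_column": target_column,
        "time_column": time_column,
        "time_series_identifier_column": series_id_column,
        # The target is unknown at forecast time; the timestamp is known.
        "unavailable_at_forecast_columns": [target_column],
        "available_at_forecast_columns": [time_column],
        "forecast_horizon": forecast_horizon,
        "data_granularity_unit": "day",
        "data_granularity_count": 1,
        "budget_milli_node_hours": 1000,
        "model_display_name": "my-forecasting-model",  # placeholder name
    }


def train_forecasting_model(dataset_resource_name, run_kwargs):
    """Create and run an AutoML forecasting job (requires the SDK and credentials)."""
    from google.cloud import aiplatform  # imported lazily on purpose

    dataset = aiplatform.TimeSeriesDataset(dataset_resource_name)
    job = aiplatform.AutoMLForecastingTrainingJob(
        display_name="train-forecasting",  # placeholder name
        optimization_objective="minimize-rmse",
    )
    return job.run(dataset=dataset, **run_kwargs)
```

You might call `train_forecasting_model` with the resource name of an existing time series dataset and the dictionary returned by `forecasting_run_kwargs("sales", "date", "store_id", 30)`.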
AutoMLTabularTrainingJob
The AutoMLTabularTrainingJob class
represents a job that creates, trains, and returns an AutoML tabular model.
For more information about training tabular models and Vertex AI, see
Tabular data and Tabular data
overview.
The following sample code snippet shows how you might use the
Vertex AI SDK to create and run an AutoML tabular model:
from google.cloud import aiplatform

# Reference an existing tabular dataset by its full resource name.
dataset = aiplatform.TabularDataset('projects/my-project/locations/us-central1/datasets/{DATASET_ID}')
job = aiplatform.AutoMLTabularTrainingJob(
    display_name="train-automl",
    optimization_prediction_type="regression",
    optimization_objective="minimize-rmse",
)
model = job.run(
    dataset=dataset,
    target_column="target_column_name",
    training_fraction_split=0.6,
    validation_fraction_split=0.2,
    test_fraction_split=0.2,
    budget_milli_node_hours=1000,
    model_display_name="my-automl-model",
    disable_early_stopping=False,
)
SequenceToSequencePlusForecastingTrainingJob
The
SequenceToSequencePlusForecastingTrainingJob
class uses the Seq2seq+ training method to train and run a forecasting model.
The Seq2seq+ training method is a good choice for experimentation. Its
algorithm is simpler and uses a smaller search space than the AutoML option.
Seq2seq+ is a good option if you want fast results and your datasets are
smaller than 1 GB.
For sample code that shows you how to use
SequenceToSequencePlusForecastingTrainingJob, see the Create a training pipeline forecasting Seq2seq sample on GitHub.
TemporalFusionTransformerForecastingTrainingJob
The
TemporalFusionTransformerForecastingTrainingJob
class uses the Temporal Fusion Transformer (TFT) training method to train and
run a forecasting model. The TFT training method implements an attention-based
deep neural network (DNN) model that uses a multi-horizon forecasting task to
produce predictions.
For sample code that shows you how to use
TemporalFusionTransformerForecastingTrainingJob,
see the Create a training pipeline forecasting temporal fusion transformer
sample
on GitHub.
TimeSeriesDenseEncoderForecastingTrainingJob
The
TimeSeriesDenseEncoderForecastingTrainingJob
class uses the Time-series Dense Encoder (TiDE) training method to train and run
a forecasting model. TiDE uses a
multi-layer perceptron (MLP) to combine the
speed of linear forecasting models with support for covariates and non-linear dependencies.
For more information about TiDE, see
Recent advances in deep long-horizon forecasting
and this
TiDE blog post.
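The four forecasting classes described above differ mainly in the training method they select. The following sketch maps a short method name to the corresponding class name. The short names are hypothetical labels chosen for this example, and the sketch assumes that each class is exposed on the `aiplatform` module and accepts the same `display_name` and `optimization_objective` constructor arguments.

```python
# Hypothetical short names mapped to the SDK class names described above.
FORECASTING_CLASSES = {
    "automl": "AutoMLForecastingTrainingJob",
    "seq2seq+": "SequenceToSequencePlusForecastingTrainingJob",
    "tft": "TemporalFusionTransformerForecastingTrainingJob",
    "tide": "TimeSeriesDenseEncoderForecastingTrainingJob",
}


def forecasting_class_name(method):
    """Return the SDK class name for a training method label (case-insensitive)."""
    try:
        return FORECASTING_CLASSES[method.lower()]
    except KeyError:
        known = ", ".join(sorted(FORECASTING_CLASSES))
        raise ValueError(f"Unknown method {method!r}; expected one of: {known}") from None


def make_forecasting_job(method, display_name):
    """Instantiate the forecasting job class for the given method (requires the SDK)."""
    from google.cloud import aiplatform  # imported lazily on purpose

    job_class = getattr(aiplatform, forecasting_class_name(method))
    # Assumption: all four classes accept these two constructor arguments.
    return job_class(display_name=display_name, optimization_objective="minimize-rmse")
```

A lookup like this can make it easier to compare training methods on the same dataset by changing a single string.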
AutoML training classes for unstructured data
The Vertex AI SDK includes the following classes to train unstructured image models:
AutoMLImageTrainingJob
Use the AutoMLImageTrainingJob class to
create, train, and return an image model. For more information about working
with image data models in Vertex AI, see
Image data.
For an example of how to use the
AutoMLImageTrainingJob class, see the
tutorial in the AutoML image
classification
notebook.
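The following is a minimal sketch of how an image classification job might be created and run. The display names, split fractions, and budget are illustrative assumptions, and the dataset resource name must point to an existing image dataset. The split helper is plain Python so it can be checked without the SDK.

```python
import math


def check_splits(training, validation, test):
    """Validate that the data split fractions sum to 1.0 and return them as run arguments."""
    total = training + validation + test
    if not math.isclose(total, 1.0):
        raise ValueError(f"Splits must sum to 1.0, got {total}")
    return {
        "training_fraction_split": training,
        "validation_fraction_split": validation,
        "test_fraction_split": test,
    }


def train_image_model(dataset_resource_name, splits):
    """Create and run an AutoML image classification job (requires the SDK and credentials)."""
    from google.cloud import aiplatform  # imported lazily on purpose

    dataset = aiplatform.ImageDataset(dataset_resource_name)
    job = aiplatform.AutoMLImageTrainingJob(
        display_name="train-image",  # placeholder name
        prediction_type="classification",
        multi_label=False,
    )
    return job.run(
        dataset=dataset,
        model_display_name="my-image-model",  # placeholder name
        budget_milli_node_hours=8000,  # illustrative budget
        **splits,
    )
```

For example, `train_image_model(resource_name, check_splits(0.8, 0.1, 0.1))` would request an 80/10/10 split.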
Custom data training classes
You can use the Vertex AI SDK to automate a custom training workflow. For information about using Vertex AI to run custom training applications, see Custom training overview.
The Vertex AI SDK includes three classes that create a custom
training pipeline. A training pipeline accepts an input Vertex AI managed
dataset that it uses to train a model. Next, it returns the model after the
training job completes. Each of the three custom training pipeline classes
creates a training pipeline differently.
CustomTrainingJob uses a Python script,
CustomContainerTrainingJob uses a
custom container, and
CustomPythonPackageTrainingJob
uses a Python package and a prebuilt container.
The CustomJob class creates a custom training job
but is not a pipeline. Unlike a custom training pipeline, the
CustomJob class can use a dataset that's not a
Vertex AI managed dataset to train a model, and it doesn't return the
trained model. Because the class accepts different types of datasets and doesn't
return a trained model, it's less automated and more flexible than a custom
training pipeline.
CustomContainerTrainingJob
Use the CustomContainerTrainingJob
class to launch a custom training pipeline in Vertex AI with a container.
For an example of how to use the
CustomContainerTrainingJob class,
see the tutorial in the PyTorch Image Classification Multi-Node Distributed
Data Parallel Training on GPU using Vertex AI Training with Custom Container
notebook.
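As a minimal sketch, a custom container training pipeline might be launched as follows. The sketch assumes that the training image is stored in Artifact Registry. The repository, image, and display names are hypothetical, and the serving image URI is passed in by the caller.

```python
def artifact_registry_image_uri(region, project, repository, image, tag="latest"):
    """Build an Artifact Registry Docker image URI from its parts."""
    return f"{region}-docker.pkg.dev/{project}/{repository}/{image}:{tag}"


def train_with_custom_container(training_image_uri, serving_image_uri, dataset=None):
    """Launch a custom container training pipeline (requires the SDK and credentials)."""
    from google.cloud import aiplatform  # imported lazily on purpose

    job = aiplatform.CustomContainerTrainingJob(
        display_name="train-custom-container",  # placeholder name
        container_uri=training_image_uri,
        model_serving_container_image_uri=serving_image_uri,
    )
    # The pipeline returns the trained model when a serving container is configured.
    return job.run(
        dataset=dataset,
        replica_count=1,
        machine_type="n1-standard-8",
        model_display_name="my-custom-container-model",  # placeholder name
    )
```

The training container is expected to write its model artifacts to the output location that Vertex AI provides to it, so that the pipeline can register the resulting model.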
CustomJob
Use the CustomJob class to launch a
custom training job in Vertex AI with a script.
A training job is more flexible than a training pipeline because you aren't
restricted to loading your data in a Vertex AI managed dataset and a
reference to your model isn't registered after the training job completes. For
example, you might want to use the CustomJob class, its
from_local_script
method, and a script to load a dataset from
scikit-learn or
TensorFlow. Or, you might want to analyze or test
your trained model before you register it to Vertex AI.
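The following sketch shows one way the from_local_script method might be used. The script path, container URI, bucket, and package requirement are hypothetical values that the caller supplies, and the training script itself is assumed to load its own data and save its own model artifacts.

```python
def local_script_job_kwargs(script_path, container_uri, staging_bucket):
    """Collect arguments for CustomJob.from_local_script (all values are caller-supplied)."""
    return {
        "display_name": "local-script-job",  # placeholder name
        "script_path": script_path,
        "container_uri": container_uri,
        "requirements": ["scikit-learn"],  # hypothetical dependency of the script
        "replica_count": 1,
        "machine_type": "n1-standard-4",
        "staging_bucket": staging_bucket,
    }


def run_local_script_job(job_kwargs):
    """Package the local script and run it as a custom job (requires the SDK and credentials)."""
    from google.cloud import aiplatform  # imported lazily on purpose

    job = aiplatform.CustomJob.from_local_script(**job_kwargs)
    job.run()  # Blocks until the job finishes; no model is returned.
    return job
```

Because the job doesn't return a model, the script referenced by `script_path` is responsible for persisting whatever it trains.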
For more information about custom training jobs, including requirements before submitting a custom training job, what a custom job includes, and a Python code sample, see Create custom training jobs.
Because the
CustomJob.run
method doesn't return the trained model, you need to use a script to write the model
artifact to a location, such as a Cloud Storage bucket. For more information,
see Export a trained ML model.
The following sample code demonstrates how to create and run a custom job using a sample worker pool specification. The code writes the trained model to a Cloud Storage bucket named artifact-bucket.
from google.cloud import aiplatform

# Create a worker pool spec that specifies a TensorFlow cassava dataset and
# includes the machine type and Docker image. The Google Cloud project ID
# is 'project-id'.
worker_pool_specs = [
    {
        "replica_count": 1,
        "machine_spec": {
            "machine_type": "n1-standard-8",
            "accelerator_type": "NVIDIA_TESLA_V100",
            "accelerator_count": 1,
        },
        "container_spec": {"image_uri": "gcr.io/{project-id}/multiworker:cassava"},
    },
    {
        "replica_count": 1,
        "machine_spec": {
            "machine_type": "n1-standard-8",
            "accelerator_type": "NVIDIA_TESLA_V100",
            "accelerator_count": 1,
        },
        "container_spec": {"image_uri": "gcr.io/{project-id}/multiworker:cassava"},
    },
]

# Use the worker pool spec to create a custom training job. The custom training
# job artifacts are stored in the Cloud Storage bucket
# named 'artifact-bucket'.
my_multiworker_job = aiplatform.CustomJob(
    display_name="multiworker-cassava-sdk",
    worker_pool_specs=worker_pool_specs,
    staging_bucket="gs://{artifact-bucket}",
)

# Run the training job. This method doesn't return the trained model.
my_multiworker_job.run()
CustomPythonPackageTrainingJob
Use the
CustomPythonPackageTrainingJob
class to launch a custom training pipeline in
Vertex AI with a Python package.
For an example of how to use the
CustomPythonPackageTrainingJob
class, see the tutorial in the Custom training using Python package, managed
text dataset, and TensorFlow serving
container
notebook.
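A minimal sketch of a Python package training pipeline might look like the following. The package location, module name, container URIs, and display names are hypothetical, and the package is assumed to already be uploaded to Cloud Storage.

```python
def python_package_job_kwargs(package_gcs_uri, module_name, container_uri, serving_image_uri):
    """Collect constructor arguments for CustomPythonPackageTrainingJob."""
    if not package_gcs_uri.startswith("gs://"):
        raise ValueError("The Python package must be stored in Cloud Storage (gs://...)")
    return {
        "display_name": "train-python-package",  # placeholder name
        "python_package_gcs_uri": package_gcs_uri,
        "python_module_name": module_name,
        "container_uri": container_uri,  # prebuilt training container
        "model_serving_container_image_uri": serving_image_uri,
    }


def train_with_python_package(job_kwargs, dataset=None):
    """Launch a Python package training pipeline (requires the SDK and credentials)."""
    from google.cloud import aiplatform  # imported lazily on purpose

    job = aiplatform.CustomPythonPackageTrainingJob(**job_kwargs)
    return job.run(
        dataset=dataset,
        replica_count=1,
        machine_type="n1-standard-4",
        model_display_name="my-python-package-model",  # placeholder name
    )
```

The module name identifies the entry point inside the package, for example a hypothetical `trainer.task` module.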
CustomTrainingJob
Use the CustomTrainingJob class to launch a
custom training pipeline in Vertex AI with a script.
For an example of how to use the
CustomTrainingJob class, see the tutorial in
the
Custom training image classification model for online prediction with
explainability
notebook.
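As a rough sketch, a script-based training pipeline might be launched as follows. The script path, container URIs, hyperparameter names, and display names are hypothetical. The helper that turns a dictionary into command-line flags is an illustrative convenience, not part of the SDK.

```python
def script_args(hyperparameters):
    """Convert a dictionary of settings into command-line flags for the training script."""
    return [f"--{name}={value}" for name, value in hyperparameters.items()]


def train_with_script(script_path, container_uri, serving_image_uri, args, dataset=None):
    """Launch a script-based custom training pipeline (requires the SDK and credentials)."""
    from google.cloud import aiplatform  # imported lazily on purpose

    job = aiplatform.CustomTrainingJob(
        display_name="train-custom-script",  # placeholder name
        script_path=script_path,
        container_uri=container_uri,
        model_serving_container_image_uri=serving_image_uri,
    )
    return job.run(
        dataset=dataset,
        args=args,
        replica_count=1,
        machine_type="n1-standard-4",
        model_display_name="my-custom-script-model",  # placeholder name
    )
```

For example, `script_args({"epochs": 10, "learning_rate": 0.01})` produces flags that a hypothetical `task.py` script could parse with `argparse`.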
Hyperparameter tuning class
The Vertex AI SDK includes a class for hyperparameter tuning. Hyperparameter tuning maximizes your model's predictive accuracy by optimizing variables (known as hyperparameters) that govern the training process. For more information, see Overview of hyperparameter tuning.
HyperparameterTuningJob
Use the HyperparameterTuningJob class
to automate hyperparameter tuning on a training application.
To learn how to use the HyperparameterTuningJob class to create and tune a custom trained model, see the
Hyperparameter tuning tutorial on GitHub.
To learn how to use the HyperparameterTuningJob class to run a Vertex AI hyperparameter tuning job for a TensorFlow model, see the
Run hyperparameter tuning for a TensorFlow model tutorial on GitHub.
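The following sketch outlines how a tuning job might wrap an existing CustomJob. The search space, metric name, and trial counts are illustrative assumptions, and the training application is assumed to report the `accuracy` metric itself. The search space is described as a plain dictionary and converted to SDK parameter specs only when the job is created.

```python
# Hypothetical search space: one continuous and one discrete hyperparameter.
SEARCH_SPACE = {
    "learning_rate": {"min": 1e-4, "max": 1e-1, "scale": "log"},
    "batch_size": {"values": [16, 32, 64], "scale": "linear"},
}


def validate_search_space(search_space):
    """Raise ValueError if a range is empty or a value list is missing."""
    for name, cfg in search_space.items():
        if "values" in cfg:
            if not cfg["values"]:
                raise ValueError(f"{name}: 'values' must not be empty")
        elif not cfg["min"] < cfg["max"]:
            raise ValueError(f"{name}: 'min' must be less than 'max'")


def to_parameter_spec(search_space):
    """Convert the plain dictionary into SDK parameter spec objects (requires the SDK)."""
    from google.cloud.aiplatform import hyperparameter_tuning as hpt

    specs = {}
    for name, cfg in search_space.items():
        if "values" in cfg:
            specs[name] = hpt.DiscreteParameterSpec(values=cfg["values"], scale=cfg["scale"])
        else:
            specs[name] = hpt.DoubleParameterSpec(min=cfg["min"], max=cfg["max"], scale=cfg["scale"])
    return specs


def tune(custom_job, search_space, max_trials=8, parallel_trials=2):
    """Run a hyperparameter tuning job over an existing CustomJob (requires credentials)."""
    from google.cloud import aiplatform  # imported lazily on purpose

    validate_search_space(search_space)
    job = aiplatform.HyperparameterTuningJob(
        display_name="tune-custom-job",  # placeholder name
        custom_job=custom_job,
        metric_spec={"accuracy": "maximize"},  # metric name is an assumption
        parameter_spec=to_parameter_spec(search_space),
        max_trial_count=max_trials,
        parallel_trial_count=parallel_trials,
    )
    job.run()
    return job
```

Keeping the search space as plain data makes it straightforward to validate or log before any trials are started.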
Pipeline training class
A pipeline orchestrates your ML workflow in Vertex AI. You can use a pipeline to automate, monitor, and govern your machine learning systems. To learn more about pipelines in Vertex AI, see Introduction to Vertex AI pipelines.
PipelineJob
An instance of the PipelineJob class represents a Vertex AI pipeline.
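A minimal sketch of creating and running a pipeline from a compiled pipeline definition might look like the following. The template path, pipeline root, parameter names, and display name are hypothetical, and the pipeline definition is assumed to have been compiled beforehand, for example with the KFP SDK.

```python
def pipeline_job_kwargs(template_path, pipeline_root, parameter_values=None):
    """Collect constructor arguments for PipelineJob."""
    return {
        "display_name": "my-pipeline",  # placeholder name
        "template_path": template_path,  # compiled pipeline definition
        "pipeline_root": pipeline_root,  # Cloud Storage location for pipeline outputs
        "parameter_values": dict(parameter_values or {}),
        "enable_caching": False,
    }


def run_pipeline(job_kwargs):
    """Create and run a Vertex AI pipeline (requires the SDK and credentials)."""
    from google.cloud import aiplatform  # imported lazily on purpose

    job = aiplatform.PipelineJob(**job_kwargs)
    job.run()  # Blocks until the pipeline run completes.
    return job
```

For example, `pipeline_job_kwargs("pipeline.json", "gs://my-bucket/pipeline-root", {"epochs": 5})` describes a run that passes one hypothetical parameter to the pipeline.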
There are several tutorial notebooks that demonstrate how to use the PipelineJob class:
- To learn how to run a Kubeflow Pipelines (KFP) pipeline, see the Pipeline control structures using the KFP SDK tutorial on GitHub. 
- To learn how to train a scikit-learn tabular classification model and create a batch prediction job with a Vertex AI pipeline, see the Training and batch prediction with BigQuery source and destination for a custom tabular classification model tutorial on GitHub. 
- To learn how to build an AutoML image classification model and use a Vertex AI pipeline, see the AutoML image classification pipelines using google-cloud-pipeline-components tutorial on GitHub. 
For more tutorial notebooks, see Vertex AI notebook tutorials.
What's next
- Learn about the Vertex AI SDK.