Training classes

The Vertex AI SDK includes several classes that you use when you train your model. Most of the training classes are used to create, train, and return your model. Use the HyperparameterTuningJob class to tune the training job's hyperparameters. Use the PipelineJob class to manage your machine learning (ML) workflow so you can automate and monitor your ML systems.

The following topics provide a high-level description of each training-related class in the Vertex AI SDK.

AutoML training classes for structured data

The Vertex AI SDK includes the following classes that are used to train AutoML models on structured data.

AutoMLForecastingTrainingJob

The AutoMLForecastingTrainingJob class uses the AutoML training method to train and run a forecasting model. The AutoML training method is a good choice for most forecasting use cases. If your use case doesn't benefit from the Seq2seq+ or the Temporal Fusion Transformer (TFT) training method, which the SequenceToSequencePlusForecastingTrainingJob and TemporalFusionTransformerForecastingTrainingJob classes offer respectively, then AutoML is likely the best training method for your forecasting predictions.

For sample code that shows you how to use AutoMLForecastingTrainingJob, see the Create a training pipeline forecasting sample on GitHub.
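As a rough illustration, a forecasting job might be configured along the following lines. This is a minimal sketch and not the linked GitHub sample: the forecasting_run_kwargs and train_forecasting_model helpers, the display name, and the column names are hypothetical. The SDK import is deferred into the function that needs it so that the argument-building helper can be checked without the SDK or credentials.

```python
def forecasting_run_kwargs(target_column, time_column, series_id_column,
                           forecast_horizon, granularity_unit="day"):
    """Collect the column-related arguments for a forecasting run.

    Hypothetical helper: the column names are placeholders for your own data.
    """
    if forecast_horizon <= 0:
        raise ValueError("forecast_horizon must be positive")
    return {
        "target_column": target_column,
        "time_column": time_column,
        "time_series_identifier_column": series_id_column,
        # The target is only observed after the fact, so it is not
        # available at forecast time.
        "unavailable_at_forecast_columns": [target_column],
        # The timestamp itself is known in advance.
        "available_at_forecast_columns": [time_column],
        "forecast_horizon": forecast_horizon,
        "data_granularity_unit": granularity_unit,
        "data_granularity_count": 1,
    }


def train_forecasting_model(dataset_name, run_kwargs):
    """Train an AutoML forecasting model (needs the SDK and credentials)."""
    from google.cloud import aiplatform  # deferred import

    dataset = aiplatform.TimeSeriesDataset(dataset_name)
    job = aiplatform.AutoMLForecastingTrainingJob(
        display_name="train-forecasting",
        optimization_objective="minimize-rmse",
    )
    return job.run(dataset=dataset, budget_milli_node_hours=1000, **run_kwargs)
```

A call such as train_forecasting_model(dataset_name, forecasting_run_kwargs("sales", "date", "store_id", 30)) would then request a 30-day forecast horizon, assuming a dataset with those columns exists.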

AutoMLTabularTrainingJob

The AutoMLTabularTrainingJob class represents a job that creates, trains, and returns an AutoML tabular model. For more information about training tabular models in Gemini Enterprise Agent Platform, see Tabular data and Tabular data overview.

The following sample code snippet shows how you might use the Vertex AI SDK to create and run an AutoML tabular model:

from google.cloud import aiplatform

# Load an existing managed tabular dataset by its full resource name.
dataset = aiplatform.TabularDataset('projects/my-project/locations/us-central1/datasets/{DATASET_ID}')

# Define a regression training job that minimizes root-mean-squared error.
job = aiplatform.AutoMLTabularTrainingJob(
    display_name="train-automl",
    optimization_prediction_type="regression",
    optimization_objective="minimize-rmse",
)

# Train with a 60/20/20 split and a budget of 1,000 milli node hours.
model = job.run(
    dataset=dataset,
    target_column="target_column_name",
    training_fraction_split=0.6,
    validation_fraction_split=0.2,
    test_fraction_split=0.2,
    budget_milli_node_hours=1000,
    model_display_name="my-automl-model",
    disable_early_stopping=False,
)

SequenceToSequencePlusForecastingTrainingJob

The SequenceToSequencePlusForecastingTrainingJob class uses the Seq2seq+ training method to train and run a forecasting model. The Seq2seq+ training method is a good choice for experimentation. Its algorithm is simpler and uses a smaller search space than the AutoML option. Seq2seq+ is a good option if you want fast results and your datasets are smaller than 1 GB.

For sample code that shows you how to use SequenceToSequencePlusForecastingTrainingJob, see the Create a training pipeline forecasting Seq2seq sample on GitHub.

TemporalFusionTransformerForecastingTrainingJob

The TemporalFusionTransformerForecastingTrainingJob class uses the Temporal Fusion Transformer (TFT) training method to train and run a forecasting model. The TFT training method implements an attention-based deep neural network (DNN) model that uses a multi-horizon forecasting task to produce predictions.

For sample code that shows you how to use TemporalFusionTransformerForecastingTrainingJob, see the Create a training pipeline forecasting temporal fusion transformer sample on GitHub.

TimeSeriesDenseEncoderForecastingTrainingJob

The TimeSeriesDenseEncoderForecastingTrainingJob class uses the Time-series Dense Encoder (TiDE) training method to train and run a forecasting model. TiDE uses a multi-layer perceptron (MLP) to provide the speed of linear forecasting models while also handling covariates and non-linear dependencies. For more information about TiDE, see Recent advances in deep long-horizon forecasting and this TiDE blog post.
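Because the four forecasting classes differ mainly in the training method they select, one way to switch between them is to look the class up by name. The following is a sketch under assumptions: the short method labels ("automl", "seq2seq+", "tft", "tide") and both helper functions are hypothetical conveniences, not part of the SDK. Only the class names come from this page.

```python
# Hypothetical short labels mapped to the forecasting class names above.
FORECASTING_JOB_CLASSES = {
    "automl": "AutoMLForecastingTrainingJob",
    "seq2seq+": "SequenceToSequencePlusForecastingTrainingJob",
    "tft": "TemporalFusionTransformerForecastingTrainingJob",
    "tide": "TimeSeriesDenseEncoderForecastingTrainingJob",
}


def forecasting_job_class_name(method):
    """Return the SDK class name for a training-method label."""
    try:
        return FORECASTING_JOB_CLASSES[method.lower()]
    except KeyError:
        known = ", ".join(sorted(FORECASTING_JOB_CLASSES))
        raise ValueError(f"unknown method {method!r}; expected one of: {known}")


def make_forecasting_job(method, display_name):
    """Instantiate the matching job class (needs the SDK and credentials)."""
    from google.cloud import aiplatform  # deferred import

    job_class = getattr(aiplatform, forecasting_job_class_name(method))
    return job_class(
        display_name=display_name,
        optimization_objective="minimize-rmse",
    )
```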

AutoML training classes for unstructured data

The Vertex AI SDK includes the following class to train AutoML models on unstructured image data:

AutoMLImageTrainingJob

Use the AutoMLImageTrainingJob class to create, train, and return an image model. For more information about working with image data models in Gemini Enterprise Agent Platform, see Image data.

For an example of how to use the AutoMLImageTrainingJob class, see the tutorial in the AutoML image classification notebook.
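For orientation, an image classification job might look roughly like the following. This is a hedged sketch, not the notebook's code: the helper names, display names, split fractions, and budget are illustrative assumptions, and the dataset resource name is supplied by the caller.

```python
def image_split_fractions(training=0.8, validation=0.1):
    """Return train/validation/test fractions that sum to 1.

    Hypothetical helper; the test fraction is whatever remains.
    """
    test = round(1.0 - training - validation, 6)
    if min(training, validation, test) < 0:
        raise ValueError("fractions must be non-negative and sum to at most 1")
    return {
        "training_fraction_split": training,
        "validation_fraction_split": validation,
        "test_fraction_split": test,
    }


def train_image_classifier(dataset_name, splits):
    """Train a single-label image classifier (needs the SDK and credentials)."""
    from google.cloud import aiplatform  # deferred import

    dataset = aiplatform.ImageDataset(dataset_name)
    job = aiplatform.AutoMLImageTrainingJob(
        display_name="train-image-classifier",
        prediction_type="classification",
        multi_label=False,
    )
    return job.run(
        dataset=dataset,
        model_display_name="my-image-model",
        budget_milli_node_hours=8000,
        **splits,
    )
```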

Custom data training classes

You can use the Vertex AI SDK to automate a custom training workflow. For information about using Gemini Enterprise Agent Platform to run custom training applications, see Custom training overview.

The Vertex AI SDK includes three classes that create a custom training pipeline. A training pipeline accepts an input Gemini Enterprise Agent Platform managed dataset that it uses to train a model. Next, it returns the model after the training job completes. Each of the three custom training pipeline classes creates a training pipeline differently. CustomTrainingJob uses a Python script, CustomContainerTrainingJob uses a custom container, and CustomPythonPackageTrainingJob uses a Python package and a prebuilt container.
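The difference between the three pipeline classes shows up mostly in their constructor arguments. The sketch below lays those arguments side by side. Every path and URI is a placeholder to fill in, and the CUSTOM_PIPELINE_ARGS table and make_custom_pipeline helper are hypothetical names introduced for illustration.

```python
# Constructor arguments that distinguish each custom training pipeline class.
# All values are placeholders, not working paths or image URIs.
CUSTOM_PIPELINE_ARGS = {
    # A local Python script that runs in a prebuilt training container.
    "CustomTrainingJob": {
        "script_path": "task.py",
        "container_uri": "{PREBUILT_TRAINING_IMAGE_URI}",
    },
    # A custom container image that already contains the training code.
    "CustomContainerTrainingJob": {
        "container_uri": "{CUSTOM_TRAINING_IMAGE_URI}",
    },
    # A Python package in Cloud Storage that runs in a prebuilt container.
    "CustomPythonPackageTrainingJob": {
        "python_package_gcs_uri": "gs://{BUCKET}/trainer-0.1.tar.gz",
        "python_module_name": "trainer.task",
        "container_uri": "{PREBUILT_TRAINING_IMAGE_URI}",
    },
}


def make_custom_pipeline(class_name, display_name):
    """Instantiate one of the three classes (needs the SDK and credentials)."""
    from google.cloud import aiplatform  # deferred import

    job_class = getattr(aiplatform, class_name)
    return job_class(display_name=display_name, **CUSTOM_PIPELINE_ARGS[class_name])
```

For the pipeline to upload and return a model, a serving container typically also needs to be configured, for example through the model_serving_container_image_uri argument that these constructors accept.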

The CustomJob class creates a custom training job but is not a pipeline. Unlike a custom training pipeline, the CustomJob class can use a dataset that's not a Gemini Enterprise Agent Platform managed dataset to train a model, and it doesn't return the trained model. Because the class accepts different types of datasets and doesn't return a trained model, it's less automated and more flexible than a custom training pipeline.

CustomContainerTrainingJob

Use the CustomContainerTrainingJob class to launch a custom training pipeline in Gemini Enterprise Agent Platform from a container.

For an example of how to use the CustomContainerTrainingJob class, see the tutorial in the PyTorch Image Classification Multi-Node Distributed Data Parallel Training on GPU using Gemini Enterprise Agent Platform Training with Custom Container notebook.

CustomJob

Use the CustomJob class to launch a custom training job in Gemini Enterprise Agent Platform from a script.

A training job is more flexible than a training pipeline because you aren't restricted to loading your data in a Gemini Enterprise Agent Platform managed dataset and a reference to your model isn't registered after the training job completes. For example, you might want to use the CustomJob class, its from_local_script method, and a script to load a dataset from scikit-learn or TensorFlow. Or, you might want to analyze or test your trained model before you register it to Gemini Enterprise Agent Platform.
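The from_local_script approach mentioned above could be sketched as follows. The helper names, display name, machine type, and the scikit-learn requirement are assumptions chosen for illustration. The script path, container image, and bucket are placeholders supplied by the caller.

```python
def local_script_job_kwargs(script_path, container_uri, staging_bucket):
    """Build arguments for CustomJob.from_local_script (hypothetical helper)."""
    if not staging_bucket.startswith("gs://"):
        raise ValueError("staging_bucket must be a gs:// URI")
    return {
        "display_name": "local-script-job",
        "script_path": script_path,
        "container_uri": container_uri,
        # Extra packages to install in the container before the script runs.
        "requirements": ["scikit-learn"],
        "replica_count": 1,
        "machine_type": "n1-standard-4",
        "staging_bucket": staging_bucket,
    }


def launch_local_script_job(job_kwargs):
    """Create and run the job (needs the SDK and credentials)."""
    from google.cloud import aiplatform  # deferred import

    job = aiplatform.CustomJob.from_local_script(**job_kwargs)
    job.run()  # Blocks until the job finishes; no model is returned.
    return job
```

Because the script itself loads the data, for example from scikit-learn's bundled datasets, no managed dataset is passed to the job.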

For more information about custom training jobs, including requirements before submitting a custom training job, what a custom job includes, and a Python code sample, see Create custom training jobs.

Because the CustomJob.run method doesn't return the trained model, your training script needs to write the model artifact to a location, such as a Cloud Storage bucket. For more information, see Export a trained ML model.
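Inside the training script, writing the artifact might look like the sketch below. It assumes that the service exposes the output location through the AIP_MODEL_DIR environment variable when a base output directory is configured, and that Cloud Storage buckets are reachable in the training container through the Cloud Storage FUSE mount at /gcs/. The function names, the model.pkl file name, and the use of pickle are illustrative choices, not a required format.

```python
import os
import pickle


def resolve_model_dir(default="model_output"):
    """Return a directory path where the script can write model artifacts.

    Assumption: AIP_MODEL_DIR holds a gs:// URI when the job has a base
    output directory; otherwise fall back to a local directory.
    """
    model_dir = os.environ.get("AIP_MODEL_DIR", default)
    if model_dir.startswith("gs://"):
        # Map gs://bucket/path to the FUSE mount /gcs/bucket/path.
        model_dir = "/gcs/" + model_dir[len("gs://"):]
    return model_dir


def save_model(model, model_dir):
    """Serialize the model into model_dir and return the file path."""
    os.makedirs(model_dir, exist_ok=True)
    path = os.path.join(model_dir, "model.pkl")
    with open(path, "wb") as f:
        pickle.dump(model, f)
    return path
```

At the end of training, the script would call save_model(trained_model, resolve_model_dir()). Frameworks such as TensorFlow have their own export functions that can write to the same directory.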

The following sample code demonstrates how to create and run a custom job using a sample worker pool specification. The code writes the trained model to a Cloud Storage bucket named artifact-bucket.

from google.cloud import aiplatform

# Create a worker pool spec that specifies a TensorFlow cassava dataset and
# includes the machine type and Docker image. The Google Cloud project ID
# is 'project-id'.
worker_pool_specs = [
    {
        "replica_count": 1,
        "machine_spec": {
            "machine_type": "n1-standard-8",
            "accelerator_type": "NVIDIA_TESLA_V100",
            "accelerator_count": 1,
        },
        "container_spec": {"image_uri": "gcr.io/{project-id}/multiworker:cassava"},
    },
    {
        "replica_count": 1,
        "machine_spec": {
            "machine_type": "n1-standard-8",
            "accelerator_type": "NVIDIA_TESLA_V100",
            "accelerator_count": 1,
        },
        "container_spec": {"image_uri": "gcr.io/{project-id}/multiworker:cassava"},
    },
]

# Use the worker pool spec to create a custom training job. The custom training
# job artifacts are stored in the Cloud Storage bucket
# named 'artifact-bucket'.
my_multiworker_job = aiplatform.CustomJob(
    display_name='multiworker-cassava-sdk',
    worker_pool_specs=worker_pool_specs,
    staging_bucket='gs://{artifact-bucket}',
)

# Run the training job. This method doesn't return the trained model.
my_multiworker_job.run()

CustomPythonPackageTrainingJob

Use the CustomPythonPackageTrainingJob class to launch a custom training pipeline in Gemini Enterprise Agent Platform from a Python package.

For an example of how to use the CustomPythonPackageTrainingJob class, see the tutorial in the Custom training using Python package, managed text dataset, and TensorFlow serving container notebook.

CustomTrainingJob

Use the CustomTrainingJob class to launch a custom training pipeline in Gemini Enterprise Agent Platform with a script.

For an example of how to use the CustomTrainingJob class, see the tutorial in the Custom training image classification model for online prediction with explainability notebook.

Hyperparameter training class

The Vertex AI SDK includes a class for hyperparameter tuning. Hyperparameter tuning maximizes your model's predictive accuracy by optimizing variables (known as hyperparameters) that govern the training process. For more information, see Overview of hyperparameter tuning.

HyperparameterTuningJob

Use the HyperparameterTuningJob class to automate hyperparameter tuning on a training application.

To learn how to use the HyperparameterTuningJob class to create and tune a custom trained model, see the Hyperparameter tuning tutorial on GitHub.

To learn how to use the HyperparameterTuningJob class to run a Gemini Enterprise Agent Platform hyperparameter tuning job for a TensorFlow model, see the Run hyperparameter tuning for a TensorFlow model tutorial on GitHub.
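A tuning job wraps an existing custom job and adds a search space and a metric to optimize. The sketch below is one possible arrangement, not the tutorials' code: the hyperparameter names, their ranges, the "accuracy" metric, and the trial counts are assumptions, and the plain-dictionary search-space format is a hypothetical convenience that is converted to SDK parameter specs at creation time.

```python
def tuning_search_space():
    """Describe the search space as plain data (hypothetical hyperparameters)."""
    return {
        "learning_rate": {"kind": "double", "min": 0.001, "max": 0.1, "scale": "log"},
        "batch_size": {"kind": "discrete", "values": [16, 32, 64], "scale": "linear"},
    }


def create_tuning_job(custom_job, search_space):
    """Wrap a CustomJob in a tuning job (needs the SDK and credentials)."""
    from google.cloud import aiplatform  # deferred imports
    from google.cloud.aiplatform import hyperparameter_tuning as hpt

    parameter_spec = {}
    for name, spec in search_space.items():
        if spec["kind"] == "double":
            parameter_spec[name] = hpt.DoubleParameterSpec(
                min=spec["min"], max=spec["max"], scale=spec["scale"]
            )
        else:
            parameter_spec[name] = hpt.DiscreteParameterSpec(
                values=spec["values"], scale=spec["scale"]
            )

    return aiplatform.HyperparameterTuningJob(
        display_name="tune-custom-model",
        custom_job=custom_job,
        # The training application must report a metric with this name.
        metric_spec={"accuracy": "maximize"},
        parameter_spec=parameter_spec,
        max_trial_count=8,
        parallel_trial_count=2,
    )
```

Calling run() on the returned job would start the trials. Each trial receives its hyperparameter values as command-line arguments to the training application.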

Pipeline training class

A pipeline orchestrates your ML workflow in Gemini Enterprise Agent Platform. You can use a pipeline to automate, monitor, and govern your machine learning systems. To learn more about pipelines in Gemini Enterprise Agent Platform, see Introduction to Gemini Enterprise Agent Platform pipelines.

PipelineJob

An instance of the PipelineJob class represents a Gemini Enterprise Agent Platform pipeline.
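Submitting a compiled pipeline might look like the following sketch. It assumes that a pipeline definition has already been compiled to a file. The helper names, display name, and caching choice are illustrative, and the template path, pipeline root, and parameter values are placeholders supplied by the caller.

```python
def pipeline_job_kwargs(template_path, pipeline_root, parameter_values=None):
    """Build PipelineJob constructor arguments (hypothetical helper)."""
    if not pipeline_root.startswith("gs://"):
        raise ValueError("pipeline_root must be a gs:// URI")
    return {
        "display_name": "my-pipeline-run",
        # Path to the compiled pipeline definition.
        "template_path": template_path,
        # Cloud Storage location for the pipeline's outputs.
        "pipeline_root": pipeline_root,
        "parameter_values": dict(parameter_values or {}),
        "enable_caching": False,
    }


def submit_pipeline(job_kwargs):
    """Create the pipeline run and submit it (needs the SDK and credentials)."""
    from google.cloud import aiplatform  # deferred import

    job = aiplatform.PipelineJob(**job_kwargs)
    job.submit()  # Returns without waiting for the run to finish.
    return job
```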

Several tutorial notebooks demonstrate how to use the PipelineJob class.

For more tutorial notebooks, see Gemini Enterprise Agent Platform notebook tutorials.

What's next