When you perform custom training, you must specify what machine learning (ML) code you want Vertex AI to run. To do this, configure training container settings for either a custom container or a Python training application that runs on a prebuilt container.
To determine whether you want to use a custom container or a prebuilt container, read Training code requirements.
This document describes the fields of the Vertex AI API that you must specify in either of the preceding cases.
Where to specify container settings
Specify configuration details within a
WorkerPoolSpec. Depending on how
you perform custom training, put this WorkerPoolSpec in one of the following
API fields:
- If you are creating a CustomJob resource, specify the WorkerPoolSpec in CustomJob.jobSpec.workerPoolSpecs. If you are using the Google Cloud CLI, then you can use the --worker-pool-spec flag or the --config flag on the gcloud ai custom-jobs create command to specify worker pool options. Learn more about creating a CustomJob.
- If you are creating a HyperparameterTuningJob resource, specify the WorkerPoolSpec in HyperparameterTuningJob.trialJobSpec.workerPoolSpecs. If you are using the gcloud CLI, then you can use the --config flag on the gcloud ai hp-tuning-jobs create command to specify worker pool options. Learn more about creating a HyperparameterTuningJob.
- If you are creating a TrainingPipeline resource without hyperparameter tuning, specify the WorkerPoolSpec in TrainingPipeline.trainingTaskInputs.workerPoolSpecs. Learn more about creating a custom TrainingPipeline.
- If you are creating a TrainingPipeline with hyperparameter tuning, specify the WorkerPoolSpec in TrainingPipeline.trainingTaskInputs.trialJobSpec.workerPoolSpecs.
If you are performing distributed training, you can use different settings for each worker pool.
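To make the nesting concrete, the following is a minimal sketch that builds a CustomJob with the Vertex AI client library for Python (google.cloud.aiplatform_v1); the project, region, machine type, and container image are placeholder assumptions. The worker_pool_specs field inside job_spec corresponds to CustomJob.jobSpec.workerPoolSpecs, and each WorkerPoolSpec entry configures one worker pool.

from google.cloud import aiplatform_v1

# Placeholder project, region, machine type, and container image.
custom_job = aiplatform_v1.CustomJob(
    display_name="my-custom-job",
    job_spec=aiplatform_v1.CustomJobSpec(
        # CustomJob.jobSpec.workerPoolSpecs: one entry per worker pool.
        # For distributed training, add more entries with different settings.
        worker_pool_specs=[
            aiplatform_v1.WorkerPoolSpec(
                machine_spec=aiplatform_v1.MachineSpec(machine_type="n1-standard-4"),
                replica_count=1,
                container_spec=aiplatform_v1.ContainerSpec(
                    image_uri="us-docker.pkg.dev/my-project/my-repo/trainer:latest"
                ),
            )
        ]
    ),
)

client = aiplatform_v1.JobServiceClient(
    client_options={"api_endpoint": "us-central1-aiplatform.googleapis.com"}
)
response = client.create_custom_job(
    parent="projects/my-project/locations/us-central1",
    custom_job=custom_job,
)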
Configure container settings
Depending on whether you are using a prebuilt container or a custom container,
you must specify different fields within the WorkerPoolSpec. Select the tab for your scenario:
Prebuilt container
1. Select a prebuilt container that supports the ML framework you plan to use for training. Specify one of the container image's URIs in the pythonPackageSpec.executorImageUri field.
2. Specify the Cloud Storage URIs of your Python training application in the pythonPackageSpec.packageUris field.
3. Specify your training application's entry point module in the pythonPackageSpec.pythonModule field.
4. Optionally, specify a list of command-line arguments to pass to your training application's entry point module in the pythonPackageSpec.args field.
The following examples highlight where you specify these container settings
when you create a CustomJob:
Console
In the Google Cloud console, you can't create a CustomJob directly. However,
you can create a TrainingPipeline that creates a
CustomJob. When you create a
TrainingPipeline in the Google Cloud console, you can specify prebuilt
container settings in certain fields on the Training container step:
- pythonPackageSpec.executorImageUri: Use the Model framework and Model framework version drop-down lists.
- pythonPackageSpec.packageUris: Use the Package location field.
- pythonPackageSpec.pythonModule: Use the Python module field.
- pythonPackageSpec.args: Use the Arguments field.
gcloud
gcloud ai custom-jobs create \
--region=LOCATION \
--display-name=JOB_NAME \
--python-package-uris=PYTHON_PACKAGE_URIS \
--worker-pool-spec=machine-type=MACHINE_TYPE,replica-count=REPLICA_COUNT,executor-image-uri=PYTHON_PACKAGE_EXECUTOR_IMAGE_URI,python-module=PYTHON_MODULE
For more context, read the guide to creating a
CustomJob.
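If you prefer the Vertex AI SDK for Python, the same prebuilt-container settings go into the python_package_spec entry of a worker pool spec. The following is a minimal sketch; the executor image, package URI, module name, machine type, and bucket are placeholder assumptions to replace with your own values.

from google.cloud import aiplatform

# Placeholder project, region, and staging bucket.
aiplatform.init(project="my-project", location="us-central1",
                staging_bucket="gs://my-bucket")

worker_pool_specs = [
    {
        "machine_spec": {"machine_type": "n1-standard-4"},
        "replica_count": 1,
        # python_package_spec maps to pythonPackageSpec in the WorkerPoolSpec.
        "python_package_spec": {
            # Placeholder: URI of a prebuilt training container for your framework.
            "executor_image_uri": "us-docker.pkg.dev/vertex-ai/training/tf-cpu.2-12:latest",
            # Placeholder: Cloud Storage URI of your packaged training application.
            "package_uris": ["gs://my-bucket/trainer-0.1.tar.gz"],
            # Placeholder: entry point module of your training application.
            "python_module": "trainer.task",
            # Optional command-line arguments passed to the entry point module.
            "args": ["--epochs", "10"],
        },
    },
]

job = aiplatform.CustomJob(display_name="my-python-package-job",
                           worker_pool_specs=worker_pool_specs)
# job.run()  # submits the job and waits for it to finish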
Custom container
1. Specify the Artifact Registry or Docker Hub URI of your custom container in the containerSpec.imageUri field.
2. Optionally, if you want to override the ENTRYPOINT or CMD instructions in your container, specify the containerSpec.command or containerSpec.args fields. These fields affect how your container runs according to the following rules:
   - If you specify neither field: Your container runs according to its ENTRYPOINT instruction and CMD instruction (if it exists). Refer to the Docker documentation about how CMD and ENTRYPOINT interact.
   - If you specify only containerSpec.command: Your container runs with the value of containerSpec.command replacing its ENTRYPOINT instruction. If the container has a CMD instruction, it is ignored.
   - If you specify only containerSpec.args: Your container runs according to its ENTRYPOINT instruction, with the value of containerSpec.args replacing its CMD instruction.
   - If you specify both fields: Your container runs with containerSpec.command replacing its ENTRYPOINT instruction and containerSpec.args replacing its CMD instruction.
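For instance, a worker pool spec like the following (shown in the dict form accepted by the Vertex AI SDK for Python) overrides both instructions; the image URI, command, and arguments are placeholder assumptions.

# Sketch of one WorkerPoolSpec that overrides both ENTRYPOINT and CMD.
worker_pool_spec = {
    "machine_spec": {"machine_type": "n1-standard-4"},
    "replica_count": 1,
    "container_spec": {
        # Placeholder custom container image in Artifact Registry.
        "image_uri": "us-docker.pkg.dev/my-project/my-repo/trainer:latest",
        "command": ["python3", "train.py"],  # replaces the image's ENTRYPOINT
        "args": ["--epochs", "10"],          # replaces the image's CMD
    },
}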
The following example highlights where you can specify some of these
container settings when you create a CustomJob:
Console
In the Google Cloud console, you can't create a CustomJob directly. However,
you can create a TrainingPipeline that creates a
CustomJob. When you create a
TrainingPipeline in the Google Cloud console, you can specify custom
container settings in certain fields on the Training container step:
- containerSpec.imageUri: Use the Container image field.
- containerSpec.command: This API field is not configurable in the Google Cloud console.
- containerSpec.args: Use the Arguments field.
gcloud
gcloud ai custom-jobs create \
--region=LOCATION \
--display-name=JOB_NAME \
--worker-pool-spec=machine-type=MACHINE_TYPE,replica-count=REPLICA_COUNT,container-image-uri=CUSTOM_CONTAINER_IMAGE_URI
Java
Before trying this sample, follow the Java setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Java API reference documentation.
To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Node.js
Before trying this sample, follow the Node.js setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Node.js API reference documentation.
To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Python
To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Python API reference documentation.
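A minimal sketch of creating and running this CustomJob with the Vertex AI SDK for Python follows; the project, region, staging bucket, machine type, and container image URI are placeholder assumptions.

from google.cloud import aiplatform

# Placeholder project, region, and staging bucket.
aiplatform.init(project="my-project", location="us-central1",
                staging_bucket="gs://my-bucket")

job = aiplatform.CustomJob(
    display_name="my-custom-container-job",
    worker_pool_specs=[
        {
            "machine_spec": {"machine_type": "n1-standard-4"},
            "replica_count": 1,
            "container_spec": {
                # containerSpec.imageUri: placeholder custom container image.
                "image_uri": "us-docker.pkg.dev/my-project/my-repo/trainer:latest",
                # containerSpec.args: replaces the image's CMD instruction.
                "args": ["--epochs", "10"],
            },
        }
    ],
)

job.run()  # submits the job and waits for completion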
For more context, read the guide to creating a
CustomJob.
What's next
- Learn how to perform custom training by creating a
CustomJob.