Compute resources

If you're interested in Vertex AI Managed Training, contact your sales representative for access.

Managed Training supports a variety of machine types to accommodate different workloads. You can choose from the following options when configuring your cluster node pools:

  • a4-highgpu-8g
  • a3-ultragpu-8g
  • a3-megagpu-8g
  • N2 machine series (CPU only)

Capacity provisioning

Choosing the right provisioning model is critical for balancing cost, speed, and resource availability. Managed Training supports the following provisioning options:

  • RESERVATION: Allocates nodes from a specific Compute Engine reservation that you create in advance. This model guarantees that capacity is available and is the recommended choice for high-demand resources.

  • FLEX_START: Uses the Dynamic Workload Scheduler to queue your job. The job starts automatically as soon as the requested compute resources become available, which gives you a flexible start time without requiring a reservation.

  • SPOT: Provisions the node pool using Spot VMs. This is the most cost-effective option, but because Spot VMs can be preempted at any time, use it only for fault-tolerant workloads that can handle interruptions.

  • ON_DEMAND: This is the default option for CPU-only node pools and is best suited for machine types that are not scarce. It provides standard VM instances with predictable, pay-as-you-go pricing.

Use the following guidance to make your selection:

  • For high-demand GPU resources (such as A3 and A4): The RESERVATION model is strongly recommended. It gives you dedicated access to the capacity you need for critical training jobs.

  • For bursty or flexible workloads: Consider FLEX_START or SPOT. FLEX_START queues your job until resources are available, while SPOT offers significant cost savings for fault-tolerant jobs that can handle preemption.

  • For abundant machine types: The ON_DEMAND model is the preferred choice. Use it for machine types that are not scarce, where obtaining capacity at the time you request it is not a concern.
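The selection guidance above can be encoded as a small decision helper. This is a minimal sketch, not part of any Managed Training API: the function `choose_provisioning_model`, its parameters, and the rule that treats machine types starting with `a3-` or `a4-` as high-demand GPU capacity are all hypothetical. The fallback to FLEX_START for a high-demand GPU without a reservation is an assumption based on the guidance that ON_DEMAND suits machine types that are not scarce.

```python
from enum import Enum


class ProvisioningModel(str, Enum):
    """Provisioning options described in this section."""
    RESERVATION = "RESERVATION"
    FLEX_START = "FLEX_START"
    SPOT = "SPOT"
    ON_DEMAND = "ON_DEMAND"


# Hypothetical assumption: machine types with these prefixes are treated as
# high-demand (scarce) GPU capacity, for example a4-highgpu-8g and a3-megagpu-8g.
HIGH_DEMAND_GPU_PREFIXES = ("a3-", "a4-")


def choose_provisioning_model(
    machine_type: str,
    *,
    has_reservation: bool,
    fault_tolerant: bool = False,
    flexible_start: bool = False,
) -> ProvisioningModel:
    """Hypothetical helper that applies the selection guidance in order."""
    is_high_demand_gpu = machine_type.startswith(HIGH_DEMAND_GPU_PREFIXES)

    # High-demand GPUs: a reservation is strongly recommended when you have one.
    if is_high_demand_gpu and has_reservation:
        return ProvisioningModel.RESERVATION

    # Fault-tolerant jobs that can handle preemption: Spot VMs cost the least.
    if fault_tolerant:
        return ProvisioningModel.SPOT

    # Flexible start times, or scarce GPUs without a reservation (assumption):
    # queue the job with the Dynamic Workload Scheduler.
    if flexible_start or is_high_demand_gpu:
        return ProvisioningModel.FLEX_START

    # Abundant machine types, such as CPU-only node pools.
    return ProvisioningModel.ON_DEMAND


if __name__ == "__main__":
    print(choose_provisioning_model("a4-highgpu-8g", has_reservation=True).value)
    print(choose_provisioning_model("a3-megagpu-8g", has_reservation=False).value)
    print(choose_provisioning_model("n2-standard-8", has_reservation=False).value)
```

Checking the reservation first reflects the strong recommendation for A3 and A4 capacity; the order of the remaining rules is one reasonable reading of the guidance, and you may prefer ON_DEMAND over SPOT for abundant machine types even when a job is fault-tolerant.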

Using a shared reservation (optional)

To use a shared reservation instead of a local reservation, complete some additional steps before you create a cluster.

First, verify that the shared reservation works by manually creating a VM that consumes it. If the VM is created successfully, continue to the next step.

Then, in the cluster creation configuration, specify the reservation by its full resource name in the following format: projects/RESERVATION_HOST_PROJECT_ID/zones/RESERVATION_ZONE/reservations/RESERVATION_NAME.
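Because a malformed reservation name is an easy mistake to make, it can help to build and check the resource name programmatically before you put it in the cluster configuration. The following is a minimal sketch; the helper names `shared_reservation_name` and `parse_shared_reservation_name` and the example values (`my-host-project`, `us-central1-a`, `my-a3-reservation`) are hypothetical. It checks only the documented path format, not whether the reservation exists or is shared with your project.

```python
import re

# Documented format:
# projects/RESERVATION_HOST_PROJECT_ID/zones/RESERVATION_ZONE/reservations/RESERVATION_NAME
_SHARED_RESERVATION_RE = re.compile(
    r"projects/(?P<project>[^/]+)/zones/(?P<zone>[^/]+)/reservations/(?P<name>[^/]+)"
)


def shared_reservation_name(host_project_id: str, zone: str, reservation_name: str) -> str:
    """Hypothetical helper: build the full resource name of a shared reservation."""
    for label, value in (
        ("host_project_id", host_project_id),
        ("zone", zone),
        ("reservation_name", reservation_name),
    ):
        # Each path segment must be non-empty and must not contain a slash.
        if not value or "/" in value:
            raise ValueError(f"invalid {label}: {value!r}")
    return f"projects/{host_project_id}/zones/{zone}/reservations/{reservation_name}"


def parse_shared_reservation_name(resource_name: str) -> dict[str, str]:
    """Hypothetical helper: split a full resource name into its three parts."""
    match = _SHARED_RESERVATION_RE.fullmatch(resource_name)
    if match is None:
        raise ValueError(f"not a shared reservation resource name: {resource_name!r}")
    return match.groupdict()


if __name__ == "__main__":
    name = shared_reservation_name("my-host-project", "us-central1-a", "my-a3-reservation")
    print(name)
    print(parse_shared_reservation_name(name))
```

Using `fullmatch` rejects names with missing segments or extra path components, such as a bare reservation name, which is what a local reservation would typically use.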