About GPU instances

This document describes the features and limitations of GPU virtual machine (VM) instances that run on Compute Engine.

To accelerate specific workloads on Compute Engine, you can either deploy an accelerator-optimized instance that has attached GPUs, or attach GPUs to an N1 general-purpose instance. Compute Engine provides GPUs for your instances in pass-through mode. Pass-through mode provides your instances with direct control over GPUs and their memory.
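
For example, the following gcloud CLI command is a minimal sketch of creating an accelerator-optimized instance; because a2-highgpu-1g is an accelerator-optimized machine type, an NVIDIA A100 GPU is attached automatically. The instance name, zone, and boot image are placeholder values:

    gcloud compute instances create my-gpu-vm \
        --zone=us-central1-a \
        --machine-type=a2-highgpu-1g \
        --image-family=debian-12 \
        --image-project=debian-cloud \
        --maintenance-policy=TERMINATE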

You can also use some GPU machine types on AI Hypercomputer. AI Hypercomputer is a supercomputing system that is optimized to support your artificial intelligence (AI) and machine learning (ML) workloads. This option is recommended for creating a densely allocated, performance-optimized infrastructure that has integrations for Google Kubernetes Engine (GKE) and Slurm schedulers.

Supported machine types

Compute Engine offers different machine types to support your various workloads.

Some machine types support NVIDIA RTX Virtual Workstations (vWS). When you create an instance that uses an NVIDIA RTX Virtual Workstation, Compute Engine automatically adds a vWS license. For information about pricing for virtual workstations, see the GPU pricing page.

GPU machine types

AI and ML workloads

Accelerator-optimized A series machine types are designed for high performance computing (HPC), artificial intelligence (AI), and machine learning (ML) workloads. The later generation A series machine types are ideal for pre-training and fine-tuning foundation models on large clusters of accelerators, while the A2 series can be used for training smaller models and for single-host inference. For these machine types, the GPU model is automatically attached to the instance:

  • A4X (NVIDIA GB200 Superchips) (nvidia-gb200)
  • A4 (NVIDIA B200) (nvidia-b200)
  • A3 Ultra (NVIDIA H200) (nvidia-h200-141gb)
  • A3 Mega (NVIDIA H100) (nvidia-h100-mega-80gb)
  • A3 High (NVIDIA H100) (nvidia-h100-80gb)
  • A3 Edge (NVIDIA H100) (nvidia-h100-80gb)
  • A2 Ultra (NVIDIA A100 80GB) (nvidia-a100-80gb)
  • A2 Standard (NVIDIA A100) (nvidia-a100-40gb)

Graphics and visualization

Accelerator-optimized G series machine types are designed for workloads such as NVIDIA Omniverse simulation workloads, graphics-intensive applications, video transcoding, and virtual desktops. These machine types support NVIDIA RTX Virtual Workstations (vWS), and the G series can also be used for training smaller models and for single-host inference. For these machine types, the GPU model is automatically attached to the instance:

  • G4 (NVIDIA RTX PRO 6000) (nvidia-rtx-pro-6000, nvidia-rtx-pro-6000-vws)
  • G2 (NVIDIA L4) (nvidia-l4, nvidia-l4-vws)

Other GPU workloads

For N1 general-purpose machine types, except for the N1 shared-core machine types (f1-micro and g1-small), you can attach a select set of GPU models. Some of these GPU models also support NVIDIA RTX Virtual Workstations (vWS). The following GPU models can be attached to N1 machine types:

  • NVIDIA T4 (nvidia-tesla-t4, nvidia-tesla-t4-vws)
  • NVIDIA P4 (nvidia-tesla-p4, nvidia-tesla-p4-vws)
  • NVIDIA V100 (nvidia-tesla-v100)
  • NVIDIA P100 (nvidia-tesla-p100, nvidia-tesla-p100-vws)
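
As a minimal sketch of attaching one of these GPU models to an N1 instance, the following gcloud CLI command requests an NVIDIA T4 with a vWS license. The instance name, zone, and boot image are placeholder values:

    gcloud compute instances create my-n1-vws-vm \
        --zone=us-central1-a \
        --machine-type=n1-standard-8 \
        --accelerator=type=nvidia-tesla-t4-vws,count=1 \
        --maintenance-policy=TERMINATE \
        --image-family=debian-12 \
        --image-project=debian-cloud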

GPUs on Spot VMs

You can add GPUs to your Spot VMs at lower spot prices. GPUs attached to Spot VMs work like normal GPUs but persist only for the life of the VM. Spot VMs with GPUs follow the same preemption process as all Spot VMs.

Consider requesting dedicated Preemptible GPU quota to use for GPUs on Spot VMs. For more information, see Quotas for Spot VMs.

During maintenance events, Spot VMs with GPUs are preempted by default and cannot be automatically restarted. If you want to recreate your VMs after they have been preempted, use a managed instance group. Managed instance groups recreate your VM instances if the vCPU, memory, and GPU resources are available.

If you want a warning before your VMs are stopped, or want to configure your VMs to automatically restart after a maintenance event, use standard VMs with a GPU. For standard VMs with GPUs, Compute Engine provides one hour of advance notice before a host maintenance event.

Compute Engine does not charge you for GPUs if their VMs are preempted in the first minute after they start running.

To learn how to create Spot VMs with GPUs attached, read Create a VM with attached GPUs and Creating Spot VMs. For example, see Create an A3 Ultra or A4 instance using Spot VMs.
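
For example, a minimal sketch of creating a Spot VM with an attached GPU looks like the following; the instance name, zone, and termination action are placeholder choices:

    gcloud compute instances create my-spot-gpu-vm \
        --zone=us-central1-a \
        --machine-type=n1-standard-8 \
        --accelerator=type=nvidia-tesla-t4,count=1 \
        --provisioning-model=SPOT \
        --instance-termination-action=STOP \
        --image-family=debian-12 \
        --image-project=debian-cloud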

GPUs on instances with predefined run times

Instances that use the standard provisioning model typically can't consume preemptible allocation quotas. Preemptible quotas are intended for temporary workloads and are usually more readily available. If your project doesn't have preemptible quota, and you have never requested it, then all instances in your project consume standard allocation quotas.

If you request preemptible allocation quota, then instances that use the standard provisioning model can consume preemptible allocation quota only if they have attached GPUs and are configured to automatically stop or delete after a predefined run time.

When you consume preemptible allocation quota for time-bound GPU workloads, you benefit from both uninterrupted run time and the greater availability of preemptible allocation quota. For more information, see Preemptible quotas.
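
For illustration, the following sketch creates a standard-provisioned instance with a predefined run time by combining the --max-run-duration and --instance-termination-action flags; whether the instance consumes preemptible or standard allocation quota depends on your project's quota, as described earlier. The name, zone, and duration are placeholders:

    gcloud compute instances create my-timebound-gpu-vm \
        --zone=us-central1-a \
        --machine-type=a2-highgpu-1g \
        --provisioning-model=STANDARD \
        --max-run-duration=8h \
        --instance-termination-action=DELETE \
        --image-family=debian-12 \
        --image-project=debian-cloud \
        --maintenance-policy=TERMINATE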

GPUs and Confidential VM

You can use a GPU with a Confidential VM instance that uses Intel TDX on the A3 machine series. For more information, see Confidential VM supported configurations. To learn how to create a Confidential VM instance with GPUs, see Create a Confidential VM instance with GPU.
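
As a rough sketch, creating such an instance might look like the following; the machine type and image shown here are assumptions, so check the Confidential VM supported configurations page for combinations that are actually supported:

    gcloud compute instances create my-conf-gpu-vm \
        --zone=us-central1-a \
        --machine-type=a3-highgpu-8g \
        --confidential-compute-type=TDX \
        --image-family=ubuntu-2204-lts \
        --image-project=ubuntu-os-cloud \
        --maintenance-policy=TERMINATE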

GPUs and block storage

When you create an instance by using a GPU machine type, you can add persistent or temporary block storage to the instance. To store non-transient data, use persistent block storage like Hyperdisk or Persistent Disk because these disks are independent of the instance's lifecycle. Data on persistent storage can be retained even after you delete the instance.

For temporary scratch storage or caches, use temporary block storage by adding Local SSD disks when you create the instance.

Persistent block storage with Persistent Disk and Hyperdisk volumes

You can attach Persistent Disk and select Hyperdisk volumes to GPU-enabled instances.

For machine learning (ML) and serving workloads, use Hyperdisk ML volumes, which offer high throughput and shorter data load times. Hyperdisk ML is a more cost-effective option for ML workloads because it offers lower GPU idle times.

Hyperdisk ML volumes provide read-only multi-attach support, so you can attach the same disk to multiple instances, giving each instance access to the same data.
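
As a sketch of this pattern, the following commands create a Hyperdisk ML volume, switch it to a read-only multi-attach access mode, and attach it to an instance. The disk name, size, and zone are placeholders, and the --access-mode value is an assumption to verify against the Hyperdisk documentation:

    # Create the Hyperdisk ML volume.
    gcloud compute disks create my-ml-data-disk \
        --zone=us-central1-a \
        --type=hyperdisk-ml \
        --size=1TB

    # Allow many instances to attach the volume read-only (assumed access mode).
    gcloud compute disks update my-ml-data-disk \
        --zone=us-central1-a \
        --access-mode=READ_ONLY_MANY

    # Attach the volume to a GPU instance in read-only mode.
    gcloud compute instances attach-disk my-gpu-vm \
        --zone=us-central1-a \
        --disk=my-ml-data-disk \
        --mode=ro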

For more information about the supported disk types for machine series that support GPUs, see the N1 and accelerator-optimized machine series pages.

Local SSD disks

Local SSD disks provide fast, temporary storage for caching, data processing, or other transient data. They are fast because they are physically attached to the server that hosts your instance, and temporary because the instance loses the data if it restarts.
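
For example, here is a minimal sketch that adds a Local SSD disk to an N1 GPU instance at creation time; the names and zone are placeholders, and note that some accelerator-optimized machine types include Local SSD automatically:

    gcloud compute instances create my-scratch-gpu-vm \
        --zone=us-central1-a \
        --machine-type=n1-standard-8 \
        --accelerator=type=nvidia-tesla-t4,count=1 \
        --local-ssd=interface=NVME \
        --image-family=debian-12 \
        --image-project=debian-cloud \
        --maintenance-policy=TERMINATE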

Avoid storing data with strong persistence requirements on Local SSD disks. To store non-transient data, use persistent storage instead.

If you manually stop an instance with a GPU, you can preserve the Local SSD data, with certain restrictions. See the Local SSD documentation for more details.

For regional support for Local SSD with GPU types, see Local SSD availability by GPU regions and zones.

GPUs and host maintenance

Compute Engine always stops instances with attached GPUs when it performs maintenance events on the host server. If the instance has attached Local SSD disks, the instance loses the Local SSD data after it stops.
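
For example, you can confirm or set this behavior on an existing instance with the gcloud CLI; the instance name and zone are placeholders, and TERMINATE matches the stop-on-maintenance behavior described above:

    gcloud compute instances set-scheduling my-gpu-vm \
        --zone=us-central1-a \
        --maintenance-policy=TERMINATE \
        --restart-on-failure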

For information on handling maintenance events, see Handling GPU host maintenance events.

Reserve GPU capacity

Reservations provide high assurance of capacity for zone-specific resources, including GPUs. You can use reservations to ensure that you have GPUs available when you need to use them for performance-intensive applications. For the different methods to reserve zone-specific resources in Compute Engine, see Choose a reservation type.
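
As a minimal sketch, the following command reserves capacity for two A2 instances, each of which has an A100 GPU implied by the machine type; the reservation name and zone are placeholders:

    gcloud compute reservations create my-gpu-reservation \
        --zone=us-central1-a \
        --vm-count=2 \
        --machine-type=a2-highgpu-1g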

Reservations are also required when you want to receive committed use discounts (CUDs) for your GPUs.

GPU pricing

If you request Compute Engine to provision GPUs using the spot, flex-start, or reservation-bound provisioning model, then you get the GPUs at discounted prices, depending on the GPU type. You can also receive committed use discounts or sustained use discounts (only with N1 VMs) for your GPU usage.

For hourly and monthly pricing for GPUs, see the GPU pricing page.

Committed use discounts for GPUs

Resource-based commitments provide deep discounts for Compute Engine resources in return for committing to using the resources in a specific region for at least one year. You typically purchase commitments for resources such as vCPUs, memory, GPUs, and Local SSD disks for use with a specific machine series. When you use your resources, you receive qualifying resource usage at discounted prices. To learn more about these discounts, see Resource-based committed use discounts.

To purchase a commitment with GPUs, you must also reserve the GPUs and attach the reservations to your commitment. For more information about attaching reservations to commitments, see Attach reservations to resource-based commitments.

Sustained use discounts for GPUs

Instances that use N1 machine types with attached GPUs receive sustained use discounts (SUDs) on GPU usage, similar to the discounts for vCPUs. When you select a GPU for a virtual workstation, Compute Engine automatically adds an NVIDIA RTX Virtual Workstation license to your instance.

GPU restrictions and limitations

For instances with attached GPUs, the following restrictions and limitations apply:

  • Only accelerator-optimized (A4X, A4, A3, A2, G4, and G2) and general-purpose N1 machine types support GPUs.

  • To protect Compute Engine systems and users, new projects have a global GPU quota that limits the total number of GPUs you can create in any supported zone. When you request GPU quota, you must request quota for the GPU models that you want to create in each region, and an additional global quota for the total number of GPUs of all types in all zones. A sketch for checking these quotas appears after this list.

  • Instances with one or more GPUs have a maximum number of vCPUs for each GPU that you add to the instance. To see the available vCPU and memory ranges for different GPU configurations, see the GPUs list.

  • GPUs require device drivers to function properly. NVIDIA GPUs that run on Compute Engine must use a minimum driver version. For more information about driver versions, see Required NVIDIA driver versions.

  • The Compute Engine SLA covers instances with an attached GPU model only if that attached GPU model is generally available.

    For regions that have multiple zones, the Compute Engine SLA covers the instance only if the GPU model is available in more than one zone within that region. For GPU models by region, see GPU regions and zones.

  • Compute Engine supports one concurrent user per GPU.

  • Also see the limitations for each machine type with attached GPUs.
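
As referenced in the quota item above, the following sketch lists the quotas that apply to GPUs; the region is a placeholder, and the format expression is one plausible way to tabulate the output:

    # Per-region quotas, which include per-GPU-model metrics.
    gcloud compute regions describe us-central1 \
        --flatten=quotas \
        --format="table(quotas.metric,quotas.limit,quotas.usage)"

    # Project-wide quotas, which include the global GPUS_ALL_REGIONS limit.
    gcloud compute project-info describe \
        --flatten=quotas \
        --format="table(quotas.metric,quotas.limit,quotas.usage)"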

What's next?