To run artificial intelligence (AI), machine learning (ML), or high-performance computing (HPC) workloads, you can deploy AI-optimized VMs and clusters of A4X, A4, and A3 Ultra machines. For more information about the features of these machines that let you run large-scale AI/ML clusters, see Cluster management overview.
You can create A4X, A4, and A3 Ultra VMs directly from Compute Engine, or through other services that run on Compute Engine instances, such as Cluster Toolkit or Google Kubernetes Engine (GKE).
To create your VMs or clusters, choose the option that best fits your use case:
| Option | Use case |
|---|---|
| Cluster Director | You want a fully managed service that automates the setup and configuration of your Slurm clusters. Cluster Director helps you configure compute, networking, and storage resources for your clusters to maximize performance and minimize downtime. To learn more, see Create an AI-optimized cluster based on a template. |
| Cluster Toolkit | You want to use open-source software that simplifies deploying both Slurm and GKE clusters. Cluster Toolkit is designed to be highly customizable and extensible. To learn more, see the following: |
| GKE | You want maximum flexibility in configuring your Google Kubernetes Engine cluster based on the needs of your workload. To learn more, see Create a custom AI-optimized Google Kubernetes Engine cluster. |
| Compute Engine | You want full control of the infrastructure layer so that you can set up your own orchestrator. To learn more, see the following: |