Use GPUs to run AI inference on Cloud Run. If you're new to AI concepts, see GPUs for AI. GPUs are commonly used to train and serve AI models. On Cloud Run, GPU instances scale up and down with your workload's utilization, which gives you stable performance under variable load. To learn more about available GPU configurations, see GPU support for services, jobs, and worker pools.
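For example, a service can request a GPU at deploy time. The following is a minimal sketch that assumes one NVIDIA L4 GPU and the minimum CPU and memory that GPU deployments typically require; `SERVICE` and `IMAGE_URL` are placeholders, and the exact flags and minimums can vary by gcloud release:

```sh
# Deploy a Cloud Run service with one NVIDIA L4 GPU attached.
# SERVICE and IMAGE_URL are placeholders for your service name and container image.
gcloud run deploy SERVICE \
  --image IMAGE_URL \
  --gpu 1 \
  --gpu-type nvidia-l4 \
  --cpu 4 \
  --memory 16Gi \
  --no-cpu-throttling \
  --region us-central1
```

GPU services use instance-based billing, with CPU always allocated, which is why `--no-cpu-throttling` is set in this sketch.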
## Tutorials for services
- Run LLM inference on Cloud Run GPUs with Gemma 3 and Ollama
- Run Gemma 3 on Cloud Run
- Run LLM inference on Cloud Run GPUs with vLLM
- Run OpenCV on Cloud Run with GPU acceleration
- Run LLM inference on Cloud Run GPUs with Hugging Face Transformers.js
- Run LLM inference on Cloud Run GPUs with Hugging Face TGI
## Tutorials for jobs
- Fine-tune LLMs using GPUs with Cloud Run jobs
- Run batch inference using GPUs on Cloud Run jobs
- GPU-accelerated video transcoding with FFmpeg on Cloud Run jobs
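Jobs attach GPUs in a similar way at creation time. The following sketch assumes the beta gcloud surface for Cloud Run jobs with GPUs; `JOB` and `IMAGE_URL` are placeholders, and flag availability may differ by release:

```sh
# Create a Cloud Run job whose tasks each run with one NVIDIA L4 GPU.
# JOB and IMAGE_URL are placeholders for your job name and container image.
gcloud beta run jobs create JOB \
  --image IMAGE_URL \
  --gpu 1 \
  --gpu-type nvidia-l4 \
  --cpu 4 \
  --memory 16Gi \
  --region us-central1

# Run the job once it has been created.
gcloud run jobs execute JOB --region us-central1
```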