Learn about applied research and engineering on Cloud AI
Featured article
Closing the efficiency gap in LLM serving with model co-hosting on Vertex AI
In the evolving landscape of large language models (LLMs), the "one model per machine" deployment pattern is becoming a significant bottleneck for cost-efficient LLM serving in enterprises. Model co-hosting addresses this efficiency gap by enabling multiple model instances to share the same virtual machine and GPU resources. This technical blog details Vertex AI Engineering's process of bringing model co-hosting to a production-ready cloud service.
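For a rough sense of what co-hosting looks like in practice, here is a minimal sketch using the google-cloud-aiplatform SDK's DeploymentResourcePool API, where several deployed models share one pool of VM and GPU replicas. This is an illustration of the general technique, not the article's implementation; the project ID, region, model IDs, endpoint names, and machine shape are placeholder assumptions.

```python
# Minimal sketch: co-hosting two models on shared VM/GPU replicas via a
# Vertex AI DeploymentResourcePool. All IDs and machine settings below
# are illustrative placeholders, not values from the article.
from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")

# One pool of accelerator-backed replicas that multiple deployed
# models will share, instead of one machine per model.
pool = aiplatform.DeploymentResourcePool.create(
    deployment_resource_pool_id="shared-llm-pool",
    machine_type="n1-standard-8",
    accelerator_type="NVIDIA_TESLA_T4",
    accelerator_count=1,
    min_replica_count=1,
    max_replica_count=2,
)

# Deploy two model instances into the same resource pool.
for model_id, endpoint_name in [
    ("1111111111", "llm-endpoint-a"),
    ("2222222222", "llm-endpoint-b"),
]:
    model = aiplatform.Model(model_name=model_id)
    endpoint = aiplatform.Endpoint.create(display_name=endpoint_name)
    model.deploy(endpoint=endpoint, deployment_resource_pool=pool)
```

Because both deployments draw on the same replicas, idle capacity left by one model can absorb traffic for the other, which is the efficiency gain the article examines in depth.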