GLM models on Vertex AI offer fully managed and serverless models as APIs. To use a GLM model on Vertex AI, send a request directly to the Vertex AI API endpoint. Because GLM models use a managed API, there's no need to provision or manage infrastructure.
You can stream responses to reduce end users' perception of latency. A streamed response uses server-sent events (SSE) to return the response incrementally.
GLM 4.7
GLM 4.7 is a model from GLM designed for core or vibe coding, tool use, and complex reasoning.
Go to the GLM 4.7 model card
GLM 5
GLM 5 is a model from GLM targeting complex systems engineering and long-horizon agentic tasks.
Go to the GLM 5 model card
Use GLM models
For managed models, you can use curl commands to send requests to the Vertex AI endpoint using the following model names:
- For GLM 4.7, use glm-4.7-maas
- For GLM 5, use glm-5-maas
To learn how to make streaming and non-streaming calls to GLM models, see Call open model APIs.
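As a minimal sketch, a streaming request with curl could look like the following. PROJECT_ID and REGION are placeholders, and the OpenAI-compatible endpoints/openapi/chat/completions path is an assumption; check the Call open model APIs page for the exact URL and request shape for your region.

```shell
# Hypothetical sketch of a streaming chat request to GLM 4.7 on Vertex AI.
# PROJECT_ID and REGION are placeholders you must replace.
PROJECT_ID="my-project"
REGION="us-central1"
MODEL="glm-4.7-maas"   # use glm-5-maas for GLM 5

# Assumed OpenAI-compatible endpoint path; verify against the current docs.
ENDPOINT="https://${REGION}-aiplatform.googleapis.com/v1/projects/${PROJECT_ID}/locations/${REGION}/endpoints/openapi/chat/completions"

# Dry run by default; set RUN_LIVE=1 (with gcloud authenticated) to send.
if [ -n "${RUN_LIVE:-}" ]; then
  curl -X POST "${ENDPOINT}" \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json" \
    -d '{
      "model": "'"${MODEL}"'",
      "stream": true,
      "messages": [{"role": "user", "content": "Write a haiku about code."}]
    }'
else
  echo "Would POST to: ${ENDPOINT} (model: ${MODEL})"
fi
```

With "stream": true, the response arrives as SSE events; set it to false for a single non-streaming response.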
To use a self-deployed Vertex AI model:
- Navigate to the Model Garden console.
- Find the relevant Vertex AI model.
- Click Enable and complete the provided form to get the necessary commercial use licenses.
For more information about deploying and using partner models, see Deploy a partner model and make prediction requests.
GLM model region availability
GLM models are available in the following regions:
| Model | Regions |
|---|---|
| GLM 4.7 | |
| GLM 5 | |
What's next
Learn how to Call open model APIs.