GLM models

GLM models on Vertex AI offer fully managed and serverless models as APIs. To use a GLM model on Vertex AI, send a request directly to the Vertex AI API endpoint. Because GLM models use a managed API, there's no need to provision or manage infrastructure.

You can stream your responses to reduce the end-user latency perception. A streamed response uses server-sent events (SSE) to incrementally stream the response.

GLM 4.7

GLM 4.7 is a model from GLM designed for core or vibe coding, tool use, and complex reasoning.

Go to the GLM 4.7 model card

GLM 5

GLM 5 is a model from GLM targeting complex systems engineering and long-horizon agentic tasks.

Go to the GLM 5 model card

Use GLM models

For managed models, you can use curl commands to send requests to the Vertex AI endpoint using the following model names:

For GLM 4.7, use glm-4.7-maas
For GLM 5, use glm-5-maas

To learn how to make streaming and non-streaming calls to GLM models, see Call open model APIs.

To use a self-deployed Vertex AI model:

Navigate to the Model Garden console.
Find the relevant Vertex AI model.
Click Enable and complete the provided form to get the necessary commercial use licenses.

For more information about deploying and using partner models, see Deploy a partner model and make prediction requests .

GLM model region availability

GLM models are available in the following regions:

Model	Regions
GLM 4.7	`global` Max output: 128,000 Context length: 200,000
GLM 5	`global` Max output: 128,000 Context length: 200,000

What's next

Learn how to Call open model APIs.

GLM models Stay organized with collections Save and categorize content based on your preferences.