Kimi models

Kimi models are available for use as managed APIs and self-deployed models on Vertex AI. You can stream your responses to reduce the end-user latency perception. A streamed response uses server-sent events (SSE) to incrementally stream the response.

Managed Kimi models

Kimi models offer fully managed and serverless models as APIs. To use a Kimi model on Vertex AI, send a request directly to the Vertex AI API endpoint. When using Kimi models as a managed API, there's no need to provision or manage infrastructure.

The following models are available from Kimi to use in Vertex AI. To access a Kimi model, go to its Model Garden model card.

Kimi K2 Thinking

Kimi K2 Thinking is a thinking model from Kimi that excels at complex problem-solving and deep reasoning.

Go to the Kimi K2 Thinking model card

Use Kimi models

For managed models, you can use curl commands to send requests to the Vertex AI endpoint using the following model names:

For Kimi K2 Thinking, use kimi-k2-thinking-maas

To learn how to make streaming and non-streaming calls to Kimi models, see Call open model APIs.

To use a self-deployed Vertex AI model:

Navigate to the Model Garden console.
Find the relevant Vertex AI model.
Click Enable and complete the provided form to get the necessary commercial use licenses.

For more information about deploying and using partner models, see Deploy a partner model and make prediction requests .

Kimi model region availability

Kimi models are available in the following regions:

Model	Regions
Kimi K2 Thinking	`global` Max output: 262,144 Context length: 262,144

What's next

Learn how to Call open model APIs.

Kimi models Stay organized with collections Save and categorize content based on your preferences.