Kimi K2 Thinking

Caution: As of July 21, 2026, the kimi-k2-thinking-maas endpoint is deprecated and will be retired on October 21, 2026. For more information, see Open model deprecations.

Kimi K2 Thinking is an open-source model that operates as a "thinking agent," reasoning step-by-step while using tools to achieve state-of-the-art performance on various benchmarks. It is capable of executing up to 200-300 sequential tool calls without human intervention, allowing it to solve complex problems across a wide range of tasks. The model uses Quantization-Aware Training (QAT) to support INT4 inference, which provides a roughly 2x improvement in generation speed.

Managed API (MaaS) specifications

View pricing

Model ID	`kimi-k2-thinking-maas`
Modalities	Text Input and output Image Not supported Audio Not supported Video Not supported
Capabilities	Function calling Supported Structured output Supported Thinking Supported
Consumption options	Provisioned Throughput Supported Batch inference Not supported Pay-as-you-go Standard PayGo Supported Fixed quota Not supported
Supported regions	Model availability	Global: `global`
Supported regions	ML processing	Multi-region: `us`
Quotas	`global`: 262,144 maximum output, 262,144 context length
Versions	`Kimi K2 Thinking` Launch stage: GA Release date: Nov 13, 2025

Deploy as a self-deployed model

To self-deploy the model, navigate to the Kimi K2 Thinking model card in the Model Garden console and click Deploy model. For more information about deploying and using partner models, see Deploy a partner model and make prediction requests.

Kimi K2 Thinking Stay organized with collections Save and categorize content based on your preferences.

Managed API (MaaS) specifications

Deploy as a self-deployed model

Kimi K2 Thinking