Kimi K2 Thinking is an open-source model that operates as a "thinking agent," reasoning step-by-step while using tools to achieve state-of-the-art performance on various benchmarks. It is capable of executing up to 200-300 sequential tool calls without human intervention, allowing it to solve complex problems across a wide range of tasks. The model uses Quantization-Aware Training (QAT) to support INT4 inference, which provides a roughly 2x improvement in generation speed.
Managed API (MaaS) specifications
View model card in Model Garden
| Model ID | kimi-k2-thinking-maas |
|
|---|---|---|
| Launch stage | GA | |
| Supported inputs & outputs |
|
|
| Capabilities |
|
|
| Usage types |
|
|
| Versions |
|
|
| Supported regions | ||
|
Model availability |
|
|
|
ML processing |
|
|
| Limits |
global:
|
|
| Pricing | See Pricing. | |
Deploy as a self-deployed model
To self-deploy the model, navigate to the Kimi K2 Thinking model card in the Model Garden
console and click Deploy model. For more information about deploying and
using partner models, see Deploy a partner model and make prediction
requests.