Grok models support a "thinking" mode that allows them to perform step-by-step reasoning before providing a final answer. This is useful for tasks requiring transparent logic, like mathematical proofs, intricate code debugging, or multi-step agent planning.
Grok guidance
- The reasoning token count is returned in the reasoning_tokens field (under usage.completion_tokens_details), separately from completion_tokens.
- The response doesn't include a reasoning_content field. All response text is returned in the content field.
- Grok reasoning models don't support reasoning_effort.
Example request:
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  "https://aiplatform.googleapis.com/v1/projects/test-project/locations/global/endpoints/openapi/chat/completions" \
  -d '{
    "model": "xai/grok-4.1-fast-reasoning",
    "messages": [{
      "role": "user",
      "content": "Who are you?"
    }]
  }'
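The same request can be sketched in Python using only the standard library. This is a minimal illustration, not an official client: the helper name build_chat_request and the ENDPOINT_TEMPLATE constant are hypothetical, the endpoint shape and model name are copied from the curl example, and the access token is assumed to be obtained separately (for example from gcloud auth print-access-token).

```python
import json
import urllib.request

# Endpoint shape copied from the curl example; project_id is a placeholder.
ENDPOINT_TEMPLATE = (
    "https://aiplatform.googleapis.com/v1/projects/{project_id}"
    "/locations/global/endpoints/openapi/chat/completions"
)


def build_chat_request(
    project_id: str, access_token: str, prompt: str
) -> urllib.request.Request:
    """Build, but do not send, the chat completions request (hypothetical helper)."""
    payload = {
        "model": "xai/grok-4.1-fast-reasoning",
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        ENDPOINT_TEMPLATE.format(project_id=project_id),
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    # "dummy-token" is a stand-in; a real token must be supplied by the caller.
    request = build_chat_request("test-project", "dummy-token", "Who are you?")
    print(request.full_url)
    print(request.data.decode("utf-8"))
```

Building the payload with json.dumps avoids hand-written JSON errors such as trailing commas. To actually send the request, the returned object could be passed to urllib.request.urlopen and the response body decoded with json.load.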
Example response:
{
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "logprobs": null,
      "message": {
        "content": "I am Grok, an AI assistant built by xAI...",
        "role": "assistant"
      }
    }
  ],
  "created": 1775523905,
  "id": "knTMaJC0EJfM5OMP7I3xkAk",
  "model": "xai/grok-4.1-fast-reasoning",
  "object": "chat.completion",
  "system_fingerprint": "fp_39c5j0a324",
  "usage": {
    "completion_tokens": 50,
    "completion_tokens_details": {
      "accepted_prediction_tokens": 0,
      "audio_tokens": 0,
      "reasoning_tokens": 124,
      "rejected_prediction_tokens": 0
    },
    "cost_in_usd_ticks": 0,
    "num_sources_used": 0,
    "prompt_tokens": 663,
    "prompt_tokens_details": {
      "audio_tokens": 0,
      "cached_tokens": 654,
      "image_tokens": 0,
      "text_tokens": 663
    },
    "total_tokens": 837
  }
}
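As a rough sketch of how a client might read this response, the Python snippet below pulls out the answer text and the token counts. The function name summarize_response and the trimmed SAMPLE_RESPONSE dictionary are illustrative assumptions; only the field names and values are taken from the example response.

```python
from typing import Any


def summarize_response(response: dict[str, Any]) -> dict[str, Any]:
    """Extract the answer text and token counts (hypothetical helper)."""
    usage = response.get("usage", {})
    # Reasoning tokens are nested under completion_tokens_details.
    details = usage.get("completion_tokens_details") or {}
    message = response["choices"][0]["message"]
    return {
        "answer": message["content"],
        # Per the guidance above, no reasoning_content field is expected.
        "has_reasoning_content": "reasoning_content" in message,
        "prompt_tokens": usage.get("prompt_tokens", 0),
        "completion_tokens": usage.get("completion_tokens", 0),
        "reasoning_tokens": details.get("reasoning_tokens", 0),
        "total_tokens": usage.get("total_tokens", 0),
    }


# Trimmed copy of the example response, keeping only the fields read above.
SAMPLE_RESPONSE = {
    "choices": [
        {
            "message": {
                "content": "I am Grok, an AI assistant built by xAI...",
                "role": "assistant",
            }
        }
    ],
    "usage": {
        "completion_tokens": 50,
        "completion_tokens_details": {"reasoning_tokens": 124},
        "prompt_tokens": 663,
        "total_tokens": 837,
    },
}

if __name__ == "__main__":
    summary = summarize_response(SAMPLE_RESPONSE)
    print(summary["answer"])
    print(summary["reasoning_tokens"])
```

In this particular sample, prompt_tokens (663), completion_tokens (50), and reasoning_tokens (124) add up to total_tokens (837), which is consistent with reasoning tokens being counted separately from completion tokens. That is an observation about this one response rather than a documented guarantee.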
What's next
- Learn about Function calling.
- Learn about Structured output.