NagaAI supports models that return Reasoning Tokens, also known as thinking tokens. We normalize the different ways providers let you control how many reasoning tokens a model uses, providing a unified interface across providers.
Reasoning tokens provide a transparent look into the reasoning steps taken by a model. Reasoning tokens are considered output tokens and charged accordingly.
Reasoning tokens are included in the response by default if the model decides to output them.
Reasoning Effort
Some models support configuring the size of the reasoning window. The following options are supported (a usage sketch follows the list):
- "effort": "xhigh" - Allocates the maximum reasoning budget (when supported)
- "effort": "high" - Allocates a large portion of tokens for reasoning
- "effort": "medium" - Allocates a moderate portion of tokens
- "effort": "low" - Allocates a smaller portion of tokens
- "effort": "minimal" - Minimizes reasoning as much as possible
- "effort": "none" - Disables reasoning (when supported)
Models may return their reasoning process in two ways:
reasoning_content: Newer models output their thoughts in a separate reasoning_content field on the message (or on each message chunk when streaming), for example:
{
  "role": "assistant",
  "reasoning_content": "User asks..."
}
<think> tag: Some models return the reasoning process inside the standard content field, wrapping the thoughts in a <think>...</think> block, for example:
{
  "role": "assistant",
  "content": "<think> User asks to..."
}
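Because both formats are in use, client code may need to normalize them. The sketch below is illustrative (the helper name and plain-dict message shape are not part of the API); it returns the reasoning separately from the visible answer:

import re

def extract_reasoning(message: dict) -> tuple[str | None, str]:
    """Split a chat message into (reasoning, answer), handling both formats."""
    content = message.get("content") or ""
    # Case 1: separate reasoning_content field
    reasoning = message.get("reasoning_content")
    if reasoning is not None:
        return reasoning, content
    # Case 2: <think>...</think> block embedded at the start of content
    match = re.match(r"\s*<think>(.*?)</think>\s*", content, re.DOTALL)
    if match:
        return match.group(1).strip(), content[match.end():]
    return None, content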
Some models, like Gemini, never return their chain of thought; this information is not available to the end user. In the future, we plan to standardize reasoning to a single format. For now, model thoughts may be returned in either of these ways.
Preserving Reasoning Blocks
When using tool calling, you can preserve the model’s reasoning blocks by passing reasoning_details back in the conversation context. This maintains reasoning continuity when the model pauses to await tool results and then continues building its response.
Model Support: Gemini, Z.AI, and other models.
We highly recommend implementing this mechanism. The list of models whose performance depends on chain-of-thought is constantly growing, and preserving reasoning context is becoming a new standard.
Why preserve reasoning blocks?
- Reasoning continuity: The reasoning blocks capture the model’s step-by-step thinking that led to tool requests. Including them when posting tool results allows the model to continue from where it left off.
- Context maintenance: While tool results appear as user messages in the API, they’re part of a continuous reasoning flow. Preserving reasoning blocks maintains this conceptual flow across multiple API calls.
When providing reasoning_details blocks, you must pass the entire sequence of consecutive reasoning blocks exactly as generated by the model - you cannot rearrange or modify them.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.naga.ac/v1",
    api_key="YOUR_API_KEY",
)

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_transport_schedule",
            "description": "Get public transport schedule for a given city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]

# First API call with tools
response = client.chat.completions.create(
    model="gemini-3-pro-preview",
    messages=[
        {
            "role": "user",
            "content": "When does the next subway leave in Boston? Also, suggest a good cafe nearby.",
        }
    ],
    tools=tools,
)

message = response.choices[0].message

# Preserve reasoning_details when passing back
messages = [
    {
        "role": "user",
        "content": "When does the next subway leave in Boston? Also, suggest a good cafe nearby.",
    },
    {
        "role": "assistant",
        "content": message.content,
        "tool_calls": message.tool_calls,
        "reasoning_details": message.reasoning_details,  # Pass back unmodified
    },
    {
        "role": "tool",
        "tool_call_id": message.tool_calls[0].id,
        "content": '{"next_departure": "12:15 PM", "line": "Red Line", "destination": "Alewife"}',
    },
]

# Second API call - model continues reasoning from where it left off
response2 = client.chat.completions.create(
    model="gemini-3-pro-preview",
    messages=messages,
    tools=tools,
)
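The final answer can then be read from the second response as usual; if the model requests further tool calls, the same preserve-and-resend pattern applies to each round:

# The model resumes from its preserved reasoning and produces the final answer.
print(response2.choices[0].message.content)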