Reasoning
NagaAI supports models that return reasoning tokens, also known as thinking tokens. Providers differ in how they let you control the amount of reasoning a model performs, so we normalize these controls into a unified interface across providers.
Reasoning tokens provide a transparent look into the reasoning steps taken by a model. Reasoning tokens are considered output tokens and charged accordingly.
Reasoning tokens are included in the response by default if the model decides to output them.
Reasoning Effort
Some models let you configure how many tokens they spend on reasoning. The following options are supported:
- "effort": "high" - Allocates a large portion of tokens for reasoning
- "effort": "medium" - Allocates a moderate portion of tokens
- "effort": "low" - Allocates a smaller portion of tokens
- "effort": "minimal" - Minimizes reasoning as much as possible; for models that support it, disables it.
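As a rough illustration of the options above, the sketch below builds a request body that carries an effort setting. It makes several assumptions that are not confirmed by this page: that the option is nested under a `reasoning` key in a chat-completion style body, and that `build_request` and `example-model` are hypothetical names used only for illustration. Check the API reference for the exact request shape.

```python
import json

# Effort levels listed above, ordered from most to least reasoning.
EFFORT_LEVELS = ("high", "medium", "low", "minimal")


def build_request(model: str, prompt: str, effort: str) -> dict:
    """Build a chat-completion style request body with a reasoning effort.

    Assumption: the effort option lives under a "reasoning" key.
    This nesting is illustrative, not taken from the documentation.
    """
    if effort not in EFFORT_LEVELS:
        raise ValueError(
            f"unsupported effort {effort!r}; expected one of {EFFORT_LEVELS}"
        )
    return {
        "model": model,  # hypothetical model name supplied by the caller
        "messages": [{"role": "user", "content": prompt}],
        "reasoning": {"effort": effort},  # assumed placement of the option
    }


if __name__ == "__main__":
    body = build_request("example-model", "Why is the sky blue?", "low")
    print(json.dumps(body, indent=2))
```

Validating the value on the client side keeps a typo such as "max" from reaching the API, where it might be rejected or silently ignored.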
Output Format
Models may return their reasoning process in two ways:
- reasoning_content: More modern models output their thoughts in a separate reasoning_content field within the message chunk, for example:
{
  "role": "assistant",
  "reasoning_content": "User asks..."
}
- <think> tag: Some models return the reasoning process within the well-known content field, wrapping the thoughts in a <think>...</think> block, for example:
{
  "role": "assistant",
  "content": "<think> User asks to..."
}
Some models, like Gemini, never return their chain of thought, so this information is not available to the end user.
In the future, we plan to standardize reasoning to a single format. For now, model thoughts may be returned in either of the two formats described above.
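Until the format is standardized, client code may want to handle both cases. The following is a minimal sketch, assuming a complete (non-streamed) message represented as a dict. The helper name `split_reasoning` is hypothetical, and tolerating a missing closing `</think>` tag is a defensive choice of this sketch rather than documented behavior.

```python
import re

# Matches a leading <think>...</think> block. If the closing tag is
# missing (for example, a truncated response), match to end of string.
_THINK_RE = re.compile(r"^\s*<think>(.*?)(?:</think>|\Z)", re.DOTALL)


def split_reasoning(message: dict) -> tuple[str, str]:
    """Return (reasoning, answer) for either output format."""
    content = message.get("content") or ""

    # Format 1: a dedicated reasoning_content field.
    reasoning = message.get("reasoning_content")
    if reasoning is not None:
        return reasoning.strip(), content.strip()

    # Format 2: thoughts wrapped in <think>...</think> inside content.
    match = _THINK_RE.match(content)
    if match:
        return match.group(1).strip(), content[match.end():].strip()

    # No reasoning returned at all.
    return "", content.strip()


if __name__ == "__main__":
    print(split_reasoning({"role": "assistant", "reasoning_content": "User asks..."}))
    print(split_reasoning({"role": "assistant", "content": "<think> User asks to... </think>Hi"}))
```

Checking `reasoning_content` first means a message that carries the dedicated field is never scanned for tags, so a literal `<think>` in the answer text is left alone in that case.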