
Reasoning

NagaAI supports models that return reasoning tokens, also known as thinking tokens. Providers expose different ways to control how many reasoning tokens a model uses; we normalize these into a single, unified interface that works across providers.

Reasoning tokens offer a transparent view of the steps a model takes while reasoning. They count as output tokens and are billed accordingly.
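Because reasoning tokens are billed as output tokens, the output cost of a response covers both the visible answer and the hidden reasoning. The sketch below illustrates that arithmetic; the helper name `output_cost` and the price of $10 per million output tokens are hypothetical values chosen for illustration, not actual NagaAI pricing.

```python
# Hypothetical price for illustration only; check the real pricing for your model.
OUTPUT_PRICE_PER_MILLION = 10.0  # USD per 1M output tokens (assumed)


def output_cost(reasoning_tokens: int, answer_tokens: int,
                price_per_million: float = OUTPUT_PRICE_PER_MILLION) -> float:
    """Output cost of one response, billing reasoning tokens at the output rate."""
    billed_output_tokens = reasoning_tokens + answer_tokens
    return billed_output_tokens * price_per_million / 1_000_000


if __name__ == "__main__":
    # 300 reasoning tokens + 200 answer tokens = 500 billed output tokens.
    print(f"${output_cost(300, 200):.4f}")  # prints $0.0050
```

In this example, the reasoning accounts for 60% of the output bill even though the user only sees the 200-token answer.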

Reasoning tokens are included in the response by default if the model decides to output them.

Reasoning Effort

Some models let you configure the size of the reasoning window, that is, how many tokens they spend on reasoning. The following effort levels are supported:

  • "effort": "high" - Allocates a large portion of tokens for reasoning
  • "effort": "medium" - Allocates a moderate portion of tokens for reasoning
  • "effort": "low" - Allocates a smaller portion of tokens for reasoning
  • "effort": "minimal" - Minimizes reasoning as much as possible, disabling it entirely on models that support this
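A minimal sketch of a chat request that sets one of the effort levels listed above. It assumes an OpenAI-style chat-completion payload and assumes the effort value is nested under a `reasoning` object, as in `{"reasoning": {"effort": "high"}}`; this section does not specify the exact request layout, so verify the field name against the NagaAI API reference. The helper `build_chat_request` and the model name `example-reasoning-model` are hypothetical.

```python
import json

# Effort levels listed in this section.
EFFORT_LEVELS = ("high", "medium", "low", "minimal")


def build_chat_request(model: str, prompt: str, effort: str = "medium") -> dict:
    """Build a chat-completion payload that sets the reasoning effort.

    Assumption: effort is nested under a "reasoning" object; confirm the
    exact field layout in the API reference before relying on it.
    """
    if effort not in EFFORT_LEVELS:
        raise ValueError(f"unsupported effort {effort!r}; expected one of {EFFORT_LEVELS}")
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "reasoning": {"effort": effort},  # assumed field layout
    }


if __name__ == "__main__":
    payload = build_chat_request("example-reasoning-model", "Why is the sky blue?", effort="low")
    print(json.dumps(payload, indent=2))
```

Validating the effort string on the client side catches typos before the request is sent; models that do not support effort configuration may simply ignore the setting.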

Output Format

Models may return their reasoning process in two ways:

  1. reasoning_content: Newer models return their thoughts in a separate reasoning_content field within the message chunk, for example:
```json
{
  "role": "assistant",
  "reasoning_content": "User asks..."
}
```
  2. <think> tag: Some models return their reasoning inside the standard content field, wrapping the thoughts in a <think>...</think> block, for example:
```json
{
  "role": "assistant",
  "content": "<think> User asks to..."
}
```
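Since reasoning can arrive in either of the two formats described above, client code may want to separate it from the answer. The sketch below is one way to do that. It assumes the message has already been assembled into a dict (for streaming, after concatenating the chunks) and that a <think> block, when present, appears at the start of content. The helper name `split_reasoning` is hypothetical.

```python
import re
from typing import Tuple

# Matches a leading <think>...</think> block; DOTALL lets "." span newlines,
# and the non-greedy (.*?) stops at the first closing tag.
THINK_RE = re.compile(r"^\s*<think>(.*?)</think>\s*", re.DOTALL)


def split_reasoning(message: dict) -> Tuple[str, str]:
    """Return (reasoning, answer) for an assistant message in either format."""
    content = message.get("content") or ""

    # Format 1: a dedicated reasoning_content field.
    reasoning = message.get("reasoning_content")
    if reasoning:
        return reasoning, content

    # Format 2: reasoning wrapped in a <think>...</think> block inside content.
    match = THINK_RE.match(content)
    if match:
        return match.group(1).strip(), content[match.end():]

    # Unterminated block (e.g. output cut off mid-thought): treat it all as reasoning.
    stripped = content.lstrip()
    if stripped.startswith("<think>") and "</think>" not in stripped:
        return stripped[len("<think>"):].strip(), ""

    # No reasoning returned, as with models that never expose their thoughts.
    return "", content


if __name__ == "__main__":
    print(split_reasoning({"role": "assistant", "content": "<think>User asks for 2+2.</think>4"}))
    # ('User asks for 2+2.', '4')
```

Checking reasoning_content first means a model that uses the dedicated field is never run through the tag parser, so a literal "<think>" in an ordinary answer is left untouched in that case.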
Note: Some models, such as Gemini, never return their chain of thought; their reasoning is not available to the end user.

We plan to standardize reasoning output into a single format in the future. Until then, model thoughts may be returned in either of the ways described above.