Reasoning Effort
Some models support configuring the size of the reasoning window. The following options are supported:

- `"reasoning_effort": "none"` - Disables reasoning (when supported)
- `"reasoning_effort": "minimal"` - Minimizes reasoning as much as possible
- `"reasoning_effort": "low"` - Allocates a smaller portion of tokens for reasoning
- `"reasoning_effort": "medium"` - Allocates a moderate portion of tokens for reasoning
- `"reasoning_effort": "high"` - Allocates a large portion of tokens for reasoning
- `"reasoning_effort": "xhigh"` - Allocates the maximum reasoning budget (when supported)
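As a sketch, a chat-completion request body with this option might look like the following. The model name and message content are illustrative, not prescriptive:

```python
import json

# Illustrative request body; "some-reasoning-model" is a placeholder, and the
# accepted reasoning_effort values depend on the model in use.
payload = {
    "model": "some-reasoning-model",
    "messages": [
        {"role": "user", "content": "How many primes are below 100?"}
    ],
    # Trades answer latency and cost against reasoning depth:
    "reasoning_effort": "low",
}

print(json.dumps(payload, indent=2))
```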
Output Format
Models may return their reasoning process in two ways:

- `reasoning_content`: More modern models output their thoughts in a separate `reasoning_content` field within the message chunk, for example:
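A sketch of such a chunk, with illustrative field values, and how downstream code can read the thoughts and the visible answer independently:

```python
# Illustrative message chunk carrying a separate reasoning_content field.
chunk = {
    "role": "assistant",
    "reasoning_content": "The user asks for 2 + 2; this is simple arithmetic.",
    "content": "2 + 2 = 4.",
}

# The thoughts and the answer live in separate fields, so no parsing is needed.
thoughts = chunk.get("reasoning_content", "")
answer = chunk["content"]
print(thoughts)
print(answer)
```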
- `<think>` tag: Some models return the reasoning process within the well-known `content` field, wrapping the thoughts in a `<think>...</think>` block, for example:
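In this case the reasoning has to be split out of `content` by the caller. A minimal sketch, assuming the thoughts appear in a single leading `<think>` block:

```python
import re

# Illustrative content where the model inlines its thoughts in a <think> block.
content = "<think>The user asks for 2 + 2; simple arithmetic.</think>2 + 2 = 4."

# Separate the reasoning from the visible answer; re.DOTALL lets the
# <think> block span multiple lines.
match = re.match(r"<think>(.*?)</think>(.*)", content, re.DOTALL)
if match:
    thoughts, answer = match.group(1), match.group(2).strip()
else:
    thoughts, answer = "", content

print(answer)  # "2 + 2 = 4."
```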
Some models, such as Gemini, never return their chain of thought; this information is not available to the end user.
In the future, we plan to standardize reasoning to a single format. For now,
model thoughts may be returned in these ways.
Preserving Reasoning Blocks
When using tool calling, you can preserve the model's reasoning blocks by passing `reasoning_details` back in the conversation context. This maintains reasoning continuity when the model pauses to await tool results and then continues building its response.
Model Support: Supported by Gemini, Z.AI, and other models.
We highly recommend implementing this mechanism. The list of models whose performance depends on chain-of-thought is constantly growing, and preserving reasoning context is becoming a new standard.
- Reasoning continuity: The reasoning blocks capture the model’s step-by-step thinking that led to tool requests. Including them when posting tool results allows the model to continue from where it left off.
- Context maintenance: While tool results appear as user messages in the API, they’re part of a continuous reasoning flow. Preserving reasoning blocks maintains this conceptual flow across multiple API calls.
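The round trip above can be sketched as follows. This is a minimal illustration assuming an OpenAI-style message list; the exact shape of the `reasoning_details` entries and the tool-call fields varies by provider, so the values here are placeholders:

```python
# Sketch: echo the assistant's reasoning_details back when posting tool
# results, so the model can resume its chain of thought on the next call.
messages = [
    {"role": "user", "content": "What is the weather in Paris?"},
]

# Assistant turn that paused to call a tool. The reasoning_details list is
# preserved verbatim from the model's response (entry shape is illustrative).
assistant_turn = {
    "role": "assistant",
    "content": None,
    "reasoning_details": [
        {"type": "reasoning.text", "text": "Need current weather; call the tool."}
    ],
    "tool_calls": [
        {"id": "call_1", "type": "function",
         "function": {"name": "get_weather", "arguments": '{"city": "Paris"}'}}
    ],
}

# Append the whole assistant message (reasoning blocks included), then the
# tool result; the next request carries the full reasoning context forward.
messages.append(assistant_turn)
messages.append(
    {"role": "tool", "tool_call_id": "call_1", "content": '{"temp_c": 18}'}
)

print(len(messages))
```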