The Responses API gives you one of the clearest usage breakdowns in the docs.
Use it to understand how much input, output, cached, and reasoning usage each request consumed.
Usage Shape
Every successful non-streaming response includes a usage object at the root level.
Field breakdown
- input_tokens: The total number of tokens sent in the prompt, including images or documents.
- input_tokens_details.cached_tokens: The portion of input_tokens that the upstream provider successfully cached from previous requests. These are billed at a discounted rate.
- output_tokens: The total number of tokens generated by the model, including hidden "thinking" tokens.
- output_tokens_details.reasoning_tokens: The portion of output_tokens that the model spent "thinking" before generating the visible answer. These are billed at the standard output rate.
- total_tokens: The sum of all input and output tokens (input_tokens + output_tokens).
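The arithmetic relationships between these fields can be checked against a concrete payload. A minimal sketch in Python, assuming a usage dict shaped like the fields above (the payload values are illustrative, not a real API response):

```python
# Illustrative usage payload following the field shapes described above.
usage = {
    "input_tokens": 1200,
    "input_tokens_details": {"cached_tokens": 900},
    "output_tokens": 350,
    "output_tokens_details": {"reasoning_tokens": 200},
    "total_tokens": 1550,
}

# total_tokens is the sum of input and output tokens.
assert usage["total_tokens"] == usage["input_tokens"] + usage["output_tokens"]

# Cached tokens are a subset of input_tokens; only the rest is billed at full rate.
uncached_input = usage["input_tokens"] - usage["input_tokens_details"]["cached_tokens"]

# Reasoning tokens are a subset of output_tokens; the rest is the visible answer.
visible_output = usage["output_tokens"] - usage["output_tokens_details"]["reasoning_tokens"]

print(uncached_input, visible_output)  # 300 150
```

Both detail fields are subsets, not additions: you never add cached_tokens or reasoning_tokens on top of the totals.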
Streaming Usage
When you use stream: true, usage is not delivered token by token during the stream. It arrives with the final completed response snapshot.
You do not need a separate chat-style stream_options flag on this surface.
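The pattern above can be sketched as follows. This is a rough illustration, not the real wire format: the event dicts and their "type" values are hypothetical stand-ins for whatever the stream actually emits. The point is that text deltas never carry usage; you read it once, from the final snapshot.

```python
# Hypothetical stream: delta events carry text only; the final
# completed snapshot carries the usage object.
events = [
    {"type": "delta", "text": "Hel"},
    {"type": "delta", "text": "lo"},
    {"type": "completed", "response": {
        "output_text": "Hello",
        "usage": {"input_tokens": 10, "output_tokens": 2, "total_tokens": 12},
    }},
]

text_parts, usage = [], None
for event in events:
    if event["type"] == "delta":
        text_parts.append(event["text"])    # deltas never include usage
    elif event["type"] == "completed":
        usage = event["response"]["usage"]  # usage arrives only here

print("".join(text_parts), usage["total_tokens"])  # Hello 12
```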
Practical Advice
- When using models with high context windows, actively monitor input_tokens_details.cached_tokens. A high ratio of cached tokens means your prompt design is highly cost-efficient.
- If you notice latency spikes or unexpected costs when using models that support reasoning effort, check output_tokens_details.reasoning_tokens to see how much budget the model spent planning its answer.
- Always log the usage object in your application database alongside the request metadata. It is the most reliable way to attribute costs to specific features or users before running aggregate reports against /v1/account/activity.
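The logging advice above can be sketched as a small helper that flattens a usage object into a row for cost attribution. The function name, the feature tag, and the example values here are hypothetical; swap in whatever request metadata and storage layer your application uses.

```python
def usage_record(request_id: str, feature: str, usage: dict) -> dict:
    """Flatten a usage object into a row suitable for cost attribution.

    Hypothetical helper: field access follows the usage shape described
    above, with .get() fallbacks in case the detail objects are absent.
    """
    cached = usage.get("input_tokens_details", {}).get("cached_tokens", 0)
    reasoning = usage.get("output_tokens_details", {}).get("reasoning_tokens", 0)
    input_tokens = usage["input_tokens"]
    output_tokens = usage["output_tokens"]
    return {
        "request_id": request_id,
        "feature": feature,
        "input_tokens": input_tokens,
        # Share of the prompt served from cache -- the cost-efficiency signal.
        "cached_ratio": cached / input_tokens if input_tokens else 0.0,
        # Hidden planning budget vs. the answer the user actually saw.
        "reasoning_tokens": reasoning,
        "visible_output_tokens": output_tokens - reasoning,
        "total_tokens": usage["total_tokens"],
    }

row = usage_record("req_123", "summarize", {
    "input_tokens": 1000,
    "input_tokens_details": {"cached_tokens": 250},
    "output_tokens": 400,
    "output_tokens_details": {"reasoning_tokens": 100},
    "total_tokens": 1400,
})
print(row["cached_ratio"], row["visible_output_tokens"])  # 0.25 300
```

Storing rows like this per request makes it trivial to answer "which feature is burning reasoning budget?" before you ever need an aggregate report.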
Common mistakes
- only logging final text and ignoring usage metadata
- assuming streamed text deltas contain usage as they arrive
- treating reasoning tokens and visible output tokens as the same optimization problem