The Responses API returns a detailed usage breakdown with every request. Use it to understand how much input, output, cached, and reasoning usage each request consumed.

Usage Shape

Every successful non-streaming response includes a usage object at the root level.
{
  "object": "response",
  "status": "completed",
  "output": [...],
  "usage": {
    "input_tokens": 125,
    "input_tokens_details": {
      "cached_tokens": 100
    },
    "output_tokens": 45,
    "output_tokens_details": {
      "reasoning_tokens": 15
    },
    "total_tokens": 170
  }
}
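
The shape above can be read directly from the parsed response body. A minimal sketch in Python, using the example values from the JSON above (no real API call is made):

```python
import json

# Abridged response body; field names follow the usage shape shown above.
raw = """
{
  "object": "response",
  "status": "completed",
  "usage": {
    "input_tokens": 125,
    "input_tokens_details": {"cached_tokens": 100},
    "output_tokens": 45,
    "output_tokens_details": {"reasoning_tokens": 15},
    "total_tokens": 170
  }
}
"""

response = json.loads(raw)
usage = response["usage"]

# Uncached input = total input minus the provider-cached portion.
uncached_input = usage["input_tokens"] - usage["input_tokens_details"]["cached_tokens"]
# Visible output = total output minus hidden reasoning tokens.
visible_output = usage["output_tokens"] - usage["output_tokens_details"]["reasoning_tokens"]

print(uncached_input)   # 25
print(visible_output)   # 30
```

Note that the detail fields are subsets of their parent counters, not additional tokens, so subtracting them is the correct way to isolate the uncached or visible portions.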

Field breakdown

  • input_tokens: The total number of tokens sent in the prompt, including images or documents.
  • input_tokens_details.cached_tokens: The portion of input_tokens that the upstream provider successfully cached from previous requests. These are billed at a discounted rate.
  • output_tokens: The total number of tokens generated by the model, including hidden “thinking” tokens.
  • output_tokens_details.reasoning_tokens: The portion of output_tokens that the model spent “thinking” before generating the visible answer. These are billed at the standard output rate.
  • total_tokens: The sum of all input and output tokens (input_tokens + output_tokens).
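
Because cached tokens are billed at a discount while reasoning tokens are billed at the standard output rate, a cost estimate only needs three line items. A sketch, with purely hypothetical per-million-token rates (substitute your model's actual pricing):

```python
# Hypothetical rates in dollars per 1M tokens -- placeholders, not real pricing.
INPUT_RATE = 3.00    # uncached input tokens
CACHED_RATE = 0.30   # cached input tokens (discounted)
OUTPUT_RATE = 15.00  # output tokens; reasoning tokens are billed at this same rate

def estimate_cost(usage: dict) -> float:
    cached = usage["input_tokens_details"]["cached_tokens"]
    uncached = usage["input_tokens"] - cached
    # Reasoning tokens are already inside output_tokens, so there is no
    # separate line item for them.
    return (uncached * INPUT_RATE
            + cached * CACHED_RATE
            + usage["output_tokens"] * OUTPUT_RATE) / 1_000_000

usage = {
    "input_tokens": 125,
    "input_tokens_details": {"cached_tokens": 100},
    "output_tokens": 45,
    "output_tokens_details": {"reasoning_tokens": 15},
    "total_tokens": 170,
}
print(f"{estimate_cost(usage):.6f}")  # 0.000780
```

The common mistake this guards against is double-counting: adding cached_tokens or reasoning_tokens on top of their parent counters inflates the estimate.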

Streaming Usage

When you use stream: true, usage is not delivered token by token during the stream. It arrives with the final completed response snapshot. You do not need a separate chat-style stream_options flag on this surface.
event: response.completed
data: {"object":"response","status":"completed","output":[...],"usage":{"input_tokens":125,"output_tokens":45,"total_tokens":170,...}}
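
A streaming client therefore needs to watch for the response.completed event and pull usage from its data payload. A minimal sketch over simulated SSE lines (a real client would read these from the HTTP response stream; the delta event name is illustrative):

```python
import json

# Simulated server-sent-event lines as they might arrive on the wire.
stream_lines = [
    "event: response.output_text.delta",
    'data: {"delta":"Hello"}',
    "",
    "event: response.completed",
    'data: {"object":"response","status":"completed","usage":'
    '{"input_tokens":125,"output_tokens":45,"total_tokens":170}}',
    "",
]

usage = None
current_event = None
for line in stream_lines:
    if line.startswith("event: "):
        current_event = line[len("event: "):]
    elif line.startswith("data: ") and current_event == "response.completed":
        # Usage only appears on the final completed snapshot, never on deltas.
        usage = json.loads(line[len("data: "):])["usage"]

print(usage["total_tokens"])  # 170
```

If usage is still None when the stream ends, the stream terminated before completion and no usage was reported for it.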

Practical Advice

  • When using models with high context windows, actively monitor input_tokens_details.cached_tokens. A high ratio of cached tokens means your prompt design is highly cost-efficient.
  • If you notice latency spikes or unexpected costs when using models that support reasoning effort, check output_tokens_details.reasoning_tokens to see how much budget the model spent planning its answer.
  • Always log the usage object in your application database alongside the request metadata. It is the most reliable way to attribute costs to specific features or users before running aggregate reports against /v1/account/activity.
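
The logging advice above can be sketched with an in-memory SQLite table. Table and column names here are illustrative, not a prescribed schema:

```python
import sqlite3
import time

# Persist the usage object next to request metadata so costs can later be
# attributed to specific features or users.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE request_usage (
        request_id       TEXT,
        user_id          TEXT,
        feature          TEXT,
        created_at       REAL,
        input_tokens     INTEGER,
        cached_tokens    INTEGER,
        output_tokens    INTEGER,
        reasoning_tokens INTEGER,
        total_tokens     INTEGER
    )
""")

def log_usage(request_id: str, user_id: str, feature: str, usage: dict) -> None:
    conn.execute(
        "INSERT INTO request_usage VALUES (?,?,?,?,?,?,?,?,?)",
        (
            request_id, user_id, feature, time.time(),
            usage["input_tokens"],
            usage["input_tokens_details"]["cached_tokens"],
            usage["output_tokens"],
            usage["output_tokens_details"]["reasoning_tokens"],
            usage["total_tokens"],
        ),
    )

usage = {
    "input_tokens": 125,
    "input_tokens_details": {"cached_tokens": 100},
    "output_tokens": 45,
    "output_tokens_details": {"reasoning_tokens": 15},
    "total_tokens": 170,
}
log_usage("req_123", "user_a", "summarize", usage)

row = conn.execute(
    "SELECT total_tokens FROM request_usage WHERE request_id = 'req_123'"
).fetchone()
print(row[0])  # 170
```

Flattening the detail fields into their own columns, as above, keeps later aggregate queries (cost per user, cache hit ratio per feature) simple.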

Common mistakes

  • Logging only the final text and ignoring the usage metadata.
  • Assuming streamed text deltas contain usage as they arrive.
  • Treating reasoning tokens and visible output tokens as the same optimization problem.