The Responses API gives you one of the clearest usage breakdowns in the docs.
Use it to understand how much input, output, cached, and reasoning usage each request consumed.
Usage Shape
Every successful non-streaming response includes a usage object at the root level.
Field breakdown
- input_tokens: The total number of tokens sent in the prompt, including images or documents.
- input_tokens_details.cached_tokens: The portion of input_tokens that the upstream provider successfully cached from previous requests. These are billed at a discounted rate.
- output_tokens: The total number of tokens generated by the model, including hidden "thinking" tokens.
- output_tokens_details.reasoning_tokens: The portion of output_tokens that the model spent "thinking" before generating the visible answer. These are billed at the standard output rate.
- total_tokens: The sum of all input and output tokens (input_tokens + output_tokens).
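The arithmetic relationships between these fields can be checked against a concrete payload. A minimal sketch in Python, assuming a usage dict shaped like the fields above (the payload values are illustrative, not a real API response):

```python
# Illustrative usage payload following the field shapes described above.
usage = {
    "input_tokens": 1200,
    "input_tokens_details": {"cached_tokens": 900},
    "output_tokens": 350,
    "output_tokens_details": {"reasoning_tokens": 200},
    "total_tokens": 1550,
}

# total_tokens is the sum of input and output tokens.
assert usage["total_tokens"] == usage["input_tokens"] + usage["output_tokens"]

# Cached tokens are a subset of input_tokens; only the rest is billed at full rate.
uncached_input = usage["input_tokens"] - usage["input_tokens_details"]["cached_tokens"]

# Reasoning tokens are a subset of output_tokens; the rest is the visible answer.
visible_output = usage["output_tokens"] - usage["output_tokens_details"]["reasoning_tokens"]

print(uncached_input, visible_output)  # 300 150
```

Both detail fields are subsets, not additions: you never add cached_tokens or reasoning_tokens on top of the totals.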
Streaming Usage
When you use stream: true, usage is not delivered token by token during the stream. It arrives with the final completed response snapshot.
You do not need a separate chat-style stream_options flag on this surface.
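The pattern above can be sketched as follows. This is a rough illustration, not the real wire format: the event dicts and their "type" values are hypothetical stand-ins for whatever the stream actually emits. The point is that text deltas never carry usage; you read it once, from the final snapshot.

```python
# Hypothetical stream: delta events carry text only; the final
# completed snapshot carries the usage object.
events = [
    {"type": "delta", "text": "Hel"},
    {"type": "delta", "text": "lo"},
    {"type": "completed", "response": {
        "output_text": "Hello",
        "usage": {"input_tokens": 10, "output_tokens": 2, "total_tokens": 12},
    }},
]

text_parts, usage = [], None
for event in events:
    if event["type"] == "delta":
        text_parts.append(event["text"])    # deltas never include usage
    elif event["type"] == "completed":
        usage = event["response"]["usage"]  # usage arrives only here

print("".join(text_parts), usage["total_tokens"])  # Hello 12
```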
Practical Advice
- When using models with high context windows, actively monitor input_tokens_details.cached_tokens. A high ratio of cached tokens means your prompt design is highly cost-efficient.
- If you notice latency spikes or unexpected costs when using models that support reasoning effort, check output_tokens_details.reasoning_tokens to see how much budget the model spent planning its answer.
- Always log the usage object in your application database alongside the request metadata. It is the most reliable way to attribute costs to specific features or users before running aggregate reports against /v1/account/activity.
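The logging advice above can be sketched as a small helper that flattens a usage object into a row for cost attribution. The function name, the feature tag, and the example values here are hypothetical; swap in whatever request metadata and storage layer your application uses.

```python
def usage_record(request_id: str, feature: str, usage: dict) -> dict:
    """Flatten a usage object into a row suitable for cost attribution.

    Hypothetical helper: field access follows the usage shape described
    above, with .get() fallbacks in case the detail objects are absent.
    """
    cached = usage.get("input_tokens_details", {}).get("cached_tokens", 0)
    reasoning = usage.get("output_tokens_details", {}).get("reasoning_tokens", 0)
    input_tokens = usage["input_tokens"]
    output_tokens = usage["output_tokens"]
    return {
        "request_id": request_id,
        "feature": feature,
        "input_tokens": input_tokens,
        # Share of the prompt served from cache -- the cost-efficiency signal.
        "cached_ratio": cached / input_tokens if input_tokens else 0.0,
        # Hidden planning budget vs. the answer the user actually saw.
        "reasoning_tokens": reasoning,
        "visible_output_tokens": output_tokens - reasoning,
        "total_tokens": usage["total_tokens"],
    }

row = usage_record("req_123", "summarize", {
    "input_tokens": 1000,
    "input_tokens_details": {"cached_tokens": 250},
    "output_tokens": 400,
    "output_tokens_details": {"reasoning_tokens": 100},
    "total_tokens": 1400,
})
print(row["cached_ratio"], row["visible_output_tokens"])  # 0.25 300
```

Storing rows like this per request makes it trivial to answer "which feature is burning reasoning budget?" before you ever need an aggregate report.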
Common mistakes
- only logging final text and ignoring usage metadata
- assuming streamed text deltas contain usage as they arrive
- treating reasoning tokens and visible output tokens as the same optimization problem