The Chat Completions API reports usage in the familiar OpenAI chat format.
Use this page if your app logs `prompt_tokens`, `completion_tokens`, or chat-style streaming usage today.
## Non-Streaming Usage
For non-streaming requests, every successful completion includes a `usage` object at the root level of the response.
### Field breakdown
- `prompt_tokens`: The number of tokens in the prompt (input). This correlates directly with input token pricing.
- `completion_tokens`: The number of tokens in the generated completion (output). This correlates directly with output token pricing.
- `total_tokens`: The sum of `prompt_tokens` and `completion_tokens`.
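As a minimal sketch of reading these fields, the snippet below extracts and sanity-checks usage from a parsed response payload. The payload is an illustrative stand-in, not verbatim API output; real responses carry additional fields (`id`, `model`, full `choices`, and so on).

```python
# Illustrative non-streaming response payload (abridged).
response = {
    "choices": [{"message": {"role": "assistant", "content": "Hello!"}}],
    "usage": {
        "prompt_tokens": 12,
        "completion_tokens": 5,
        "total_tokens": 17,
    },
}

usage = response["usage"]
# total_tokens should always equal the sum of the other two fields.
assert usage["total_tokens"] == usage["prompt_tokens"] + usage["completion_tokens"]
print(usage["prompt_tokens"], usage["completion_tokens"], usage["total_tokens"])
```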
If the model supports prompt caching or reasoning effort, their respective metrics may also appear in nested fields, depending on the exact OpenAI API version you are targeting, such as `prompt_tokens_details.cached_tokens`.

## Streaming Usage
By default, the OpenAI chat protocol does not include a `usage` block when streaming.
If your application needs to calculate costs or track token consumption while using `stream: true`, you must explicitly request the usage data by setting `stream_options.include_usage` to `true`.
### How to request streaming usage
When `stream_options.include_usage` is enabled, the server emits one extra final chunk (just before `[DONE]`) that contains an empty `choices` array and the populated `usage` object.
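A sketch of the consumer side, simulating the parsed stream with plain dicts rather than making a live call (the dict layout mirrors the chat streaming format described here; the helper name and sample values are assumptions). In a real request you would additionally pass `stream=True` and `stream_options={"include_usage": True}` when creating the completion.

```python
def consume_stream(chunks):
    """Accumulate streamed content deltas and capture the final usage chunk.

    `chunks` stands in for the parsed SSE events of a chat completion
    stream requested with stream_options.include_usage enabled.
    """
    text_parts = []
    usage = None
    for chunk in chunks:
        # Normal chunks carry deltas in choices; the final usage chunk does not.
        for choice in chunk.get("choices") or []:
            delta = choice.get("delta", {})
            if delta.get("content"):
                text_parts.append(delta["content"])
        if chunk.get("usage"):
            usage = chunk["usage"]
    return "".join(text_parts), usage

# Simulated stream: two content deltas, then the usage-only final chunk.
sample = [
    {"choices": [{"delta": {"content": "Hel"}}]},
    {"choices": [{"delta": {"content": "lo"}}]},
    {"choices": [], "usage": {"prompt_tokens": 9, "completion_tokens": 2, "total_tokens": 11}},
]
text, usage = consume_stream(sample)
print(text, usage["total_tokens"])  # → Hello 11
```

Note that the loop never indexes `choices[0]` directly, which is what makes it safe on the final usage-only chunk.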
### Example Streaming Usage Chunk
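An illustrative final chunk might look like the following; the `id` and token counts are placeholders, not verbatim API output.

```json
{
  "id": "chatcmpl-...",
  "object": "chat.completion.chunk",
  "choices": [],
  "usage": {
    "prompt_tokens": 24,
    "completion_tokens": 110,
    "total_tokens": 134
  }
}
```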
## Practical Advice
- When using streaming, remember that the usage chunk has an empty `choices` array. Your client logic should safely handle `choices` being empty or null on the final chunk before attempting to access `chunk.usage`.
- Always log the `usage` object in your application database alongside the request metadata. It is the most reliable way to attribute costs to specific features or users before running aggregate reports against `/v1/account/activity`.
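One way to implement the logging advice above, sketched with a dict standing in for the database; the function name and metadata fields (`request_id`, `user_id`, `feature`) are illustrative assumptions, not part of any API.

```python
import time

def log_usage(db, request_id, user_id, feature, usage):
    """Persist a usage object alongside request metadata for cost attribution.

    `db` is any dict-like store here; a real app would write to its database.
    """
    db[request_id] = {
        "ts": time.time(),
        "user_id": user_id,
        "feature": feature,
        "prompt_tokens": usage["prompt_tokens"],
        "completion_tokens": usage["completion_tokens"],
        "total_tokens": usage["total_tokens"],
    }

records = {}
log_usage(records, "req-1", "u-42", "summarize",
          {"prompt_tokens": 300, "completion_tokens": 80, "total_tokens": 380})
print(records["req-1"]["total_tokens"])  # → 380
```

Storing prompt and completion counts separately (rather than only the total) is what makes per-feature cost reports possible later, since input and output tokens are priced differently.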
### Common mistakes
- forgetting to enable `stream_options.include_usage` when streaming
- assuming the final usage chunk contains normal message deltas
- tracking only total tokens instead of tracking prompt and completion tokens separately