The Chat Completions API reports usage in the familiar OpenAI chat format. Use this page if your app logs prompt_tokens, completion_tokens, or chat-style streaming usage today.

Non-Streaming Usage

For non-streaming requests, every successful completion includes a usage object at the root level of the response.
{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1677652288,
  "model": "gpt-4.1-mini",
  "choices": [...],
  "usage": {
    "prompt_tokens": 194,
    "completion_tokens": 2,
    "total_tokens": 196
  }
}
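
Since the response body is plain JSON, these fields can be inspected directly once parsed. A minimal sketch using the example response above (with choices elided):

```python
import json

# The non-streaming response shown above, with choices elided for brevity.
raw = """{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1677652288,
  "model": "gpt-4.1-mini",
  "choices": [],
  "usage": {
    "prompt_tokens": 194,
    "completion_tokens": 2,
    "total_tokens": 196
  }
}"""

usage = json.loads(raw)["usage"]

# total_tokens is always the sum of the other two fields.
assert usage["total_tokens"] == usage["prompt_tokens"] + usage["completion_tokens"]
```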

Field breakdown

  • prompt_tokens: The number of tokens in the prompt (input), billed at the model's input token rate.
  • completion_tokens: The number of tokens in the generated completion (output), billed at the model's output token rate.
  • total_tokens: The sum of prompt_tokens and completion_tokens.
If the model supports prompt caching or reasoning, additional metrics may appear in nested fields such as prompt_tokens_details.cached_tokens, depending on the exact OpenAI API version you are targeting.
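
Because input and output tokens are priced separately, per-request cost can be estimated directly from the usage object. A minimal sketch — the per-million-token rates below are placeholders, not actual pricing:

```python
# Placeholder per-million-token rates in USD; substitute your model's real pricing.
PRICES = {"gpt-4.1-mini": {"input": 0.40, "output": 1.60}}

def estimate_cost(usage: dict, model: str) -> float:
    """Estimate the USD cost of one request from its usage object."""
    rates = PRICES[model]
    return (usage["prompt_tokens"] * rates["input"]
            + usage["completion_tokens"] * rates["output"]) / 1_000_000

usage = {"prompt_tokens": 194, "completion_tokens": 2, "total_tokens": 196}
cost = estimate_cost(usage, "gpt-4.1-mini")
```

Tracking prompt and completion tokens separately, as the field breakdown suggests, is what makes this kind of cost attribution possible.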

Streaming Usage

By default, the OpenAI chat protocol does not include a usage block when streaming. If your application needs to calculate costs or track token consumption while using stream: true, you must explicitly request the usage data by setting stream_options.include_usage to true.

How to request streaming usage

from openai import OpenAI

client = OpenAI(
    base_url="https://api.naga.ac/v1",
    api_key="YOUR_API_KEY",
)

stream = client.chat.completions.create(
    model="gpt-4.1-mini",
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True,
    stream_options={"include_usage": True}
)

for chunk in stream:
    if chunk.usage:
        print(f"Prompt tokens: {chunk.usage.prompt_tokens}")
        print(f"Completion tokens: {chunk.usage.completion_tokens}")

When enabled, the server emits one additional chunk at the very end of the stream (just before [DONE]). This chunk contains an empty choices array and the populated usage object.

Example Streaming Usage Chunk

{
  "id": "chatcmpl-123",
  "object": "chat.completion.chunk",
  "created": 1694268190,
  "model": "gpt-4.1-mini",
  "choices": [],
  "usage": {
    "prompt_tokens": 194,
    "completion_tokens": 2,
    "total_tokens": 196
  }
}

Practical Advice

  • When using streaming, remember that the final usage chunk has an empty choices array. Guard any access to chunk.choices[0] so your client logic does not fail on that last chunk before reading chunk.usage.
  • Always log the usage object in your application database alongside the request metadata. It is the most reliable way to attribute costs to specific features or users before running aggregate reports against /v1/account/activity.
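
The defensive pattern from the first bullet can be sketched as follows. The chunk objects are assumed to follow the shapes shown earlier: delta chunks with a populated choices array, then a final chunk with empty choices and a usage object:

```python
def consume_stream(stream):
    """Collect streamed text and capture the final usage chunk, if present."""
    text_parts = []
    usage = None
    for chunk in stream:
        # Regular chunks carry message deltas; the final usage chunk has
        # an empty choices array, so check before indexing.
        if chunk.choices:
            delta = chunk.choices[0].delta
            if delta.content:
                text_parts.append(delta.content)
        # Only the final chunk (when include_usage is enabled) has usage set.
        if chunk.usage:
            usage = chunk.usage
    return "".join(text_parts), usage
```

This works with the stream object from the earlier example, and usage will simply be None if stream_options.include_usage was not enabled.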

Common mistakes

  • forgetting to enable stream_options.include_usage when streaming
  • assuming the final usage chunk contains normal message deltas
  • only tracking total tokens and ignoring prompt vs completion growth separately