The Messages API reports usage in the Anthropic-style format. Use this page if your client expects input_tokens, output_tokens, and Messages streaming events.

Non-Streaming Usage

When making a synchronous request to the Messages API, the top-level response object includes a usage block containing the token metrics for the generation.
{
  "id": "msg_01...",
  "type": "message",
  "role": "assistant",
  "content": [...],
  "model": "claude-sonnet-4.5",
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": {
    "input_tokens": 12,
    "output_tokens": 3
  }
}

Field breakdown

  • input_tokens: The number of tokens in the prompt (input), including any images or documents. These tokens are billed at the input-token rate.
  • output_tokens: The number of tokens generated by the model (output). These tokens are billed at the output-token rate.
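Given these two fields, estimating the cost of a single response is a simple calculation. The sketch below assumes placeholder per-million-token prices; substitute the actual rates for your model.

```python
# Placeholder rates for illustration only -- check your model's actual pricing.
INPUT_PRICE_PER_MTOK = 3.00    # USD per 1M input tokens (assumed)
OUTPUT_PRICE_PER_MTOK = 15.00  # USD per 1M output tokens (assumed)

def estimate_cost(usage: dict) -> float:
    """Estimate the USD cost of one response from its usage block."""
    input_cost = usage["input_tokens"] / 1_000_000 * INPUT_PRICE_PER_MTOK
    output_cost = usage["output_tokens"] / 1_000_000 * OUTPUT_PRICE_PER_MTOK
    return input_cost + output_cost

usage = {"input_tokens": 12, "output_tokens": 3}
print(estimate_cost(usage))
```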

Streaming Usage

When using stream: true in the Anthropic protocol, usage is not delivered in a single top-level object at the end. Instead, usage information is incrementally delivered through specific SSE events during the stream lifecycle. You do not need to request streaming usage explicitly; the protocol includes it by default.

message_start event

The very first event in the stream (message_start) includes the usage details for the input prompt.
Example message_start Event
event: message_start
data: {
  "type": "message_start",
  "message": {
    "id": "msg_01...",
    "type": "message",
    "role": "assistant",
    "model": "claude-sonnet-4.5",
    "usage": {
      "input_tokens": 12,
      "output_tokens": 0
    }
  }
}
At this stage, output_tokens is 0 because generation has just begun. The input_tokens value represents your prompt cost.

message_delta event

Near the end of the stream, immediately before the message_stop event, the server emits a message_delta event. This event contains the final usage metrics for the output tokens generated during the stream.
Example message_delta Event
event: message_delta
data: {
  "type": "message_delta",
  "delta": {
    "stop_reason": "end_turn",
    "stop_sequence": null
  },
  "usage": {
    "output_tokens": 3
  }
}
The output_tokens value in this delta event represents the final output cost. To determine the total cost of a streaming interaction, your application logic must capture input_tokens from the message_start event and output_tokens from the message_delta event.

Practical Advice

  • If you are using the official @anthropic-ai/sdk, you can usually access the final aggregated usage directly via the finalMessage() helper or by accumulating the events manually.
  • Always log the usage object in your application database alongside the request metadata. It is the most reliable way to attribute costs to specific features or users before running aggregate reports against /v1/account/activity.

Common mistakes

  • Expecting a single OpenAI-style usage trailer chunk instead of Anthropic's event-based usage.
  • Capturing only message_start and forgetting the final message_delta.
  • Treating Messages usage fields as interchangeable with Chat Completions usage fields.
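If your application also consumes OpenAI-style clients, an explicit mapping avoids the third mistake. A minimal sketch, assuming the standard Chat Completions field names on the output side:

```python
def to_openai_style(usage: dict) -> dict:
    """Map Anthropic-style usage fields to OpenAI-style names.

    Anthropic's Messages usage has no total field, so total_tokens
    is derived here as the sum of inputs and outputs.
    """
    return {
        "prompt_tokens": usage["input_tokens"],
        "completion_tokens": usage["output_tokens"],
        "total_tokens": usage["input_tokens"] + usage["output_tokens"],
    }

print(to_openai_style({"input_tokens": 12, "output_tokens": 3}))
```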