Tokens are the units models use to process input and produce output. Understanding usage helps you estimate cost, control context size, and interpret API responses correctly.

What is a token?

A token is the smallest unit of data a model processes. Depending on the input, it can represent:
  • text, usually a word or part of a word
  • image content converted into visual tokens
  • audio content converted into audio tokens
As a rough rule of thumb, 1 token ≈ 4 characters in English.
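The rule of thumb above can be sketched as a quick estimator. This is only a ballpark for English text; real counts come from the model's tokenizer or from the usage object returned by the API.

```python
import math

def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate using the ~4 characters/token heuristic."""
    return math.ceil(len(text) / chars_per_token)

# Example: a 13-character string estimates to 4 tokens
estimate_tokens("Hello, world!")
```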

Token Categories

  • Input tokens: tokens you send in prompts, messages, files, images, or audio
  • Output tokens: tokens the model generates in its response
  • Cached input tokens: reused input tokens on providers that support prompt caching
  • Reasoning tokens: extra tokens consumed by internal reasoning on supported reasoning models
Input tokens are usually cheaper than output tokens. Cached input tokens, when supported, are often discounted relative to normal input tokens.
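A minimal cost estimate can combine these categories with per-token prices. The rates and field names below are hypothetical examples, not real NagaAI pricing; substitute your provider's published rates and actual usage fields.

```python
def estimate_cost(usage: dict, prices: dict) -> float:
    """Estimate request cost in USD from token counts and prices per million tokens.

    Assumes cached tokens are a subset of input tokens and are billed
    at a discounted rate when the provider supports prompt caching.
    """
    cached = usage.get("cached_input_tokens", 0)
    uncached = usage.get("input_tokens", 0) - cached
    out = usage.get("output_tokens", 0)
    return (
        uncached * prices["input"]
        + cached * prices.get("cached_input", prices["input"])
        + out * prices["output"]
    ) / 1_000_000

# Hypothetical rates per million tokens
prices = {"input": 1.00, "cached_input": 0.25, "output": 4.00}
cost = estimate_cost(
    {"input_tokens": 10_000, "cached_input_tokens": 6_000, "output_tokens": 500},
    prices,
)
```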

How Usage is Reported

Every major API returns a usage object in its response, which you can log to track costs or analyze workloads. Because NagaAI supports multiple API surfaces, the exact JSON shape of that object varies by API.

Why usage shapes differ

  • Responses focuses on typed output items and can include richer usage details
  • Chat Completions uses OpenAI-style fields such as prompt_tokens and completion_tokens
  • Messages uses Anthropic-style fields such as input_tokens and output_tokens
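If you log usage across APIs, it helps to map the shapes onto common fields. A minimal sketch based on the field-name conventions above; any richer details your provider returns (cached or reasoning counts) are ignored here.

```python
def normalize_usage(usage: dict) -> dict:
    """Map per-API usage shapes onto common input/output fields."""
    if "prompt_tokens" in usage:
        # Chat Completions (OpenAI-style field names)
        return {"input": usage["prompt_tokens"],
                "output": usage.get("completion_tokens", 0)}
    # Responses and Messages both use input_tokens / output_tokens
    return {"input": usage.get("input_tokens", 0),
            "output": usage.get("output_tokens", 0)}
```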

Practical advice

  • log usage for both successful requests and streamed requests when available
  • watch for large input growth from long prompts, tools, or conversation history
  • treat cached and reasoning usage as separate cost drivers when your models expose them
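The advice above can be put into practice with a small accumulator that records usage per model, making input growth and cost drivers visible over time. A minimal sketch; the field names assume the input_tokens/output_tokens style, so adapt `record` to your API's shape.

```python
from collections import defaultdict

class UsageLogger:
    """Accumulate token usage per model across requests."""

    def __init__(self):
        self.totals = defaultdict(lambda: {"input": 0, "output": 0, "requests": 0})

    def record(self, model: str, usage: dict) -> None:
        # Call this with the usage object from each (streamed or non-streamed) response
        t = self.totals[model]
        t["input"] += usage.get("input_tokens", 0)
        t["output"] += usage.get("output_tokens", 0)
        t["requests"] += 1

    def avg_input(self, model: str) -> float:
        # Rising average input often signals growing prompts, tools, or history
        t = self.totals[model]
        return t["input"] / t["requests"] if t["requests"] else 0.0

logger = UsageLogger()
logger.record("example-model", {"input_tokens": 100, "output_tokens": 10})
logger.record("example-model", {"input_tokens": 300, "output_tokens": 20})
```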

API-Specific Guides

Learn how to read the usage object and handle streaming usage for your specific API:

Responses Usage

Usage tracking, cached tokens, and reasoning tokens in the primary Responses API.

Chat Completions Usage

prompt_tokens, completion_tokens, and include_usage in the OpenAI-compatible layer.

Messages Usage

input_tokens and output_tokens in the Anthropic-compatible layer.

Embeddings API

Input token tracking for vector generation.