Set stream: true to receive chat.completion.chunk payloads over SSE. Use this when your client already expects chat chunks and incremental deltas rather than Responses-style semantic events.

Request

Use stream_options.include_usage when you want the final usage trailer.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.naga.ac/v1",
    api_key="YOUR_API_KEY",
)

stream = client.chat.completions.create(
    model="gpt-5",
    messages=[
        {"role": "user", "content": "Explain retries in one paragraph."}
    ],
    stream=True,
    stream_options={"include_usage": True},
)

for chunk in stream:
    # Guard: some chunks carry no text (the role-only first chunk,
    # tool-call deltas, and the usage trailer with empty choices).
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)

Chunk Shape

The first chunk usually establishes the assistant role:
{
  "id": "resp_1",
  "object": "chat.completion.chunk",
  "created": 1,
  "model": "gpt-5",
  "choices": [
    {
      "index": 0,
      "delta": {
        "role": "assistant"
      },
      "finish_reason": null
    }
  ]
}
Text then streams in choices[0].delta.content.

What to Listen For

  • choices[0].delta.role for the initial assistant role
  • choices[0].delta.content for text deltas
  • choices[0].delta.tool_calls for tool-call deltas
  • finish_reason on the terminal chunk
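The dispatch above can be sketched as a small consumer loop. This is a minimal sketch over plain dicts shaped like the chunks shown on this page (the sample chunks here are hypothetical, not real API output):

```python
def consume(chunks):
    """Fold a sequence of chat.completion.chunk dicts into (text, finish_reason)."""
    text_parts = []
    finish_reason = None
    for chunk in chunks:
        for choice in chunk.get("choices", []):
            delta = choice.get("delta", {})
            if delta.get("role"):
                pass  # initial assistant role; nothing to render yet
            if delta.get("content"):
                text_parts.append(delta["content"])
            if choice.get("finish_reason"):
                finish_reason = choice["finish_reason"]
    return "".join(text_parts), finish_reason

chunks = [
    {"choices": [{"index": 0, "delta": {"role": "assistant"}, "finish_reason": None}]},
    {"choices": [{"index": 0, "delta": {"content": "Hel"}, "finish_reason": None}]},
    {"choices": [{"index": 0, "delta": {"content": "lo"}, "finish_reason": None}]},
    {"choices": [{"index": 0, "delta": {}, "finish_reason": "stop"}]},
]
text, reason = consume(chunks)
print(text, reason)  # → Hello stop
```

The same loop works on SDK objects by swapping the dict lookups for attribute access.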

Tool Call Deltas

Tool calls stream through choices[0].delta.tool_calls.
{
  "choices": [
    {
      "index": 0,
      "delta": {
        "tool_calls": [
          {
            "id": "call_1",
            "type": "function",
            "function": {
              "name": "lookup_weather",
              "arguments": "{\"city\":\"Prague\"}"
            }
          }
        ]
      },
      "finish_reason": null
    }
  ]
}
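A tool call's name and arguments may also arrive split across several chunks; fragments belonging to the same call share a list index. A minimal accumulator sketch, again over hypothetical chunk dicts, assuming each tool-call delta carries an index field:

```python
def collect_tool_calls(chunks):
    """Merge tool-call fragments from chat.completion.chunk dicts, keyed by index."""
    calls = {}
    for chunk in chunks:
        for choice in chunk.get("choices", []):
            for tc in choice.get("delta", {}).get("tool_calls", []):
                slot = calls.setdefault(tc["index"], {"id": None, "name": "", "arguments": ""})
                if tc.get("id"):
                    slot["id"] = tc["id"]
                fn = tc.get("function", {})
                if fn.get("name"):
                    slot["name"] += fn["name"]
                if fn.get("arguments"):
                    slot["arguments"] += fn["arguments"]
    return [calls[i] for i in sorted(calls)]

tool_chunks = [
    {"choices": [{"index": 0, "delta": {"tool_calls": [
        {"index": 0, "id": "call_1", "type": "function",
         "function": {"name": "lookup_weather", "arguments": ""}}]}}]},
    {"choices": [{"index": 0, "delta": {"tool_calls": [
        {"index": 0, "function": {"arguments": "{\"city\":"}}]}}]},
    {"choices": [{"index": 0, "delta": {"tool_calls": [
        {"index": 0, "function": {"arguments": "\"Prague\"}"}}]}}]},
]
calls = collect_tool_calls(tool_chunks)
print(calls[0]["name"], calls[0]["arguments"])  # → lookup_weather {"city":"Prague"}
```

Parse the accumulated arguments string as JSON only after the terminal chunk; mid-stream it is usually an incomplete fragment.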

Final Chunks

When generation finishes, a chunk arrives whose finish_reason is set. If stream_options.include_usage is true, one more chunk can follow: a usage trailer with empty choices:
{
  "choices": [],
  "usage": {
    "prompt_tokens": 5,
    "completion_tokens": 7,
    "total_tokens": 12,
    "completion_tokens_details": {
      "image_tokens": 0
    }
  }
}
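Because the trailer's choices array is empty, always guard before indexing into it. A minimal sketch over parsed dicts (hypothetical sample data):

```python
def last_usage(chunks):
    """Return the usage object from the trailer chunk, or None if absent."""
    usage = None
    for chunk in chunks:
        # Content chunks are handled as usual; the trailer has choices == [].
        if chunk.get("usage") is not None:
            usage = chunk["usage"]
    return usage

trailer = {
    "choices": [],
    "usage": {"prompt_tokens": 5, "completion_tokens": 7, "total_tokens": 12},
}
print(last_usage([trailer])["total_tokens"])  # → 12
```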

Error Behavior

If a failure happens after headers are sent, the stream can end with a payload that contains a top-level error object.
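One way to surface this is to check each parsed payload for a top-level error key before treating it as a chunk. A sketch, with a hypothetical error message:

```python
def ensure_no_error(payload):
    """Raise if a stream payload carries a top-level error object; else pass it through."""
    if "error" in payload:
        raise RuntimeError(payload["error"].get("message", "stream error"))
    return payload

ensure_no_error({"choices": []})  # normal chunk passes through
try:
    ensure_no_error({"error": {"message": "upstream failure"}})
except RuntimeError as exc:
    print(exc)  # → upstream failure
```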

Common Mistakes

  • assuming all chunks contain text
  • forgetting to enable stream_options.include_usage when you need final usage data
  • treating chat chunks like Responses semantic events