Use streaming when you want tokens, tool arguments, or structured events to arrive before the full response is finished. NagaAI supports streaming across the main generation APIs, but each surface uses a different event format.

Support Matrix

| API | Enable it with | Main text delta | Main tool delta | Terminal signal |
| --- | --- | --- | --- | --- |
| Responses | stream: true | response.output_text.delta | response.function_call_arguments.delta | response.completed and [DONE] |
| Chat Completions | stream: true | choices[0].delta.content | choices[0].delta.tool_calls | final chunk with finish_reason |
| Messages | stream: true | content_block_delta with text_delta | content_block_delta with input_json_delta | message_delta and message_stop |

When To Use It

  • lower perceived latency in chat and assistant UIs
  • show long answers as they are generated
  • react to tool-call arguments before the final answer finishes
If your app only needs one final answer and does not care about progressive output, a plain non-streaming request is often simpler.

A minimal Responses streaming loop looks like this:
from openai import OpenAI

client = OpenAI(
    base_url="https://api.naga.ac/v1",
    api_key="YOUR_API_KEY",
)

stream = client.responses.create(
    model="gpt-4.1-mini",
    input="Stream a short explanation of backpressure.",
    stream=True,
)

for event in stream:
    if event.type == "response.output_text.delta":
        print(event.delta, end="")
Typical event flow:
event: response.created
event: response.in_progress
event: response.output_text.delta
event: response.completed
data: [DONE]
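On the wire, these frames are server-sent events. As a rough sketch, splitting raw SSE text into (event, data) pairs might look like the function below; it assumes the simple `event:` / `data:` line format shown above (official SDKs handle this parsing for you).

```python
def parse_sse(raw: str):
    """Split raw SSE text into (event_name, data) pairs.

    Assumes frames are separated by blank lines and that each frame
    carries optional "event:" and one or more "data:" lines.
    """
    frames = []
    event, data = None, []
    for line in raw.splitlines():
        if line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data.append(line[len("data:"):].strip())
        elif line == "":
            # A blank line terminates the current frame.
            if event or data:
                frames.append((event, "\n".join(data)))
            event, data = None, []
    if event or data:  # flush a trailing frame with no final blank line
        frames.append((event, "\n".join(data)))
    return frames

raw = (
    "event: response.created\n"
    "data: {}\n"
    "\n"
    "event: response.output_text.delta\n"
    'data: {"delta": "hi"}\n'
    "\n"
    "data: [DONE]\n"
    "\n"
)
frames = parse_sse(raw)
```

The `[DONE]` sentinel arrives as a data-only frame, which is why the parser keeps frames that have data but no event name.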

Protocol Differences

  • Responses streams named lifecycle events and typed deltas.
  • Chat Completions streams OpenAI-compatible chat.completion.chunk payloads.
  • Messages streams Anthropic-style event names such as content_block_delta and message_stop.
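For the Chat Completions shape, a client typically accumulates `choices[0].delta.content` until a chunk carries a `finish_reason`. A minimal sketch, using illustrative chunk dicts rather than a live stream (the exact chunk contents below are assumptions):

```python
def accumulate_chat_chunks(chunks):
    """Collect content deltas from chat.completion.chunk-style payloads."""
    text = []
    finish = None
    for chunk in chunks:
        if not chunk.get("choices"):
            continue  # e.g. a trailing chunk that carries only usage data
        choice = chunk["choices"][0]
        delta = choice.get("delta", {})
        if delta.get("content"):
            text.append(delta["content"])
        if choice.get("finish_reason"):
            finish = choice["finish_reason"]
    return "".join(text), finish

chunks = [
    {"choices": [{"delta": {"role": "assistant"}, "finish_reason": None}]},
    {"choices": [{"delta": {"content": "Hel"}, "finish_reason": None}]},
    {"choices": [{"delta": {"content": "lo"}, "finish_reason": None}]},
    {"choices": [{"delta": {}, "finish_reason": "stop"}]},
]
result = accumulate_chat_chunks(chunks)  # ("Hello", "stop")
```

Guarding against empty `choices` matters because some providers append a final chunk without choices when usage reporting is enabled.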

Protocol Examples

Responses streams semantic events, so you usually branch on event.type.
event: response.created
event: response.in_progress
event: response.output_text.delta
event: response.function_call_arguments.delta
event: response.completed
data: [DONE]
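A Messages stream can be consumed with a similar loop, branching on the Anthropic-style event names from the matrix above. The event dicts here are illustrative sketches of the shape, not captured output:

```python
def collect_message_text(events):
    """Accumulate text_delta payloads and pick up the stop reason."""
    text = []
    stop_reason = None
    for ev in events:
        if ev["type"] == "content_block_delta" and ev["delta"]["type"] == "text_delta":
            text.append(ev["delta"]["text"])
        elif ev["type"] == "message_delta":
            stop_reason = ev["delta"].get("stop_reason")
    return "".join(text), stop_reason

events = [
    {"type": "message_start"},
    {"type": "content_block_start", "index": 0},
    {"type": "content_block_delta", "index": 0,
     "delta": {"type": "text_delta", "text": "Hi"}},
    {"type": "content_block_delta", "index": 0,
     "delta": {"type": "text_delta", "text": " there"}},
    {"type": "content_block_stop", "index": 0},
    {"type": "message_delta", "delta": {"stop_reason": "end_turn"}},
    {"type": "message_stop"},
]
result = collect_message_text(events)  # ("Hi there", "end_turn")
```

Tool-call input would arrive the same way, as `content_block_delta` events whose delta type is `input_json_delta` instead of `text_delta`.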

Client Checklist

  • parse the stream structurally instead of treating it as plain text
  • stop normal stream processing if an error frame appears
  • expect tool and reasoning deltas to be interleaved with text on some models
  • if you need usage data, check the API-specific streaming docs for how it is delivered
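The checklist above can be sketched as a small dispatcher over Responses-style events. The event names come from the support matrix; the error-frame shape (`type: "error"` with a `message` field) is an assumption for illustration:

```python
def run_stream(events):
    """Dispatch structurally on event type instead of scanning raw text.

    Text and tool-argument deltas are buffered separately because some
    models interleave them; an error frame aborts normal processing.
    """
    text, tool_args = [], []
    for ev in events:
        etype = ev.get("type")
        if etype == "error":
            # Stop normal stream processing when an error frame appears.
            raise RuntimeError(ev.get("message", "stream error"))
        if etype == "response.output_text.delta":
            text.append(ev["delta"])
        elif etype == "response.function_call_arguments.delta":
            tool_args.append(ev["delta"])
        elif etype == "response.completed":
            break
    return "".join(text), "".join(tool_args)

events = [
    {"type": "response.created"},
    {"type": "response.output_text.delta", "delta": "A"},
    {"type": "response.function_call_arguments.delta", "delta": '{"q":'},
    {"type": "response.output_text.delta", "delta": "B"},
    {"type": "response.function_call_arguments.delta", "delta": " 1}"},
    {"type": "response.completed"},
]
result = run_stream(events)  # ("AB", '{"q": 1}')
```

Because tool arguments stream as JSON fragments, the buffer should only be parsed once the stream signals completion.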
For streaming failures and normal error payloads, read Error Handling.

API-Specific Guides

Reference