Use streaming when you want tokens, tool arguments, or structured events to arrive before the full response is finished. NagaAI supports streaming across the main generation APIs, but each surface uses a different event format.

Support Matrix

| API | Enable it with | Main text delta | Main tool delta | Terminal signal |
| --- | --- | --- | --- | --- |
| Responses | stream: true | response.output_text.delta | response.function_call_arguments.delta | response.completed and [DONE] |
| Chat Completions | stream: true | choices[0].delta.content | choices[0].delta.tool_calls | final chunk with finish_reason |
| Messages | stream: true | content_block_delta with text_delta | content_block_delta with input_json_delta | message_delta and message_stop |

When To Use It

  • lower perceived latency in chat and assistant UIs
  • show long answers as they are generated
  • react to tool-call arguments before the final answer finishes
If your app only needs one final answer and does not care about progressive output, a plain non-streaming request is often simpler.

A minimal Responses streaming loop looks like this:
from openai import OpenAI

client = OpenAI(
    base_url="https://api.naga.ac/v1",
    api_key="YOUR_API_KEY",
)

stream = client.responses.create(
    model="gpt-4.1-mini",
    input="Stream a short explanation of backpressure.",
    stream=True,
)

for event in stream:
    if event.type == "response.output_text.delta":
        print(event.delta, end="")
Typical event flow:
event: response.created
event: response.in_progress
event: response.output_text.delta
event: response.completed
data: [DONE]
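On the wire, these frames are server-sent events. As a rough sketch, splitting raw SSE text into (event, data) pairs might look like the function below; it assumes the simple `event:` / `data:` line format shown above (official SDKs handle this parsing for you).

```python
def parse_sse(raw: str):
    """Split raw SSE text into (event_name, data) pairs.

    Assumes frames are separated by blank lines and that each frame
    carries optional "event:" and one or more "data:" lines.
    """
    frames = []
    event, data = None, []
    for line in raw.splitlines():
        if line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data.append(line[len("data:"):].strip())
        elif line == "":
            # A blank line terminates the current frame.
            if event or data:
                frames.append((event, "\n".join(data)))
            event, data = None, []
    if event or data:  # flush a trailing frame with no final blank line
        frames.append((event, "\n".join(data)))
    return frames

raw = (
    "event: response.created\n"
    "data: {}\n"
    "\n"
    "event: response.output_text.delta\n"
    'data: {"delta": "hi"}\n'
    "\n"
    "data: [DONE]\n"
    "\n"
)
frames = parse_sse(raw)
```

The `[DONE]` sentinel arrives as a data-only frame, which is why the parser keeps frames that have data but no event name.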

Protocol Differences

  • Responses streams named lifecycle events and typed deltas.
  • Chat Completions streams OpenAI-compatible chat.completion.chunk payloads.
  • Messages streams Anthropic-style event names such as content_block_delta and message_stop.
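For the Chat Completions shape, a client typically accumulates `choices[0].delta.content` until a chunk carries a `finish_reason`. A minimal sketch, using illustrative chunk dicts rather than a live stream (the exact chunk contents below are assumptions):

```python
def accumulate_chat_chunks(chunks):
    """Collect content deltas from chat.completion.chunk-style payloads."""
    text = []
    finish = None
    for chunk in chunks:
        if not chunk.get("choices"):
            continue  # e.g. a trailing chunk that carries only usage data
        choice = chunk["choices"][0]
        delta = choice.get("delta", {})
        if delta.get("content"):
            text.append(delta["content"])
        if choice.get("finish_reason"):
            finish = choice["finish_reason"]
    return "".join(text), finish

chunks = [
    {"choices": [{"delta": {"role": "assistant"}, "finish_reason": None}]},
    {"choices": [{"delta": {"content": "Hel"}, "finish_reason": None}]},
    {"choices": [{"delta": {"content": "lo"}, "finish_reason": None}]},
    {"choices": [{"delta": {}, "finish_reason": "stop"}]},
]
result = accumulate_chat_chunks(chunks)  # ("Hello", "stop")
```

Guarding against empty `choices` matters because some providers append a final chunk without choices when usage reporting is enabled.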

Protocol Examples

Responses streams semantic events, so you usually branch on event.type.
event: response.created
event: response.in_progress
event: response.output_text.delta
event: response.function_call_arguments.delta
event: response.completed
data: [DONE]
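A Messages stream can be consumed with a similar loop, branching on the Anthropic-style event names from the matrix above. The event dicts here are illustrative sketches of the shape, not captured output:

```python
def collect_message_text(events):
    """Accumulate text_delta payloads and pick up the stop reason."""
    text = []
    stop_reason = None
    for ev in events:
        if ev["type"] == "content_block_delta" and ev["delta"]["type"] == "text_delta":
            text.append(ev["delta"]["text"])
        elif ev["type"] == "message_delta":
            stop_reason = ev["delta"].get("stop_reason")
    return "".join(text), stop_reason

events = [
    {"type": "message_start"},
    {"type": "content_block_start", "index": 0},
    {"type": "content_block_delta", "index": 0,
     "delta": {"type": "text_delta", "text": "Hi"}},
    {"type": "content_block_delta", "index": 0,
     "delta": {"type": "text_delta", "text": " there"}},
    {"type": "content_block_stop", "index": 0},
    {"type": "message_delta", "delta": {"stop_reason": "end_turn"}},
    {"type": "message_stop"},
]
result = collect_message_text(events)  # ("Hi there", "end_turn")
```

Tool-call input would arrive the same way, as `content_block_delta` events whose delta type is `input_json_delta` instead of `text_delta`.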

Client Checklist

  • parse the stream structurally instead of treating it as plain text
  • stop normal stream processing if an error frame appears
  • expect tool and reasoning deltas to be interleaved with text on some models
  • if you need usage data, check the API-specific streaming docs for how it is delivered
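The checklist above can be sketched as a small dispatcher over Responses-style events. The event names come from the support matrix; the error-frame shape (`type: "error"` with a `message` field) is an assumption for illustration:

```python
def run_stream(events):
    """Dispatch structurally on event type instead of scanning raw text.

    Text and tool-argument deltas are buffered separately because some
    models interleave them; an error frame aborts normal processing.
    """
    text, tool_args = [], []
    for ev in events:
        etype = ev.get("type")
        if etype == "error":
            # Stop normal stream processing when an error frame appears.
            raise RuntimeError(ev.get("message", "stream error"))
        if etype == "response.output_text.delta":
            text.append(ev["delta"])
        elif etype == "response.function_call_arguments.delta":
            tool_args.append(ev["delta"])
        elif etype == "response.completed":
            break
    return "".join(text), "".join(tool_args)

events = [
    {"type": "response.created"},
    {"type": "response.output_text.delta", "delta": "A"},
    {"type": "response.function_call_arguments.delta", "delta": '{"q":'},
    {"type": "response.output_text.delta", "delta": "B"},
    {"type": "response.function_call_arguments.delta", "delta": " 1}"},
    {"type": "response.completed"},
]
result = run_stream(events)  # ("AB", '{"q": 1}')
```

Because tool arguments stream as JSON fragments, the buffer should only be parsed once the stream signals completion.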
For streaming failures and normal error payloads, read Error Handling.

API-Specific Guides

Reference