Set stream: true to receive semantic Server-Sent Events instead of waiting for one final JSON response. Use this when you want to render text progressively, react to tool calls early, or inspect structured lifecycle events.

Enable Streaming

from openai import OpenAI

client = OpenAI(
    base_url="https://api.naga.ac/v1",
    api_key="YOUR_API_KEY",
)

stream = client.responses.create(
    model="gpt-4.1-mini",
    input="Stream a short explanation of backpressure.",
    stream=True,
)

for event in stream:
    if event.type == "response.output_text.delta":
        print(event.delta, end="")

Event Lifecycle

Per the protocol tests, a normal text stream looks like this:
event: response.created
data: {"type":"response.created", ...}

event: response.in_progress
data: {"type":"response.in_progress", ...}

event: response.output_text.delta
data: {"type":"response.output_text.delta","delta":"Hel"}

event: response.output_text.delta
data: {"type":"response.output_text.delta","delta":"lo"}

event: response.completed
data: {"type":"response.completed","response":{...}}

data: [DONE]
The final text is reconstructed by concatenating the delta values from response.output_text.delta events.
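The concatenation step can be sketched against simulated payloads shaped like the trace above (in a real client these dicts come from parsed SSE data lines):

```python
# Reconstruct the final text from simulated delta payloads (a sketch;
# a real client would parse these dicts out of SSE data lines).
events = [
    {"type": "response.output_text.delta", "delta": "Hel"},
    {"type": "response.output_text.delta", "delta": "lo"},
    {"type": "response.completed", "response": {}},
]

final_text = "".join(
    e["delta"] for e in events if e["type"] == "response.output_text.delta"
)
print(final_text)  # Hello
```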

What to listen for

  • response.output_text.delta for visible text
  • response.function_call_arguments.delta for tool arguments
  • response.completed for the final snapshot
  • [DONE] for the stream terminator
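A dispatch loop over these event types might look like the sketch below. The payloads are simulated and the `state` fields are illustrative client-side names, not part of the API:

```python
def handle_event(event, state):
    """Route one parsed SSE payload into accumulated client state (a sketch)."""
    etype = event["type"]
    if etype == "response.output_text.delta":
        state["text"] += event["delta"]           # visible text
    elif etype == "response.function_call_arguments.delta":
        state["args"] += event["delta"]           # tool arguments
    elif etype == "response.completed":
        state["final"] = event["response"]        # final snapshot

state = {"text": "", "args": "", "final": None}
for payload in [
    {"type": "response.created"},
    {"type": "response.output_text.delta", "delta": "Hi"},
    {"type": "response.completed", "response": {"id": "resp_abc"}},
]:
    handle_event(payload, state)
```

Lifecycle events such as response.created fall through unhandled here; a fuller client might log them or track stream status.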

Manual SSE Parsing Example

const response = await fetch('https://api.naga.ac/v1/responses', {
  method: 'POST',
  headers: {
    Authorization: 'Bearer YOUR_API_KEY',
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    model: 'gpt-4.1-mini',
    input: 'Say hello in two words.',
    stream: true,
  }),
});

const decoder = new TextDecoder();
let buffer = '';
let text = '';

// Node 18+: response.body is an async-iterable stream of Uint8Array chunks.
for await (const chunk of response.body) {
  buffer += decoder.decode(chunk, { stream: true });

  // SSE frames are separated by a blank line.
  let splitIndex;
  while ((splitIndex = buffer.indexOf('\n\n')) >= 0) {
    const frame = buffer.slice(0, splitIndex);
    buffer = buffer.slice(splitIndex + 2);

    const dataLine = frame.split('\n').find((line) => line.startsWith('data: '));
    if (!dataLine) continue;

    const data = dataLine.slice(6);
    if (data === '[DONE]') continue;

    const payload = JSON.parse(data);
    if (payload.type === 'response.output_text.delta') {
      text += payload.delta;
    }
  }
}

console.log(text);

Tool Call Streaming

When the model emits a tool call, the stream includes argument deltas such as response.function_call_arguments.delta followed by response.function_call_arguments.done.
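The argument deltas are JSON fragments that only parse once the sequence is complete, so a client buffers them until the done event. A sketch over simulated events (field names beyond type and delta are assumptions):

```python
import json

# Simulated tool-call argument events; fragments are not valid JSON on their own.
events = [
    {"type": "response.function_call_arguments.delta", "delta": '{"city": "Be'},
    {"type": "response.function_call_arguments.delta", "delta": 'rlin"}'},
    {"type": "response.function_call_arguments.done"},
]

arg_buffer = ""
parsed_args = None
for e in events:
    if e["type"] == "response.function_call_arguments.delta":
        arg_buffer += e["delta"]
    elif e["type"] == "response.function_call_arguments.done":
        parsed_args = json.loads(arg_buffer)  # complete JSON only at "done"

print(parsed_args)  # {'city': 'Berlin'}
```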

Error Events

If an error occurs after headers are already sent, the stream may emit an error event and then a failed terminal event before [DONE].
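A defensive consumer therefore handles both text deltas and a possible mid-stream error, draining the stream to [DONE] either way. A sketch over simulated frames (the error payload's message field and the response.failed event name are assumptions for illustration):

```python
def consume(frames):
    """Accumulate text, capturing a mid-stream error if one arrives (a sketch)."""
    text, error = "", None
    for frame in frames:
        if frame == "[DONE]":  # the terminator still arrives after a failure
            break
        if frame["type"] == "response.output_text.delta":
            text += frame["delta"]
        elif frame["type"] == "error":
            error = frame.get("message", "unknown stream error")
    return text, error

partial, err = consume([
    {"type": "response.created"},
    {"type": "response.output_text.delta", "delta": "Hel"},
    {"type": "error", "message": "upstream timeout"},
    {"type": "response.failed"},
    "[DONE]",
])
```

This keeps whatever partial text arrived before the failure, which is often worth showing to the user alongside the error.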

Common mistakes

  • treating the stream as plain text instead of parsing structured events
  • ignoring tool or reasoning deltas because the client only listens for text
  • forgetting that usage and the final response snapshot arrive at the end, not in each text delta
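The last point is visible in the terminal snapshot: usage totals live on the completed event's response object, never on individual deltas. A sketch with an assumed usage shape (check the actual snapshot you receive):

```python
# The usage field names below follow the common input/output token shape;
# treat them as assumptions, not a documented contract.
completed = {
    "type": "response.completed",
    "response": {"usage": {"input_tokens": 12, "output_tokens": 34}},
}

usage = completed["response"]["usage"]
total_tokens = usage["input_tokens"] + usage["output_tokens"]
print(total_tokens)  # 46
```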