OpenToken Docs

Streaming

SSE wire format for streamed chat completions — frames, deltas, usage, and the [DONE] sentinel.

When a chat completion is created with stream: true, the response body is a stream of Server-Sent Events instead of a single chat.completion JSON object. This page documents the exact wire format.

Frame format

Each event is a single line of the form data: {json} followed by a blank line, so the byte sequence on the wire is data: {json}\n\n. Every JSON payload is a chat.completion.chunk object, with one exception: a mid-stream error is sent as an error envelope instead. The stream terminates with the literal sentinel data: [DONE].
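The framing rule above can be sketched as a small line-based reader. This is a minimal illustration, not an official client: the function name iter_payloads is hypothetical, and the code relies on the statement above that each event is a single data: line (general SSE permits multi-line data fields, which this sketch does not handle).

```python
import json
from typing import Iterable, Iterator

DONE_SENTINEL = "[DONE]"


def iter_payloads(lines: Iterable[str]) -> Iterator[dict]:
    """Yield the parsed JSON payload of each `data:` frame until [DONE].

    `lines` is any iterable of text lines from the response body
    (hypothetical input; how you obtain it depends on your HTTP client).
    """
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.startswith("data:"):
            # Blank separator lines (and anything unexpected) are skipped.
            continue
        data = line[len("data:"):].strip()
        if data == DONE_SENTINEL:
            return  # the sentinel ends the stream; it is not JSON
        yield json.loads(data)
```

Note that [DONE] is compared as a string before any JSON parsing, since the sentinel is not valid JSON.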

The first chunk's delta carries role: "assistant". Subsequent chunks carry incremental content. The last chunk that contains a choice carries a non-null finish_reason (for example stop, or length when the output limit is reached) and an empty delta.

data: {"id":"chatcmpl-1","object":"chat.completion.chunk","created":1700000000,"model":"google/gemini-3-flash","choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}

data: {"id":"chatcmpl-1","object":"chat.completion.chunk","created":1700000000,"model":"google/gemini-3-flash","choices":[{"index":0,"delta":{"content":"Packets "},"finish_reason":null}]}

data: {"id":"chatcmpl-1","object":"chat.completion.chunk","created":1700000000,"model":"google/gemini-3-flash","choices":[{"index":0,"delta":{"content":"in flight"},"finish_reason":null}]}

data: {"id":"chatcmpl-1","object":"chat.completion.chunk","created":1700000000,"model":"google/gemini-3-flash","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}

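data: {"id":"chatcmpl-1","object":"chat.completion.chunk","created":1700000000,"model":"google/gemini-3-flash","choices":[],"usage":{"prompt_tokens":12,"completion_tokens":18,"total_tokens":30}}

data: [DONE]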
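The delta sequence in the example above can be folded into a single message roughly as follows. This is a sketch under stated assumptions: the helper name assemble is hypothetical, the response is assumed to contain one choice (index 0), and the sample chunks are trimmed to the fields the helper reads.

```python
from typing import Iterable


def assemble(chunks: Iterable[dict]) -> dict:
    """Fold streamed chunks into one message (assumes a single choice)."""
    role = None
    parts = []
    finish_reason = None
    for chunk in chunks:
        # A chunk with an empty `choices` array contributes nothing here.
        for choice in chunk.get("choices", []):
            delta = choice.get("delta", {})
            role = delta.get("role", role)       # set by the first chunk
            if delta.get("content"):
                parts.append(delta["content"])   # incremental text
            if choice.get("finish_reason") is not None:
                finish_reason = choice["finish_reason"]
    return {
        "role": role,
        "content": "".join(parts),
        "finish_reason": finish_reason,
    }


# Chunks from the example, trimmed to `choices` (id, object, created and
# model are omitted for brevity).
EXAMPLE_CHUNKS = [
    {"choices": [{"index": 0, "delta": {"role": "assistant"}, "finish_reason": None}]},
    {"choices": [{"index": 0, "delta": {"content": "Packets "}, "finish_reason": None}]},
    {"choices": [{"index": 0, "delta": {"content": "in flight"}, "finish_reason": None}]},
    {"choices": [{"index": 0, "delta": {}, "finish_reason": "stop"}]},
]

message = assemble(EXAMPLE_CHUNKS)
# message["content"] == "Packets in flight"
```

Collecting the pieces in a list and joining once at the end avoids repeated string concatenation on long outputs.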

Chunk fields

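Fields that appear in the examples on this page:

Prop                      Type
id                        string
object                    string ("chat.completion.chunk")
created                   integer
model                     string
choices                   array (empty in the usage chunk)
choices[].index           integer
choices[].delta           object (role, content, or empty)
choices[].finish_reason   string or null
usage                     object (present in the usage chunk)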
Usage chunk

Every streamed response ends with one final chunk that has an empty choices: [] array and a populated usage object, immediately before data: [DONE]. Unlike vanilla OpenAI, OpenToken always returns streaming usage — there is no stream_options.include_usage opt-in (any stream_options field a client sends is ignored).

{
  "id": "chatcmpl-1",
  "object": "chat.completion.chunk",
  "created": 1700000000,
  "model": "google/gemini-3-flash",
  "choices": [],
  "usage": {
    "prompt_tokens": 12,
    "completion_tokens": 18,
    "total_tokens": 30
  }
}

The same token details available on a buffered call appear here: prompt_tokens_details.cached_tokens for cache hits and completion_tokens_details.reasoning_tokens for thinking tokens.
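One way to pick the usage chunk out of the stream and read its counters is sketched below. The helper name summarize_usage is hypothetical, and defaulting the detail counters to 0 when prompt_tokens_details or completion_tokens_details is absent (as in the example above) is an assumption of this sketch, not documented behavior.

```python
from typing import Optional


def summarize_usage(chunk: dict) -> Optional[dict]:
    """Return token counts if `chunk` is the usage chunk, else None."""
    usage = chunk.get("usage")
    # The usage chunk has an empty `choices` array and a populated `usage`.
    if chunk.get("choices") or not usage:
        return None
    prompt_details = usage.get("prompt_tokens_details") or {}
    completion_details = usage.get("completion_tokens_details") or {}
    return {
        "prompt_tokens": usage["prompt_tokens"],
        "completion_tokens": usage["completion_tokens"],
        "total_tokens": usage["total_tokens"],
        # Assumption: treat missing detail objects as zero.
        "cached_tokens": prompt_details.get("cached_tokens", 0),
        "reasoning_tokens": completion_details.get("reasoning_tokens", 0),
    }


usage_chunk = {
    "id": "chatcmpl-1",
    "object": "chat.completion.chunk",
    "created": 1700000000,
    "model": "google/gemini-3-flash",
    "choices": [],
    "usage": {"prompt_tokens": 12, "completion_tokens": 18, "total_tokens": 30},
}
summary = summarize_usage(usage_chunk)
```

Because the usage chunk always arrives, a client can run this check on every chunk and keep the last non-None result.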

Mid-stream errors

Because the HTTP status is sent before the body, an error can arrive after a 200 OK. A mid-stream failure is delivered as one more data: frame whose JSON payload is the error envelope { "error": { "message", "type", "code" } } (instead of a chat.completion.chunk), followed by the usual data: [DONE]. It is not an SSE event: error frame, so detect it by checking each parsed data: payload for an error key. Handle errors inside the stream loop, not only around the initial request.
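Checking each payload for an error key inside the loop might look like the sketch below. The names StreamError and checked_payloads are hypothetical, and raising an exception is only one possible policy; a client could equally return the partial output alongside the error.

```python
import json
from typing import Iterable, Iterator


class StreamError(Exception):
    """Raised when an error envelope arrives in the middle of a stream."""

    def __init__(self, error: dict):
        super().__init__(error.get("message", "stream error"))
        self.error = error  # the envelope's inner object: message, type, code


def checked_payloads(lines: Iterable[str]) -> Iterator[dict]:
    """Yield chunk payloads; raise StreamError on an error envelope."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            return
        payload = json.loads(data)
        if "error" in payload:
            # An ordinary data: frame carrying the error envelope.
            raise StreamError(payload["error"])
        yield payload
```

Since the exception is raised from inside the generator, the caller's try/except must wrap the loop that consumes the stream, which keeps any chunks received before the failure available to the caller.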

See also