Streaming
SSE wire format for streamed chat completions — frames, deltas, usage, and the [DONE] sentinel.
When a chat completion is created with stream: true, the response body is a stream of Server-Sent Events instead of a single chat.completion JSON object. This page documents the exact wire format.
Frame format
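Each event is a single line of the form data: {json} followed by a blank line, so the byte sequence on the wire is data: {json}\n\n. Every JSON payload is a chat.completion.chunk object, with one exception: a mid-stream failure is delivered as an error envelope, described at the end of this page. The stream terminates with the literal sentinel data: [DONE], which is not JSON, so check for it before parsing the payload.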
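As a rough illustration of the framing described above, the following Python sketch splits a response body into frames and parses each payload. It assumes the whole body is already buffered in a string, which a real client would not do (it would read incrementally), and the helper name iter_payloads is hypothetical, not part of any SDK.

```python
import json
from typing import Iterator


def iter_payloads(body: str) -> Iterator[dict]:
    """Yield each parsed JSON payload from an SSE body, stopping at [DONE].

    Assumption: `body` is the complete response text, already decoded.
    """
    # Frames are separated by a blank line, i.e. "\n\n" on the wire.
    for frame in body.split("\n\n"):
        frame = frame.strip()
        if not frame.startswith("data:"):
            continue  # skips the empty segment after the last separator
        data = frame[len("data:"):].strip()
        if data == "[DONE]":
            return  # the sentinel is not JSON, so stop before json.loads
        yield json.loads(data)
```

The sentinel check comes before json.loads on purpose: passing [DONE] to a JSON parser would raise.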
The first chunk's delta carries role: "assistant". Subsequent chunks carry incremental content. The last chunk with a non-empty choices array carries a non-null finish_reason (for example stop, or length when the output limit is reached) and an empty delta.
data: {"id":"chatcmpl-1","object":"chat.completion.chunk","created":1700000000,"model":"google/gemini-3-flash","choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}
data: {"id":"chatcmpl-1","object":"chat.completion.chunk","created":1700000000,"model":"google/gemini-3-flash","choices":[{"index":0,"delta":{"content":"Packets "},"finish_reason":null}]}
data: {"id":"chatcmpl-1","object":"chat.completion.chunk","created":1700000000,"model":"google/gemini-3-flash","choices":[{"index":0,"delta":{"content":"in flight"},"finish_reason":null}]}
data: {"id":"chatcmpl-1","object":"chat.completion.chunk","created":1700000000,"model":"google/gemini-3-flash","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}
data: [DONE]
Chunk fields
Prop
Type
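The delta sequence in the example stream can be folded back into a complete message. The sketch below is one way to do that, assuming the chunks have already been parsed into dictionaries and that only the choice at index 0 matters; the function name assemble is hypothetical.

```python
from typing import Iterable, Optional, Tuple


def assemble(chunks: Iterable[dict]) -> Tuple[Optional[str], str, Optional[str]]:
    """Fold chat.completion.chunk objects into (role, content, finish_reason).

    Assumption: only choice index 0 is tracked.
    """
    role: Optional[str] = None
    parts = []
    finish_reason: Optional[str] = None
    for chunk in chunks:
        # A chunk with an empty choices array contributes nothing here.
        for choice in chunk.get("choices", []):
            if choice.get("index") != 0:
                continue
            delta = choice.get("delta") or {}
            if "role" in delta:
                role = delta["role"]  # carried by the first chunk
            if delta.get("content"):
                parts.append(delta["content"])  # incremental text
            if choice.get("finish_reason") is not None:
                finish_reason = choice["finish_reason"]  # set on the closing chunk
    return role, "".join(parts), finish_reason
```

Applied to the four chunks of the example stream, this returns ("assistant", "Packets in flight", "stop").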
Usage chunk
Every streamed response ends with one final chunk that has an empty choices: [] array and a populated usage object, immediately before data: [DONE]. Unlike vanilla OpenAI, OpenToken always returns streaming usage — there is no stream_options.include_usage opt-in (any stream_options field a client sends is ignored).
{
"id": "chatcmpl-1",
"object": "chat.completion.chunk",
"created": 1700000000,
"model": "google/gemini-3-flash",
"choices": [],
"usage": {
"prompt_tokens": 12,
"completion_tokens": 18,
"total_tokens": 30
}
}
The same token details available on a buffered call appear here: prompt_tokens_details.cached_tokens for cache hits and completion_tokens_details.reasoning_tokens for thinking tokens.
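A client can recognize the usage chunk by its empty choices array and populated usage object. The following sketch shows one way to pick it out and flatten the counts. The helper names are hypothetical, and treating a missing details object as zero is an assumption, since this page does not say whether those objects are always present.

```python
from typing import Optional


def extract_usage(chunk: dict) -> Optional[dict]:
    """Return the usage object if this is the final usage chunk, else None."""
    if chunk.get("choices") == [] and isinstance(chunk.get("usage"), dict):
        return chunk["usage"]
    return None


def token_breakdown(usage: dict) -> dict:
    """Flatten usage counts; missing detail objects count as zero (assumption)."""
    prompt_details = usage.get("prompt_tokens_details") or {}
    completion_details = usage.get("completion_tokens_details") or {}
    return {
        "prompt": usage["prompt_tokens"],
        "completion": usage["completion_tokens"],
        "total": usage["total_tokens"],
        "cached": prompt_details.get("cached_tokens", 0),
        "reasoning": completion_details.get("reasoning_tokens", 0),
    }
```

Checking every chunk with extract_usage inside the stream loop avoids relying on the usage chunk's position.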
Because the HTTP status is sent before the body, an error can arrive after a 200 OK. A mid-stream failure is delivered as one more data: frame whose JSON payload is the error envelope { "error": { "message", "type", "code" } } (instead of a chat.completion.chunk), followed by the usual data: [DONE]. It is not an SSE event: error frame, so detect it by checking each parsed data: payload for an error key. Handle errors inside the stream loop, not only around the initial request.
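To make the in-loop check concrete, here is a minimal sketch that inspects every parsed payload for an error key while accumulating text. The StreamError class and collect_text function are hypothetical names, and keeping the partial text on the exception is a design choice of this sketch, not something the API prescribes.

```python
from typing import Iterable, Optional


class StreamError(Exception):
    """Raised when a data: payload is an error envelope instead of a chunk."""

    def __init__(self, message: Optional[str], type_: Optional[str],
                 code: Optional[str], partial_text: str) -> None:
        super().__init__(message)
        self.type = type_
        self.code = code
        self.partial_text = partial_text  # text received before the failure


def collect_text(payloads: Iterable[dict]) -> str:
    """Concatenate streamed content, raising StreamError on a mid-stream error."""
    parts = []
    for payload in payloads:
        # The error arrives as an ordinary data: frame, so check each payload.
        error = payload.get("error")
        if error is not None:
            raise StreamError(error.get("message"), error.get("type"),
                              error.get("code"), "".join(parts))
        for choice in payload.get("choices", []):
            content = (choice.get("delta") or {}).get("content")
            if content:
                parts.append(content)
    return "".join(parts)
```

Because the check runs on every payload, a failure that arrives after several content chunks still surfaces as an exception, and the caller can decide whether to keep or discard the partial text.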