Streaming

SSE wire format for streamed chat completions — frames, deltas, usage, and the [DONE] sentinel.

When a chat completion is created with stream: true, the response body is a stream of Server-Sent Events instead of a single chat.completion JSON object. This page documents the exact wire format.

Frame format

Each event is a single line of the form data: {json} followed by a blank line, so the byte sequence on the wire is data: {json}\n\n. Every JSON payload is a chat.completion.chunk object. The stream terminates with the literal sentinel data: [DONE].
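As a rough illustration of the framing described above, the sketch below turns a decoded response body into parsed payloads. The function name iter_sse_payloads is hypothetical, and the sketch assumes the body has already been decoded into text lines; it is not an official client.

```python
import json
from typing import Iterable, Iterator


def iter_sse_payloads(lines: Iterable[str]) -> Iterator[dict]:
    """Yield the parsed JSON payload of each data: frame until [DONE]."""
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.startswith("data:"):
            # Blank separator lines (and anything else) are skipped.
            continue
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return  # sentinel: the stream is finished
        yield json.loads(payload)


# Two frames as they would appear on the wire (payload shortened for brevity).
wire = (
    'data: {"choices":[{"index":0,"delta":{"content":"Hi"},"finish_reason":null}]}\n\n'
    "data: [DONE]\n\n"
)
chunks = list(iter_sse_payloads(wire.splitlines()))
```

Because the sentinel is not JSON, the check for [DONE] has to happen before json.loads is called.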

The first chunk's delta carries role: "assistant". Subsequent chunks carry incremental text in delta.content. The last chunk with a non-empty choices array carries a non-null finish_reason (for example stop, or length when the output limit is reached) and an empty delta.

data: {"id":"chatcmpl-1","object":"chat.completion.chunk","created":1700000000,"model":"google/gemini-3-flash","choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}

data: {"id":"chatcmpl-1","object":"chat.completion.chunk","created":1700000000,"model":"google/gemini-3-flash","choices":[{"index":0,"delta":{"content":"Packets "},"finish_reason":null}]}

data: {"id":"chatcmpl-1","object":"chat.completion.chunk","created":1700000000,"model":"google/gemini-3-flash","choices":[{"index":0,"delta":{"content":"in flight"},"finish_reason":null}]}

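data: {"id":"chatcmpl-1","object":"chat.completion.chunk","created":1700000000,"model":"google/gemini-3-flash","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}

data: {"id":"chatcmpl-1","object":"chat.completion.chunk","created":1700000000,"model":"google/gemini-3-flash","choices":[],"usage":{"prompt_tokens":12,"completion_tokens":18,"total_tokens":30}}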

data: [DONE]
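One way a client might fold these deltas back into a single message is sketched below. The helper assemble_message is a hypothetical name, the payloads are shortened copies of the example frames (only the choices and usage fields are kept), and the sketch follows only the choice at index 0.

```python
import json
from typing import Iterable


def assemble_message(chunks: Iterable[dict]) -> dict:
    """Concatenate streamed deltas for choice 0 into one message."""
    role = None
    parts = []
    finish_reason = None
    for chunk in chunks:
        # A chunk with an empty choices array contributes nothing here.
        for choice in chunk.get("choices", []):
            if choice.get("index", 0) != 0:
                continue  # this sketch only follows the first choice
            delta = choice.get("delta") or {}
            role = delta.get("role", role)
            if delta.get("content"):
                parts.append(delta["content"])
            if choice.get("finish_reason") is not None:
                finish_reason = choice["finish_reason"]
    return {
        "role": role,
        "content": "".join(parts),
        "finish_reason": finish_reason,
    }


payloads = [
    '{"choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}',
    '{"choices":[{"index":0,"delta":{"content":"Packets "},"finish_reason":null}]}',
    '{"choices":[{"index":0,"delta":{"content":"in flight"},"finish_reason":null}]}',
    '{"choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}',
    '{"choices":[],"usage":{"prompt_tokens":12,"completion_tokens":18,"total_tokens":30}}',
]
message = assemble_message(json.loads(p) for p in payloads)
print(message)
```

Collecting the pieces in a list and joining once at the end avoids repeated string concatenation on long responses.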

Chunk fields

Usage chunk

Every streamed response ends with one final chunk that has an empty choices: [] array and a populated usage object, immediately before data: [DONE]. Unlike the OpenAI API, where streaming usage requires opting in through stream_options.include_usage, OpenToken always returns it; any stream_options field a client sends is ignored.

{
  "id": "chatcmpl-1",
  "object": "chat.completion.chunk",
  "created": 1700000000,
  "model": "google/gemini-3-flash",
  "choices": [],
  "usage": {
    "prompt_tokens": 12,
    "completion_tokens": 18,
    "total_tokens": 30
  }
}

The same token details available on a buffered call appear here: prompt_tokens_details.cached_tokens for cache hits and completion_tokens_details.reasoning_tokens for thinking tokens.
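A client could pick the usage chunk out of the stream along the following lines. This is a sketch only: extract_usage is a hypothetical helper, and it assumes the two detail objects may be absent, in which case the corresponding values are reported as None.

```python
from typing import Optional


def extract_usage(chunk: dict) -> Optional[dict]:
    """Return flattened usage if this is the usage chunk, else None."""
    usage = chunk.get("usage")
    # The usage chunk has an empty choices array and a populated usage object.
    if chunk.get("choices") or not usage:
        return None
    # Assumption: the detail objects are optional, so fall back to {}.
    prompt_details = usage.get("prompt_tokens_details") or {}
    completion_details = usage.get("completion_tokens_details") or {}
    return {
        "prompt_tokens": usage["prompt_tokens"],
        "completion_tokens": usage["completion_tokens"],
        "total_tokens": usage["total_tokens"],
        "cached_tokens": prompt_details.get("cached_tokens"),
        "reasoning_tokens": completion_details.get("reasoning_tokens"),
    }


usage_chunk = {
    "id": "chatcmpl-1",
    "object": "chat.completion.chunk",
    "created": 1700000000,
    "model": "google/gemini-3-flash",
    "choices": [],
    "usage": {"prompt_tokens": 12, "completion_tokens": 18, "total_tokens": 30},
}
content_chunk = {
    "choices": [{"index": 0, "delta": {"content": "Packets "}, "finish_reason": None}]
}

summary = extract_usage(usage_chunk)
ignored = extract_usage(content_chunk)
```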

Because the HTTP status is sent before the body, an error can arrive after a 200 OK. A mid-stream failure is delivered as one more data: frame whose JSON payload is the error envelope { "error": { "message", "type", "code" } } (instead of a chat.completion.chunk), followed by the usual data: [DONE]. It is not an SSE event: error frame, so detect it by checking each parsed data: payload for an error key. Handle errors inside the stream loop, not only around the initial request.
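The check described above might look like the following inside a read loop. StreamError and iter_chunks are hypothetical names, and the message, type, and code values in the sample error frame are invented for illustration; only the shape of the envelope comes from this page.

```python
import json
from typing import Iterable, Iterator


class StreamError(Exception):
    """Hypothetical exception raised when an error envelope arrives mid-stream."""

    def __init__(self, error: dict):
        super().__init__(error.get("message", "stream error"))
        self.type = error.get("type")
        self.code = error.get("code")


def iter_chunks(lines: Iterable[str]) -> Iterator[dict]:
    """Yield chunks, raising StreamError if a data: payload has an error key."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return
        obj = json.loads(payload)
        if isinstance(obj, dict) and "error" in obj:
            raise StreamError(obj["error"])
        yield obj


# One content frame, then an error frame with made-up field values.
sample = [
    'data: {"choices":[{"index":0,"delta":{"content":"Packets "},"finish_reason":null}]}',
    "",
    'data: {"error":{"message":"upstream failed","type":"server_error","code":"upstream_error"}}',
    "",
    "data: [DONE]",
    "",
]

received = []
caught = None
try:
    for chunk in iter_chunks(sample):
        received.append(chunk)
except StreamError as exc:
    caught = exc  # partial output in `received` is still available
```

Raising from inside the generator keeps the error handling in the same try block that consumes the stream, so content received before the failure is not lost.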

See also

Streaming guide

Create chat completion

Errors
