Streaming
Stream chat completions over Server-Sent Events and read deltas as they arrive.
Set stream: true on a chat completion to receive the response incrementally over Server-Sent Events (SSE) instead of a single JSON body. Each event carries a small delta, so you can render tokens as the model produces them. The stream ends with a data: [DONE] sentinel.
Stream with the OpenAI SDK
The OpenAI SDKs handle SSE parsing for you: iterate the returned stream and read choices[0].delta.content. Guard against chunks whose choices array is empty, such as the final usage chunk.
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://api.opentoken.kr/v1",
    # Read the key from the environment; Python does not expand "${...}" inside a string.
    api_key=os.environ["OPENTOKEN_API_KEY"],
)

stream = client.chat.completions.create(
    model="google/gemini-3-flash",
    messages=[{"role": "user", "content": "Write a haiku about latency."}],
    stream=True,
)

for chunk in stream:
    # The final usage chunk has an empty choices list, so skip it here.
    if not chunk.choices:
        continue
    delta = chunk.choices[0].delta
    if delta.content:
        print(delta.content, end="", flush=True)

import OpenAI from "openai";
const client = new OpenAI({
  baseURL: "https://api.opentoken.kr/v1",
  apiKey: process.env.OPENTOKEN_API_KEY,
});

const stream = await client.chat.completions.create({
  model: "google/gemini-3-flash",
  messages: [{ role: "user", content: "Write a haiku about latency." }],
  stream: true,
});

for await (const chunk of stream) {
  const delta = chunk.choices[0]?.delta?.content;
  if (delta) process.stdout.write(delta);
}

curl https://api.opentoken.kr/v1/chat/completions \
  -H "Authorization: Bearer $OPENTOKEN_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "google/gemini-3-flash",
    "messages": [{"role": "user", "content": "Write a haiku about latency."}],
    "stream": true
  }'

Raw SSE frames
If you parse the stream yourself, each frame is a data: line followed by a blank line. The first chunk's delta carries the role, and subsequent chunks carry content. The stream terminates with data: [DONE].
data: {"id":"chatcmpl-1","object":"chat.completion.chunk","created":1717000000,"model":"google/gemini-3-flash","choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}

data: {"id":"chatcmpl-1","object":"chat.completion.chunk","created":1717000000,"model":"google/gemini-3-flash","choices":[{"index":0,"delta":{"content":"Packets "},"finish_reason":null}]}

data: {"id":"chatcmpl-1","object":"chat.completion.chunk","created":1717000000,"model":"google/gemini-3-flash","choices":[{"index":0,"delta":{"content":"in flight"},"finish_reason":null}]}

data: {"id":"chatcmpl-1","object":"chat.completion.chunk","created":1717000000,"model":"google/gemini-3-flash","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}

data: [DONE]

Usage in the stream
A streamed response always ends with a final usage chunk, with no opt-in required. Right before the data: [DONE] sentinel, the gateway emits one chunk with an empty choices: [] array and a populated usage object. The gateway does not act on a stream_options field: if you send stream_options: { include_usage: true }, it is accepted but ignored, because the usage chunk is sent unconditionally.
{
  "id": "chatcmpl-1",
  "object": "chat.completion.chunk",
  "created": 1717000000,
  "model": "google/gemini-3-flash",
  "choices": [],
  "usage": {
    "prompt_tokens": 12,
    "completion_tokens": 18,
    "total_tokens": 30
  }
}

The same token details available on a non-streamed call appear here too: prompt_tokens_details.cached_tokens for cache hits and completion_tokens_details.reasoning_tokens for thinking tokens. See /docs/api-reference/chat-completions for the full usage shape.
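If you parse the frames yourself, the usage chunk needs slightly different handling from content chunks because its choices array is empty. The following is a minimal sketch, not the gateway's reference client: collect_stream is a hypothetical helper name, and it assumes you already have the decoded lines of the response body as strings. The sample frames are trimmed versions of the ones shown on this page.

```python
import json
from typing import Iterable, Optional, Tuple


def collect_stream(data_lines: Iterable[str]) -> Tuple[str, Optional[dict]]:
    """Join content deltas and return them with the usage object, if one arrived."""
    parts = []
    usage = None
    for line in data_lines:
        if not line.startswith("data:"):
            continue  # skip blank separator lines and anything that is not a data line
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # sentinel: nothing follows
        chunk = json.loads(payload)
        if chunk.get("usage"):
            usage = chunk["usage"]  # final chunk: empty choices, populated usage
        for choice in chunk.get("choices", []):
            content = choice.get("delta", {}).get("content")
            if content:
                parts.append(content)
    return "".join(parts), usage


# Sample frames, trimmed to the fields this sketch reads.
frames = [
    'data: {"choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}',
    "",
    'data: {"choices":[{"index":0,"delta":{"content":"Packets "},"finish_reason":null}]}',
    "",
    'data: {"choices":[{"index":0,"delta":{"content":"in flight"},"finish_reason":null}]}',
    "",
    'data: {"choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}',
    "",
    'data: {"choices":[],"usage":{"prompt_tokens":12,"completion_tokens":18,"total_tokens":30}}',
    "",
    "data: [DONE]",
]

text, usage = collect_stream(frames)
print(text)
print(usage)
```

Iterating over chunk.get("choices", []) rather than indexing choices[0] is what keeps the usage chunk from raising an error.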
Because the HTTP status is sent before the body, an error can arrive after a 200 OK. Over SSE, a mid-stream failure is delivered as a regular data: frame carrying the standard envelope { "error": { "message", "type", "code" } }, not as an event: error SSE event, and is followed by the usual data: [DONE] sentinel. A generic interrupt uses type upstream_error with code stream_error. Always handle errors inside your stream loop, not only around the initial request.
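One way to handle mid-stream errors in a hand-rolled parser is to check each decoded frame for an error key before treating it as a chunk. The sketch below assumes the envelope shape described above; StreamError and iter_chunks are hypothetical names introduced for illustration, and the error message in the sample frame is made up.

```python
import json
from typing import Iterable, Iterator, Optional


class StreamError(Exception):
    """Hypothetical exception for an error envelope received mid-stream."""

    def __init__(self, message: str, type_: Optional[str], code: Optional[str]):
        super().__init__(message)
        self.type = type_
        self.code = code


def iter_chunks(data_lines: Iterable[str]) -> Iterator[dict]:
    """Yield decoded chunks; raise StreamError if a frame carries an error envelope."""
    for line in data_lines:
        if not line.startswith("data:"):
            continue  # blank separators between frames
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return
        event = json.loads(payload)
        error = event.get("error")
        if error is not None:
            raise StreamError(error.get("message", ""), error.get("type"), error.get("code"))
        yield event


# Sample frames: one content chunk, then an error envelope (message text is invented).
frames = [
    'data: {"choices":[{"index":0,"delta":{"content":"Packets "},"finish_reason":null}]}',
    "",
    'data: {"error":{"message":"hypothetical upstream failure","type":"upstream_error","code":"stream_error"}}',
    "",
    "data: [DONE]",
]

received = []
failure = None
try:
    for chunk in iter_chunks(frames):
        for choice in chunk.get("choices", []):
            content = choice.get("delta", {}).get("content")
            if content:
                received.append(content)
except StreamError as exc:
    failure = exc  # the partial text in `received` is still available to the caller
```

Raising from inside the generator keeps the error path in the same loop that consumes content, so partial output collected before the failure is not lost.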