OpenToken Docs

Streaming

Stream chat completions over Server-Sent Events and read deltas as they arrive.

Set stream: true on a chat completion to receive the response incrementally over Server-Sent Events (SSE) instead of a single JSON body. Each event carries a small delta, so you can render tokens as the model produces them. The stream ends with a data: [DONE] sentinel.

Stream with the OpenAI SDK

The OpenAI SDKs handle SSE parsing for you — iterate the returned stream and read choices[0].delta.content.

Python

import os

from openai import OpenAI

client = OpenAI(
    base_url="https://api.opentoken.kr/v1",
    api_key=os.environ["OPENTOKEN_API_KEY"],  # read the key from the environment
)

stream = client.chat.completions.create(
    model="google/gemini-3-flash",
    messages=[{"role": "user", "content": "Write a haiku about latency."}],
    stream=True,
)

for chunk in stream:
    # The final usage chunk has an empty choices list, so skip it here.
    if not chunk.choices:
        continue
    delta = chunk.choices[0].delta
    if delta.content:
        print(delta.content, end="", flush=True)

TypeScript

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.opentoken.kr/v1",
  apiKey: process.env.OPENTOKEN_API_KEY,
});

const stream = await client.chat.completions.create({
  model: "google/gemini-3-flash",
  messages: [{ role: "user", content: "Write a haiku about latency." }],
  stream: true,
});

for await (const chunk of stream) {
  const delta = chunk.choices[0]?.delta?.content;
  if (delta) process.stdout.write(delta);
}

curl

curl https://api.opentoken.kr/v1/chat/completions \
  -H "Authorization: Bearer $OPENTOKEN_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "google/gemini-3-flash",
    "messages": [{"role": "user", "content": "Write a haiku about latency."}],
    "stream": true
  }'

Raw SSE frames

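If you parse the stream yourself, each frame is a data: line followed by a blank line. The first chunk's delta carries the role, and subsequent chunks carry content. The last content-bearing chunk has an empty delta and a finish_reason. A final usage chunk (omitted from the example below; see Usage in the stream) follows it, and the stream terminates with data: [DONE].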

data: {"id":"chatcmpl-1","object":"chat.completion.chunk","created":1717000000,"model":"google/gemini-3-flash","choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}

data: {"id":"chatcmpl-1","object":"chat.completion.chunk","created":1717000000,"model":"google/gemini-3-flash","choices":[{"index":0,"delta":{"content":"Packets "},"finish_reason":null}]}

data: {"id":"chatcmpl-1","object":"chat.completion.chunk","created":1717000000,"model":"google/gemini-3-flash","choices":[{"index":0,"delta":{"content":"in flight"},"finish_reason":null}]}

data: {"id":"chatcmpl-1","object":"chat.completion.chunk","created":1717000000,"model":"google/gemini-3-flash","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}

data: [DONE]
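
As a rough illustration of the framing above, the sketch below parses SSE lines by hand. It assumes the lines have already been read from the response body and decoded to text. iter_sse_chunks is a hypothetical helper name, not part of any SDK, and it deliberately ignores SSE features such as multi-line data fields and comments.

```python
import json
from typing import Iterable, Iterator


def iter_sse_chunks(lines: Iterable[str]) -> Iterator[dict]:
    """Yield the decoded JSON payload of each data: frame until [DONE].

    Hypothetical helper for illustration only.
    """
    for raw in lines:
        line = raw.strip()
        # Blank lines separate frames; non-data lines are skipped.
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return  # sentinel: the stream is finished
        yield json.loads(payload)


# Frames abbreviated from the example above.
sample = [
    'data: {"choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}',
    "",
    'data: {"choices":[{"index":0,"delta":{"content":"Packets "},"finish_reason":null}]}',
    "",
    'data: {"choices":[{"index":0,"delta":{"content":"in flight"},"finish_reason":null}]}',
    "",
    'data: {"choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}',
    "",
    "data: [DONE]",
    "",
]

text = "".join(
    chunk["choices"][0]["delta"].get("content", "")
    for chunk in iter_sse_chunks(sample)
    if chunk.get("choices")  # tolerate chunks whose choices list is empty
)
print(text)  # Packets in flight
```

In a real client the lines would come from the HTTP response as it streams; the list here only stands in for that so the parsing logic can be read in isolation.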

Usage in the stream

A streamed response always ends with a final usage chunk; you don't need to opt in. Right before the data: [DONE] sentinel, the gateway emits one chunk with an empty choices: [] array and a populated usage object. The API does not define a stream_options field: if you send stream_options: { include_usage: true }, the request is accepted but the field is ignored, since the usage chunk is sent unconditionally.

{
  "id": "chatcmpl-1",
  "object": "chat.completion.chunk",
  "created": 1717000000,
  "model": "google/gemini-3-flash",
  "choices": [],
  "usage": {
    "prompt_tokens": 12,
    "completion_tokens": 18,
    "total_tokens": 30
  }
}

The same token details available on a non-streamed call appear here too — prompt_tokens_details.cached_tokens for cache hits and completion_tokens_details.reasoning_tokens for thinking tokens. See /docs/api-reference/chat-completions for the full usage shape.
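
One way to consume this, sketched under assumptions: the hypothetical collect_stream helper below takes chunks that are already decoded into dictionaries, concatenates the content deltas, and keeps the usage object from the final chunk. It assumes a single choice per response; with several choices the text would need to be grouped by index.

```python
from typing import Iterable, Optional, Tuple


def collect_stream(chunks: Iterable[dict]) -> Tuple[str, Optional[dict]]:
    """Return (full_text, usage) from decoded stream chunks.

    Hypothetical helper for illustration; assumes one choice per chunk.
    """
    parts = []
    usage = None
    for chunk in chunks:
        # The usage chunk has an empty choices list and a populated usage object.
        if chunk.get("usage"):
            usage = chunk["usage"]
        for choice in chunk.get("choices") or []:
            content = (choice.get("delta") or {}).get("content")
            if content:
                parts.append(content)
    return "".join(parts), usage


# Chunks abbreviated from the examples on this page.
chunks = [
    {"choices": [{"index": 0, "delta": {"role": "assistant"}, "finish_reason": None}]},
    {"choices": [{"index": 0, "delta": {"content": "Packets "}, "finish_reason": None}]},
    {"choices": [{"index": 0, "delta": {"content": "in flight"}, "finish_reason": None}]},
    {"choices": [{"index": 0, "delta": {}, "finish_reason": "stop"}]},
    {"choices": [], "usage": {"prompt_tokens": 12, "completion_tokens": 18, "total_tokens": 30}},
]

text, usage = collect_stream(chunks)
print(text)                   # Packets in flight
print(usage["total_tokens"])  # 30
```

Iterating over choices rather than indexing choices[0] means the empty-choices usage chunk needs no special case.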

Because the HTTP status is sent before the body, an error can arrive after a 200 OK. Over SSE a mid-stream failure is delivered as a regular data: frame carrying the standard envelope { "error": { "message", "type", "code" } } — not as an event: error SSE event — followed by the usual data: [DONE] sentinel. A generic interrupt uses type upstream_error with code stream_error. Always handle errors inside your stream loop, not only around the initial request.
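
A minimal sketch of in-loop error handling, assuming chunks have already been decoded into dictionaries. StreamError and raise_on_error are hypothetical names introduced here, and the error message in the sample is made up; only the type and code values come from the description above.

```python
from typing import Iterable, Iterator, Optional


class StreamError(Exception):
    """Raised when a data: frame carries the error envelope (hypothetical class)."""

    def __init__(self, message: str, type_: Optional[str] = None, code: Optional[str] = None):
        super().__init__(message)
        self.type = type_
        self.code = code


def raise_on_error(chunks: Iterable[dict]) -> Iterator[dict]:
    """Pass chunks through unchanged, raising StreamError on an error frame."""
    for chunk in chunks:
        err = chunk.get("error")
        if err:
            raise StreamError(err.get("message", "stream error"), err.get("type"), err.get("code"))
        yield chunk


# One content chunk, then a mid-stream failure (the message text is invented).
frames = [
    {"choices": [{"index": 0, "delta": {"content": "Packets "}, "finish_reason": None}]},
    {"error": {"message": "upstream connection lost", "type": "upstream_error", "code": "stream_error"}},
]

received = []
failure = None
try:
    for chunk in raise_on_error(frames):
        for choice in chunk.get("choices") or []:
            content = choice.get("delta", {}).get("content")
            if content:
                received.append(content)
except StreamError as exc:
    # Partial output is still in `received`; decide whether to keep or discard it.
    failure = exc

print(received)      # ['Packets ']
print(failure.code)  # stream_error
```

Because the check runs on every chunk, the failure surfaces at the point in the stream where it occurred, and any text received before it remains available to the caller.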

Next steps