Streaming

Set stream: true on a chat completion request to receive the response incrementally over Server-Sent Events (SSE) instead of a single JSON body. Each event carries a small delta, so you can render tokens as the model produces them. The stream ends with a data: [DONE] sentinel.

Stream with the OpenAI SDK

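The OpenAI SDKs handle SSE parsing for you: iterate the returned stream and read choices[0].delta.content. The final usage chunk has an empty choices array, so check that choices is non-empty before indexing into it.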

import os

from openai import OpenAI

client = OpenAI(
    base_url="https://api.opentoken.kr/v1",
    api_key=os.environ["OPENTOKEN_API_KEY"],  # read the key from the environment
)

stream = client.chat.completions.create(
    model="google/gemini-3-flash",
    messages=[{"role": "user", "content": "Write a haiku about latency."}],
    stream=True,
)

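for chunk in stream:
    # The final usage chunk has an empty choices list, so skip it before indexing.
    if not chunk.choices:
        continue
    delta = chunk.choices[0].delta
    if delta.content:
        print(delta.content, end="", flush=True)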

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.opentoken.kr/v1",
  apiKey: process.env.OPENTOKEN_API_KEY,
});

const stream = await client.chat.completions.create({
  model: "google/gemini-3-flash",
  messages: [{ role: "user", content: "Write a haiku about latency." }],
  stream: true,
});

for await (const chunk of stream) {
  const delta = chunk.choices[0]?.delta?.content;
  if (delta) process.stdout.write(delta);
}

curl -N https://api.opentoken.kr/v1/chat/completions \
  -H "Authorization: Bearer $OPENTOKEN_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "google/gemini-3-flash",
    "messages": [{"role": "user", "content": "Write a haiku about latency."}],
    "stream": true
  }'
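If you also want the full text and the token counts once the stream finishes, you can drain the SDK stream into a single result. The sketch below assumes chunk objects shaped like the Python SDK's ChatCompletionChunk (choices, delta.content, finish_reason, usage); collect_stream and StreamResult are hypothetical names, not part of the SDK.

```python
from dataclasses import dataclass
from typing import Any, Iterable, Optional


@dataclass
class StreamResult:
    """Hypothetical container for everything a finished stream produced."""
    text: str = ""
    finish_reason: Optional[str] = None
    usage: Any = None


def collect_stream(stream: Iterable[Any]) -> StreamResult:
    """Drain a chat-completion stream (hypothetical helper).

    Assumes each chunk exposes .choices and .usage, and each choice exposes
    .delta.content and .finish_reason, as the SDK's chunk objects do.
    """
    result = StreamResult()
    parts = []
    for chunk in stream:
        # Only the trailing usage chunk carries a non-null usage object.
        if getattr(chunk, "usage", None) is not None:
            result.usage = chunk.usage
        # The usage chunk has an empty choices list; nothing else to read.
        if not chunk.choices:
            continue
        choice = chunk.choices[0]
        if choice.delta.content:
            parts.append(choice.delta.content)
        if choice.finish_reason is not None:
            result.finish_reason = choice.finish_reason
    result.text = "".join(parts)
    return result
```

With a fresh stream created as in the Python example, result = collect_stream(stream) gives you result.text, result.finish_reason, and result.usage.total_tokens after the last chunk arrives.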

Raw SSE frames

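If you parse the stream yourself, each frame is a data: line followed by a blank line. The first chunk's delta carries the role, subsequent chunks carry content, and the last choice chunk has an empty delta and a non-null finish_reason. The usage chunk described below follows it, and the stream terminates with data: [DONE].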

data: {"id":"chatcmpl-1","object":"chat.completion.chunk","created":1717000000,"model":"google/gemini-3-flash","choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}

data: {"id":"chatcmpl-1","object":"chat.completion.chunk","created":1717000000,"model":"google/gemini-3-flash","choices":[{"index":0,"delta":{"content":"Packets "},"finish_reason":null}]}

data: {"id":"chatcmpl-1","object":"chat.completion.chunk","created":1717000000,"model":"google/gemini-3-flash","choices":[{"index":0,"delta":{"content":"in flight"},"finish_reason":null}]}

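data: {"id":"chatcmpl-1","object":"chat.completion.chunk","created":1717000000,"model":"google/gemini-3-flash","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}

data: {"id":"chatcmpl-1","object":"chat.completion.chunk","created":1717000000,"model":"google/gemini-3-flash","choices":[],"usage":{"prompt_tokens":12,"completion_tokens":18,"total_tokens":30}}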

data: [DONE]
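A hand-rolled parser only needs to buffer data: lines until a blank line, decode the JSON, and stop at [DONE]. The minimal sketch below assumes you already have the response body as an iterable of text lines; iter_sse_payloads is a hypothetical helper, and it ignores the event:, id:, and retry: fields, which the frames above don't use.

```python
import json
from typing import Any, Iterable, Iterator


def iter_sse_payloads(lines: Iterable[str]) -> Iterator[Any]:
    """Yield the decoded JSON of each SSE event until data: [DONE] (hypothetical helper)."""
    data_lines = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line dispatches the buffered event, if any.
            if data_lines:
                data = "\n".join(data_lines)
                data_lines = []
                if data == "[DONE]":
                    return
                yield json.loads(data)
            continue
        if line.startswith(":"):
            continue  # SSE comment, e.g. a keep-alive
        if line.startswith("data:"):
            value = line[len("data:"):]
            # Per SSE, a single space after the colon is not part of the value.
            if value.startswith(" "):
                value = value[1:]
            data_lines.append(value)
        # Other fields (event:, id:, retry:) are ignored in this sketch.
    # Leniently flush a final event if the body ended without a blank line.
    if data_lines:
        data = "\n".join(data_lines)
        if data != "[DONE]":
            yield json.loads(data)
```

With requests, for example, you could pass (line.decode("utf-8") for line in resp.iter_lines()) from a response opened with stream=True.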

Usage in the stream

A streamed response always ends with a final usage chunk, so you don't need to opt in. Right before the data: [DONE] sentinel, the gateway emits one chunk with an empty choices: [] array and a populated usage object. You don't need to send stream_options: { include_usage: true }; the field is accepted but ignored, because the usage chunk is sent unconditionally.

{
  "id": "chatcmpl-1",
  "object": "chat.completion.chunk",
  "created": 1717000000,
  "model": "google/gemini-3-flash",
  "choices": [],
  "usage": {
    "prompt_tokens": 12,
    "completion_tokens": 18,
    "total_tokens": 30
  }
}
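When you parse frames yourself, the usage chunk is the one with an empty choices array and a populated usage object. This sketch assumes payloads are the decoded JSON objects of each data: frame, in order; split_text_and_usage is a hypothetical helper that reads only the choice with index 0.

```python
from typing import Any, Iterable, Mapping, Optional, Tuple


def split_text_and_usage(
    payloads: Iterable[Mapping[str, Any]],
) -> Tuple[str, Optional[Mapping[str, Any]]]:
    """Return the concatenated content and the usage object (hypothetical helper)."""
    parts = []
    usage = None
    for chunk in payloads:
        choices = chunk.get("choices") or []
        if not choices and chunk.get("usage"):
            # The usage chunk: empty choices, populated usage.
            usage = chunk["usage"]
            continue
        for choice in choices:
            if choice.get("index", 0) == 0:
                content = (choice.get("delta") or {}).get("content")
                if content:
                    parts.append(content)
    return "".join(parts), usage
```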

The usage chunk carries the same token details as a non-streamed response: prompt_tokens_details.cached_tokens for cache hits and completion_tokens_details.reasoning_tokens for thinking tokens. See /docs/api-reference/chat-completions for the full usage shape.
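To read the detail fields defensively, you can treat a missing or null detail object as zero. The sketch assumes OpenAI-style semantics, where cached_tokens is a subset of prompt_tokens and reasoning_tokens is a subset of completion_tokens; token_breakdown is a hypothetical helper, so check the usage reference before relying on that arithmetic.

```python
from typing import Any, Mapping


def token_breakdown(usage: Mapping[str, Any]) -> dict:
    """Split usage into cached/uncached prompt and reasoning/visible output (hypothetical)."""
    prompt = usage.get("prompt_tokens") or 0
    completion = usage.get("completion_tokens") or 0
    # Detail objects may be absent or null; count missing values as 0 (assumption).
    cached = (usage.get("prompt_tokens_details") or {}).get("cached_tokens") or 0
    reasoning = (usage.get("completion_tokens_details") or {}).get("reasoning_tokens") or 0
    return {
        "cached_prompt": cached,
        "uncached_prompt": prompt - cached,  # assumes cached is part of prompt_tokens
        "reasoning": reasoning,
        "visible_completion": completion - reasoning,  # assumes reasoning is part of completion_tokens
    }
```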

Because the HTTP status is sent before the body, an error can arrive after a 200 OK. Over SSE, a mid-stream failure is delivered as a regular data: frame carrying the standard error envelope { "error": { "message", "type", "code" } }, not as an SSE event: error event, and it is followed by the usual data: [DONE] sentinel. A generic interrupt uses type upstream_error with code stream_error. Always handle errors inside your stream loop, not only around the initial request.
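Handling that frame means checking each decoded payload for an error key before treating it as a chunk. In the sketch below, StreamError and raise_on_error are hypothetical names, and payloads are assumed to be the decoded JSON objects of each data: frame. If you use the OpenAI Python SDK instead, it generally raises openai.APIError from inside the for loop when such a frame arrives, so a try/except around the loop covers that case.

```python
from typing import Any, Iterable, Iterator, Mapping, Optional


class StreamError(Exception):
    """Hypothetical exception for an error envelope received mid-stream."""

    def __init__(self, message: str, error_type: Optional[str], code: Optional[str]):
        super().__init__(message)
        self.error_type = error_type  # e.g. "upstream_error"
        self.code = code  # e.g. "stream_error"


def raise_on_error(
    payloads: Iterable[Mapping[str, Any]],
) -> Iterator[Mapping[str, Any]]:
    """Pass chunks through unchanged; raise StreamError on an error frame (hypothetical)."""
    for payload in payloads:
        error = payload.get("error")
        if error:
            raise StreamError(
                error.get("message", "stream failed"),
                error.get("type"),
                error.get("code"),
            )
        yield payload
```

Because raise_on_error wraps the same iterator your rendering loop consumes, text received before the failure is still displayed, and the exception carries the type and code needed to decide whether to retry.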

Next steps

Create chat completion

Reasoning

Models
