Chat completions
Send messages and read replies with the OpenAI-compatible chat endpoint.
OpenToken speaks the OpenAI Chat Completions shape, so any OpenAI SDK works once you point it at https://api.opentoken.kr/v1. You send a list of messages and receive an assistant reply. This guide covers the conversational basics; for every field see the chat completions reference.
The messages array
A request carries a messages array. Each message has a role and content, and the model reads them in order to produce the next assistant turn.
| Role | Use it for |
|---|---|
| system | Instructions that set behavior, tone, or constraints. Usually the first message. |
| user | Input from the person or application talking to the model. |
| assistant | The model's previous replies, replayed to give it memory of the conversation. |
| tool | Results returned to the model after it called a tool. |
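The role rules in the table can be checked locally before a request is sent. The following is a minimal sketch under stated assumptions: `check_roles` and `ALLOWED_ROLES` are hypothetical names, not part of any SDK, and messages are assumed to be plain dicts.

```python
# Hypothetical helper for illustration; not part of the OpenAI SDK or OpenToken.
ALLOWED_ROLES = {"system", "user", "assistant", "tool"}


def check_roles(messages):
    """Return a list of problems found in a messages array (empty if none)."""
    problems = []
    for i, msg in enumerate(messages):
        role = msg.get("role")
        if role not in ALLOWED_ROLES:
            problems.append(f"message {i}: unknown role {role!r}")
    return problems
```

Calling `check_roles(messages)` before `client.chat.completions.create(...)` surfaces a mistyped role without spending a request.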
A simple call
Use a registered model id in {provider}/{model} form. The examples below use google/gemini-3-flash.

import os

from openai import OpenAI

# Point the OpenAI SDK at OpenToken instead of the default OpenAI endpoint.
client = OpenAI(
    base_url="https://api.opentoken.kr/v1",
    api_key=os.environ["OPENTOKEN_API_KEY"],
)

resp = client.chat.completions.create(
    model="google/gemini-3-flash",
    messages=[
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "Name three uses for a paperclip."},
    ],
)
print(resp.choices[0].message.content)

curl https://api.opentoken.kr/v1/chat/completions \
  -H "Authorization: Bearer $OPENTOKEN_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "google/gemini-3-flash",
    "messages": [
      {"role": "system", "content": "You are a concise assistant."},
      {"role": "user", "content": "Name three uses for a paperclip."}
    ]
  }'

A non-streaming request returns a chat.completion object. The reply text lives at choices[0].message.content, and token counts are in usage.

{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "created": 1704067200,
  "model": "google/gemini-3-flash",
  "choices": [
    {
      "index": 0,
      "message": { "role": "assistant", "content": "1. Hold papers together..." },
      "finish_reason": "stop"
    }
  ],
  "usage": { "prompt_tokens": 24, "completion_tokens": 41, "total_tokens": 65 }
}

Multi-turn conversations
The endpoint is stateless: it has no memory between requests. To continue a conversation, append the model's previous reply and the next user message, then resend the whole array.

messages = [
    {"role": "system", "content": "You are a concise assistant."},
    {"role": "user", "content": "Name three uses for a paperclip."},
]
resp = client.chat.completions.create(model="google/gemini-3-flash", messages=messages)

messages.append(resp.choices[0].message)  # keep the assistant turn
messages.append({"role": "user", "content": "Which is the most common?"})
resp = client.chat.completions.create(model="google/gemini-3-flash", messages=messages)
print(resp.choices[0].message.content)

Because state lives in your messages array, you control the context window. Trim or summarize old turns yourself to manage cost and stay within model limits.
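One simple way to trim old turns is to keep the system messages and only the most recent exchanges. This is a minimal sketch, not a recommended policy: `trim_history` and `max_recent` are hypothetical names, and the cutoff counts messages rather than tokens.

```python
def _role(msg):
    # Messages may be plain dicts or SDK message objects with a .role attribute.
    return msg["role"] if isinstance(msg, dict) else msg.role


def trim_history(messages, max_recent=6):
    """Keep every system message plus the last `max_recent` other messages."""
    system = [m for m in messages if _role(m) == "system"]
    rest = [m for m in messages if _role(m) != "system"]
    # Slicing with -0 would return the whole list, so handle zero explicitly.
    recent = rest[-max_recent:] if max_recent > 0 else []
    return system + recent
```

Passing `messages=trim_history(messages)` on each request bounds the prompt size. A count-based cut can separate a tool result from the assistant turn that requested it, so conversations that use tools may need a cut that respects those pairs.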
Choosing a model
OpenToken routes to Google Gemini and Anthropic Claude models, for example google/gemini-3-flash, google/gemini-2.5-pro, and anthropic/claude-sonnet-4-5. List the live catalog at runtime with GET /v1/models, or browse models. A request that names an unregistered id fails with a 400 model_not_found error.
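Checking an id against the catalog before use avoids the 400 error. The sketch below assumes that GET /v1/models returns the OpenAI-style list shape (`{"object": "list", "data": [{"id": ...}]}`); that shape is an assumption here, and `registered_ids` and `pick_model` are hypothetical helpers.

```python
def registered_ids(models_payload):
    """Collect model ids from a /v1/models response body.

    Assumes the OpenAI-style shape: {"object": "list", "data": [{"id": ...}, ...]}.
    """
    return {entry["id"] for entry in models_payload.get("data", [])}


def pick_model(preferred, models_payload, fallback):
    """Return `preferred` if the catalog lists it, otherwise `fallback`."""
    return preferred if preferred in registered_ids(models_payload) else fallback


# Illustrative payload only; fetch the real one from GET /v1/models.
sample_payload = {
    "object": "list",
    "data": [
        {"id": "google/gemini-3-flash"},
        {"id": "anthropic/claude-sonnet-4-5"},
    ],
}
model = pick_model("google/gemini-2.5-pro", sample_payload, "google/gemini-3-flash")
```

With the sample payload above, `model` falls back to google/gemini-3-flash because the preferred id is not in the list.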