OpenToken Docs

Chat completions

Send messages and read replies with the OpenAI-compatible chat endpoint.

OpenToken speaks the OpenAI Chat Completions shape, so any OpenAI SDK works once you point its base URL at https://api.opentoken.kr/v1. You send a list of messages and receive an assistant reply. This guide covers the conversational basics; for every field, see the chat completions reference.

The messages array

A request carries a messages array. Each message has a role and content, and the model reads them in order to produce the next assistant turn.

Role       Use it for
system     Instructions that set behavior, tone, or constraints. Usually the first message.
user       Input from the person or application talking to the model.
assistant  The model's previous replies, replayed to give it memory of the conversation.
tool       Results returned to the model after it called a tool.
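
As a rough illustration of the roles above, the sketch below builds a short conversation and checks each message before it would be sent. The validate_messages helper is hypothetical (it is not part of OpenToken or any SDK), and it only checks for a known role and a content field. It does not cover the extra fields that tool messages carry.

```python
# Roles listed in the table above.
VALID_ROLES = {"system", "user", "assistant", "tool"}


def validate_messages(messages):
    """Hypothetical helper: check each message has a known role and a content field."""
    for i, msg in enumerate(messages):
        if msg.get("role") not in VALID_ROLES:
            raise ValueError(f"message {i}: unknown role {msg.get('role')!r}")
        if "content" not in msg:
            raise ValueError(f"message {i}: missing content")
    return messages


# System first, then alternating user and assistant turns, in the order the model reads them.
conversation = validate_messages([
    {"role": "system", "content": "You are a concise assistant."},
    {"role": "user", "content": "Name three uses for a paperclip."},
    {"role": "assistant", "content": "1. Hold papers together..."},
    {"role": "user", "content": "Which is the most common?"},
])
print([m["role"] for m in conversation])
```

Checking locally like this is optional; it simply surfaces a malformed array before it costs a request.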

A simple call

Use a registered model id in {provider}/{model} form. The examples below use google/gemini-3-flash.

from openai import OpenAI
import os

client = OpenAI(
    base_url="https://api.opentoken.kr/v1",
    api_key=os.environ["OPENTOKEN_API_KEY"],
)

resp = client.chat.completions.create(
    model="google/gemini-3-flash",
    messages=[
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "Name three uses for a paperclip."},
    ],
)

print(resp.choices[0].message.content)

The same request with curl:

curl https://api.opentoken.kr/v1/chat/completions \
  -H "Authorization: Bearer $OPENTOKEN_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "google/gemini-3-flash",
    "messages": [
      {"role": "system", "content": "You are a concise assistant."},
      {"role": "user", "content": "Name three uses for a paperclip."}
    ]
  }'

A non-streaming request returns a chat.completion object. The reply text lives at choices[0].message.content, and token counts are in usage.

{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "created": 1704067200,
  "model": "google/gemini-3-flash",
  "choices": [
    {
      "index": 0,
      "message": { "role": "assistant", "content": "1. Hold papers together..." },
      "finish_reason": "stop"
    }
  ],
  "usage": { "prompt_tokens": 24, "completion_tokens": 41, "total_tokens": 65 }
}
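
If you work with the raw JSON rather than an SDK object, the fields described above can be pulled out along these lines. This is a minimal sketch: the summarize helper is a hypothetical name, and treating a finish_reason of "length" as a truncated reply is an assumption carried over from the OpenAI convention, not something this guide confirms for OpenToken.

```python
import json

# Sample body copied from the response shown above.
raw = """
{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "created": 1704067200,
  "model": "google/gemini-3-flash",
  "choices": [
    {
      "index": 0,
      "message": { "role": "assistant", "content": "1. Hold papers together..." },
      "finish_reason": "stop"
    }
  ],
  "usage": { "prompt_tokens": 24, "completion_tokens": 41, "total_tokens": 65 }
}
"""


def summarize(completion):
    """Hypothetical helper: extract the reply text, a truncation flag, and token usage."""
    choice = completion["choices"][0]
    return {
        "reply": choice["message"]["content"],
        # Assumption: "length" signals a reply cut off by a token limit, as in the OpenAI API.
        "truncated": choice["finish_reason"] == "length",
        "total_tokens": completion["usage"]["total_tokens"],
    }


summary = summarize(json.loads(raw))
print(summary)
```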

Multi-turn conversations

The endpoint is stateless: it has no memory between requests. To continue a conversation, append the model's previous reply and the next user message, then resend the whole array.

messages = [
    {"role": "system", "content": "You are a concise assistant."},
    {"role": "user", "content": "Name three uses for a paperclip."},
]

resp = client.chat.completions.create(model="google/gemini-3-flash", messages=messages)
messages.append(resp.choices[0].message)  # keep the assistant turn

messages.append({"role": "user", "content": "Which is the most common?"})
resp = client.chat.completions.create(model="google/gemini-3-flash", messages=messages)
print(resp.choices[0].message.content)

Because state lives in your messages array, you control the context window. Trim or summarize old turns yourself to manage cost and stay within model limits.
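
One simple way to trim is to keep every system message plus the most recent few turns. The sketch below assumes that policy; trim_history and its keep_last parameter are hypothetical names, and it counts messages rather than tokens, so treat it as a starting point and not a precise budget. It also does not try to keep a tool result together with the assistant turn that requested it.

```python
def _role(message):
    """Read the role from either a plain dict or an SDK message object."""
    return message["role"] if isinstance(message, dict) else message.role


def trim_history(messages, keep_last=6):
    """Hypothetical helper: keep all system messages and the last `keep_last` other messages."""
    system = [m for m in messages if _role(m) == "system"]
    rest = [m for m in messages if _role(m) != "system"]
    if keep_last <= 0:
        return system
    return system + rest[-keep_last:]


history = [
    {"role": "system", "content": "You are a concise assistant."},
    {"role": "user", "content": "Name three uses for a paperclip."},
    {"role": "assistant", "content": "1. Hold papers together..."},
    {"role": "user", "content": "Which is the most common?"},
    {"role": "assistant", "content": "Holding papers together."},
]

# Keep the system prompt and only the latest user/assistant exchange.
trimmed = trim_history(history, keep_last=2)
print([m["role"] for m in trimmed])
```

Call it on the array just before each request so the untrimmed history stays available if you later want to summarize the dropped turns.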

Choosing a model

OpenToken routes to Google Gemini and Anthropic Claude models, for example google/gemini-3-flash, google/gemini-2.5-pro, and anthropic/claude-sonnet-4-5. List the live catalog at runtime with GET /v1/models, or browse the models page. A request with an unregistered id fails with HTTP 400 and a model_not_found error.
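
To avoid a model_not_found error, you could check an id against the catalog before sending a request. The sketch below is illustrative only: split_model_id and model_ids are hypothetical helpers, the payload is a hand-written sample and not real API output, and the response shape (a data list of objects with an id field) is assumed to match the OpenAI models list.

```python
def split_model_id(model_id):
    """Hypothetical helper: split a '{provider}/{model}' id, or raise ValueError."""
    provider, sep, model = model_id.partition("/")
    if not sep or not provider or not model:
        raise ValueError(f"expected '{{provider}}/{{model}}', got {model_id!r}")
    return provider, model


def model_ids(models_payload):
    """Hypothetical helper: collect ids from a GET /v1/models body (assumed OpenAI list shape)."""
    return [entry["id"] for entry in models_payload.get("data", [])]


# Hand-written sample for illustration; fetch the real catalog from GET /v1/models.
sample_payload = {
    "object": "list",
    "data": [
        {"id": "google/gemini-3-flash"},
        {"id": "anthropic/claude-sonnet-4-5"},
    ],
}

available = model_ids(sample_payload)
wanted = "google/gemini-3-flash"
print(split_model_id(wanted), wanted in available)
```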

Next steps