Chat completions

OpenToken speaks the OpenAI Chat Completions shape, so any OpenAI SDK works once you point it at https://api.opentoken.kr/v1. You send a list of messages and receive an assistant reply. This guide covers the conversational basics; for every field, see the chat completions reference.

The messages array

A request carries a messages array. Each message has a role and content, and the model reads them in order to produce the next assistant turn.

Role	Use it for
`system`	Instructions that set behavior, tone, or constraints. Usually the first message.
`user`	Input from the person or application talking to the model.
`assistant`	The model's previous replies, replayed to give it memory of the conversation.
`tool`	Results returned to the model after it called a tool.
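
As a rough illustration of the table above, the sketch below checks that every entry in a messages array uses one of the four roles and carries content. The helper name validate_messages is hypothetical and not part of any SDK, and real requests may carry extra fields (for example on tool messages) that this check ignores.

```python
# Roles listed in the table above.
ALLOWED_ROLES = {"system", "user", "assistant", "tool"}


def validate_messages(messages):
    """Return a list of problems found in a messages array (empty if none).

    Hypothetical helper for illustration; it is not part of the OpenAI SDK.
    """
    problems = []
    for i, msg in enumerate(messages):
        role = msg.get("role")
        if role not in ALLOWED_ROLES:
            problems.append(f"message {i}: unknown role {role!r}")
        if "content" not in msg:
            problems.append(f"message {i}: missing content")
    return problems


messages = [
    {"role": "system", "content": "You are a concise assistant."},
    {"role": "user", "content": "Name three uses for a paperclip."},
]
print(validate_messages(messages))  # prints []
```

A local check like this only catches shape mistakes early; the API remains the authority on what it accepts.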

A simple call

Use a registered model id in {provider}/{model} form. The examples below use google/gemini-3-flash.

from openai import OpenAI
import os

client = OpenAI(
    base_url="https://api.opentoken.kr/v1",
    api_key=os.environ["OPENTOKEN_API_KEY"],
)

resp = client.chat.completions.create(
    model="google/gemini-3-flash",
    messages=[
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "Name three uses for a paperclip."},
    ],
)

print(resp.choices[0].message.content)

curl https://api.opentoken.kr/v1/chat/completions \
  -H "Authorization: Bearer $OPENTOKEN_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "google/gemini-3-flash",
    "messages": [
      {"role": "system", "content": "You are a concise assistant."},
      {"role": "user", "content": "Name three uses for a paperclip."}
    ]
  }'

A non-streaming request returns a chat.completion object. The reply text lives at choices[0].message.content, and token counts are in usage.

{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "created": 1704067200,
  "model": "google/gemini-3-flash",
  "choices": [
    {
      "index": 0,
      "message": { "role": "assistant", "content": "1. Hold papers together..." },
      "finish_reason": "stop"
    }
  ],
  "usage": { "prompt_tokens": 24, "completion_tokens": 41, "total_tokens": 65 }
}
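
If you call the endpoint over raw HTTP instead of through an SDK, the same fields can be read from the parsed JSON. The sketch below uses the sample response above as its input; the summarize helper is a hypothetical name chosen for illustration.

```python
import json

# The sample response shown above, as raw JSON text.
raw = """
{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "created": 1704067200,
  "model": "google/gemini-3-flash",
  "choices": [
    {
      "index": 0,
      "message": { "role": "assistant", "content": "1. Hold papers together..." },
      "finish_reason": "stop"
    }
  ],
  "usage": { "prompt_tokens": 24, "completion_tokens": 41, "total_tokens": 65 }
}
"""


def summarize(completion):
    """Pick out the reply text, finish reason, and total token count."""
    choice = completion["choices"][0]
    return {
        "text": choice["message"]["content"],
        "finish_reason": choice["finish_reason"],
        "total_tokens": completion["usage"]["total_tokens"],
    }


summary = summarize(json.loads(raw))
print(summary["finish_reason"], summary["total_tokens"])  # prints: stop 65
```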

Multi-turn conversations

The endpoint is stateless: it has no memory between requests. To continue a conversation, append the model's previous reply and the next user message, then resend the whole array.

messages = [
    {"role": "system", "content": "You are a concise assistant."},
    {"role": "user", "content": "Name three uses for a paperclip."},
]

resp = client.chat.completions.create(model="google/gemini-3-flash", messages=messages)
messages.append(resp.choices[0].message)  # keep the assistant turn

messages.append({"role": "user", "content": "Which is the most common?"})
resp = client.chat.completions.create(model="google/gemini-3-flash", messages=messages)
print(resp.choices[0].message.content)

Because state lives in your messages array, you control the context window. Trim or summarize old turns yourself to manage cost and stay within model limits.
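
One simple way to trim is to keep the system message and only the most recent turns. The sketch below is a minimal, count-based version, assuming every entry in the array is a plain dict (an SDK message object would need converting first). The trim_history name and the max_turn_messages parameter are hypothetical. It counts messages rather than tokens, so treat it as a starting point rather than a precise budget.

```python
def trim_history(messages, max_turn_messages=6):
    """Keep all system messages plus the most recent non-system messages.

    Hypothetical helper: trims by message count, not by token count.
    """
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    if max_turn_messages <= 0:
        return system
    return system + rest[-max_turn_messages:]


history = [
    {"role": "system", "content": "You are a concise assistant."},
    {"role": "user", "content": "Name three uses for a paperclip."},
    {"role": "assistant", "content": "1. Hold papers together..."},
    {"role": "user", "content": "Which is the most common?"},
    {"role": "assistant", "content": "Holding papers together."},
    {"role": "user", "content": "And the least common?"},
]

trimmed = trim_history(history, max_turn_messages=3)
print(len(trimmed))  # prints 4
```

Dropping turns loses information the model can no longer see; summarizing the removed turns into a single message is an alternative when older context still matters.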

Choosing a model

OpenToken routes to Google Gemini and Anthropic Claude models, for example google/gemini-3-flash, google/gemini-2.5-pro, and anthropic/claude-sonnet-4-5. List the live catalog at runtime with GET /v1/models, or browse the Models page. An unregistered id returns a 400 model_not_found error.
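
To check an id before sending a request, you could read the catalog and test for membership. The sketch below uses only the standard library and assumes the listing follows the OpenAI-style shape, an object with a data array of entries that each have an id; confirm this against the actual response. The function names and the sample payload are made up for illustration.

```python
import json
import urllib.request

MODELS_URL = "https://api.opentoken.kr/v1/models"


def extract_model_ids(payload):
    """Pull ids out of a models listing, assuming {"data": [{"id": ...}, ...]}."""
    return [entry["id"] for entry in payload.get("data", [])]


def fetch_model_ids(api_key):
    """Call GET /v1/models and return the listed ids (performs a network request)."""
    request = urllib.request.Request(
        MODELS_URL, headers={"Authorization": f"Bearer {api_key}"}
    )
    with urllib.request.urlopen(request) as response:
        return extract_model_ids(json.load(response))


# Offline example with a made-up payload in the assumed shape.
sample = {
    "object": "list",
    "data": [
        {"id": "google/gemini-3-flash"},
        {"id": "anthropic/claude-sonnet-4-5"},
    ],
}
ids = extract_model_ids(sample)
print("google/gemini-3-flash" in ids)  # prints True
```

In a real application, fetch_model_ids(os.environ["OPENTOKEN_API_KEY"]) would replace the sample payload.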

Next steps

Streaming

Reasoning

Chat completions reference

Models
