Models
Browse the models OpenToken routes to and learn the {provider}/{model} id format.
OpenToken exposes every model through a single OpenAI-compatible gateway. You select a model with the model field on a chat completion, using the {provider}/{model} id format — for example google/gemini-2.5-pro. See model ids for the full naming rules.
List models
GET /v1/models returns the registered models as an OpenAI-compatible list.
curl https://api.opentoken.kr/v1/models \
-H "Authorization: Bearer $OPENTOKEN_API_KEY"from openai import OpenAI
client = OpenAI(
base_url="https://api.opentoken.kr/v1",
api_key=os.environ["OPENTOKEN_API_KEY"],
)
print(client.models.list())import OpenAI from "openai";
const client = new OpenAI({
baseURL: "https://api.opentoken.kr/v1",
apiKey: process.env.OPENTOKEN_API_KEY,
});
console.log(await client.models.list());Registered models
| Model id | Provider | Input | Output | Cache read | Cache write |
|---|---|---|---|---|---|
anthropic/claude-haiku-4-5 | Anthropic | $1.00 | $5.00 | $0.10 | $1.25 |
anthropic/claude-opus-4-6 | Anthropic | $5.00 | $25.00 | $0.50 | $6.25 |
anthropic/claude-opus-4-7 | Anthropic | $5.00 | $25.00 | $0.50 | $6.25 |
anthropic/claude-sonnet-4-5 | Anthropic | $3.00 | $15.00 | $0.30 | $3.75 |
anthropic/claude-sonnet-4-6 | Anthropic | $3.00 | $15.00 | $0.30 | $3.75 |
google/gemini-2.5-pro | $1.25 | $10.00 | $0.125 | $1.25 | |
google/gemini-3-flash | $0.50 | $3.00 | $0.05 | $0.50 | |
google/gemini-3.1-flash-lite | $0.25 | $1.50 | $0.025 | $0.25 | |
google/gemini-3.1-pro | $2.00 | $12.00 | $0.20 | $2.00 |
Prices are USD per 1M tokens. Cache read / write apply when a request carries a cache_control breakpoint; a blank cell means the model has no distinct cache rate. The authoritative live rates are always GET /v1/models.
Embeddings
| Model id | Provider | Input |
|---|---|---|
google/text-embedding-004 | $0.025 |
Embeddings are billed on input tokens only (no completion tokens) via POST /v1/embeddings.
All chat models support streaming, reasoning, and prompt caching. Tool calling is supported on Google models (translated to native function calling) but rejected by Anthropic models, which return a 400 unsupported_parameter. Passing an unregistered id returns a 400 invalid_request_error with code model_not_found — for example openai/gpt-4o, a model OpenToken does not currently route.
This table is generated from the gateway's routing catalog — the same source as GET /v1/models — so it stays in sync with what is actually routable. Every billed request also writes an immutable usage record and a USD credit-ledger debit.
Capabilities
- Streaming — set
stream: trueto receive Server-Sent Events. See streaming. - Tools — pass
toolsandtool_choiceon a chat completion (Google models only; Anthropic models reject tools with400 unsupported_parameter). - Reasoning — control thinking with
reasoning: { effort, max_tokens, exclude }. See reasoning. - Caching — add an OpenRouter-style
cache_controlbreakpoint on the system message to cache a large prefix. See prompt caching. - Embeddings — the embeddings model is served on
POST /v1/embeddings(input-only billing).
Available endpoints: POST /v1/chat/completions, POST /v1/embeddings, and GET /v1/models.