Model routing

How OpenToken dispatches a request to a provider by its model id, and how pre-commit failover works.

OpenToken routes every request by its model id. The id uses the {provider}/{model} form, so the prefix selects which adapter handles the call and the suffix names the upstream model. There is no separate routing config — the model field is the route.
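As a rough illustration of the {provider}/{model} form, the sketch below splits an id into its two parts. The function name `parse_model_id` is hypothetical, and splitting on the first slash is an assumption; the gateway's actual parser is not documented here.

```python
from typing import Tuple


def parse_model_id(model_id: str) -> Tuple[str, str]:
    """Split a "{provider}/{model}" id into (provider, model).

    Assumption: the provider prefix ends at the first "/".
    """
    provider, sep, model = model_id.partition("/")
    if not sep or not provider or not model:
        raise ValueError(f"expected '{{provider}}/{{model}}', got {model_id!r}")
    return provider, model


provider, model = parse_model_id("google/gemini-3-flash")
print(provider, model)
```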

How dispatch works

When a request arrives, OpenToken parses the model id, looks up the registered adapter for the provider, and forwards the normalized request to that provider. Unknown ids never reach an upstream.

import os

from openai import OpenAI

client = OpenAI(
    base_url="https://api.opentoken.kr/v1",
    # Read the key from the environment rather than passing a literal string.
    api_key=os.environ["OPENTOKEN_API_KEY"],
)

resp = client.chat.completions.create(
    model="google/gemini-3-flash",
    messages=[{"role": "user", "content": "Route this to Gemini 3 Flash."}],
)
print(resp.choices[0].message.content)

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.opentoken.kr/v1",
  apiKey: process.env.OPENTOKEN_API_KEY,
});

const resp = await client.chat.completions.create({
  model: "google/gemini-2.5-pro",
  messages: [{ role: "user", content: "Route this to Gemini 2.5 Pro." }],
});
console.log(resp.choices[0].message.content);

curl https://api.opentoken.kr/v1/chat/completions \
  -H "Authorization: Bearer $OPENTOKEN_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "google/gemini-3.1-pro",
    "messages": [{"role": "user", "content": "Route this to Gemini 3.1 Pro."}]
  }'

To target a different model, change the id. OpenToken currently routes to Google Gemini and Anthropic Claude models — for example google/gemini-2.5-pro, google/gemini-3-flash, google/gemini-3.1-pro, google/gemini-3.1-flash-lite, anthropic/claude-opus-4-7, anthropic/claude-sonnet-4-5, and anthropic/claude-haiku-4-5, plus the embeddings model google/text-embedding-004. Call GET /v1/models for the authoritative live list, or see the models page for details.

An unregistered id such as openai/gpt-4o does not route anywhere — it returns a model_not_found error with status 400. Only ids in the live routing catalog (see GET /v1/models) resolve to an adapter.
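The dispatch steps and the unknown-id behavior described above can be sketched as follows. This is a simplified model, not the gateway's code: the adapter functions, the `ADAPTERS` and `CATALOG` names, and the shape of the error body are all assumptions made for illustration. Only the 400 status and the model_not_found identifier come from this page.

```python
from typing import Callable, Dict, Tuple


# Hypothetical adapters: each receives the upstream model name and the request body.
def google_adapter(model: str, body: dict) -> dict:
    return {"provider": "google", "model": model}


def anthropic_adapter(model: str, body: dict) -> dict:
    return {"provider": "anthropic", "model": model}


ADAPTERS: Dict[str, Callable[[str, dict], dict]] = {
    "google": google_adapter,
    "anthropic": anthropic_adapter,
}

# A small stand-in for the live catalog that GET /v1/models would return.
CATALOG = {
    "google/gemini-3-flash",
    "google/gemini-2.5-pro",
    "anthropic/claude-haiku-4-5",
}


def dispatch(body: dict) -> Tuple[int, dict]:
    """Route a request by its model field; return (status, payload)."""
    model_id = body.get("model", "")
    if model_id not in CATALOG:
        # Unknown ids stop here and never reach an upstream.
        # The envelope shape below is an assumption.
        return 400, {"error": {"type": "model_not_found",
                               "message": f"unknown model: {model_id}"}}
    provider, _, model = model_id.partition("/")
    return 200, ADAPTERS[provider](model, body)


print(dispatch({"model": "google/gemini-3-flash"}))
print(dispatch({"model": "openai/gpt-4o"}))
```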

Failover

OpenToken can retry a request before any response bytes are committed to your connection. If an upstream returns a retryable error, the gateway transparently fails over and you still receive a single clean response.

Pre-commit retry. Failover happens only before the first byte of the response is sent. For a streaming request this means before the first SSE chunk is emitted.
Retryable conditions. Genuine transient upstream failures are eligible for failover: an upstream_error with status 429 or >= 500 (other than code no_supplier, which is terminal), a 504 upstream_timeout, or a missing_provider_key for the selected supplier.
Terminal errors pass through. Errors that a retry cannot fix are returned as-is: 400 model_not_found for an unknown id, invalid_request_error for a validation failure (status 400, or status 402 with code insufficient_credit for an exhausted balance), 401 authentication_error, and 503 upstream_error with code no_supplier when supplier selection is exhausted. These are never retried.
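The rules above can be read as a small classifier plus a retry loop. The sketch below is one possible reading under stated assumptions: `is_retryable` and `call_with_failover` are hypothetical names, each supplier is modeled as a callable that returns a complete (status, payload) pair before anything is sent to the client, and the page does not say whether identifiers such as missing_provider_key are error types or codes, so the sketch compares a single `kind` string plus an optional `code`.

```python
from typing import Callable, List, Optional, Tuple

Result = Tuple[int, dict]


def is_retryable(status: int, kind: str, code: Optional[str] = None) -> bool:
    """Return True when the error is eligible for pre-commit failover."""
    if kind == "upstream_error":
        if code == "no_supplier":
            return False  # supplier selection is exhausted: terminal
        return status >= 500 or status == 429
    if kind == "upstream_timeout":
        return status == 504
    # Everything else (model_not_found, invalid_request_error,
    # authentication_error, ...) passes through unchanged.
    return kind == "missing_provider_key"


def call_with_failover(suppliers: List[Callable[[], Result]]) -> Result:
    """Try suppliers in order; nothing is committed to the client until one settles."""
    for attempt in suppliers:
        status, payload = attempt()
        if status == 200:
            return status, payload
        err = payload.get("error", {})
        if not is_retryable(status, err.get("type", ""), err.get("code")):
            return status, payload  # terminal error, returned as-is
    # Every supplier failed with a retryable error.
    return 503, {"error": {"type": "upstream_error", "code": "no_supplier"}}


flaky = lambda: (502, {"error": {"type": "upstream_error"}})
healthy = lambda: (200, {"ok": True})
print(call_with_failover([flaky, healthy]))
```

Checking no_supplier before the status test matters because that error is itself a 5xx upstream_error and would otherwise be classified as retryable.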

Because failover is pre-commit, a streaming response that has already started cannot be silently re-routed. After a 200, an upstream failure surfaces over SSE as one more data: frame carrying the error envelope (followed by data: [DONE]) rather than a new attempt. See error handling for the full envelope and code list.
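On the client side, this means a stream reader should check each data: frame for an error before treating it as a chunk. The sketch below is a minimal reader over already-received SSE lines. It assumes the error envelope is a JSON object with a top-level "error" key; `StreamError` and `iter_chunks` are hypothetical names, and the real envelope is defined on the error handling page.

```python
import json
from typing import Iterable, Iterator


class StreamError(Exception):
    """Raised when an error envelope arrives mid-stream."""


def iter_chunks(lines: Iterable[str]) -> Iterator[dict]:
    """Yield parsed chunks from SSE lines until data: [DONE] or an error frame."""
    for line in lines:
        if not line.startswith("data:"):
            continue  # skip blank lines and non-data fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return
        frame = json.loads(payload)
        if "error" in frame:
            # The stream already started, so the gateway cannot re-route;
            # the failure is reported in-band instead.
            raise StreamError(frame["error"])
        yield frame


sample = [
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    "",
    'data: {"error": {"type": "upstream_error"}}',
    "data: [DONE]",
]
try:
    for chunk in iter_chunks(sample):
        print(chunk)
except StreamError as exc:
    print("stream failed:", exc)
```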

Next steps

Models

Error handling

Create chat completion
