Reasoning

OpenToken exposes a provider-agnostic reasoning field on POST /v1/chat/completions. It lets a model spend extra tokens thinking before it answers, and OpenToken maps it to each model's native thinking controls (Gemini thinkingConfig, Anthropic extended thinking). Reasoning is off unless you request it; if you enable it without specifying an effort or a max_tokens budget, the effort defaults to medium.

Parameters

reasoning is an object on the request body. reasoning_effort is a top-level string alias for reasoning.effort.
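
As a minimal sketch of the two spellings, the request bodies below should be treated the same way under the alias described above. The model id and prompt are borrowed from this page's example; `base`, `body_object`, and `body_alias` are illustrative names only.

```python
# Shared fields, reused from the example on this page.
base = {
    "model": "google/gemini-3-flash",
    "messages": [{"role": "user", "content": "Why is the sky blue?"}],
}

# Object form: effort nested under the reasoning object.
body_object = {**base, "reasoning": {"effort": "low"}}

# Alias form: top-level string, documented as an alias for reasoning.effort.
body_alias = {**base, "reasoning_effort": "low"}
```

Either dictionary can be serialized as the JSON body of the request. The object form is the one to use when other reasoning fields are needed alongside effort.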

The legacy include_reasoning flag is deprecated. Use reasoning.exclude instead.
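
A hedged sketch of how a client might migrate an old request body. It assumes reasoning.exclude is the logical inverse of include_reasoning, which this page does not spell out, and `migrate_include_reasoning` is a hypothetical helper, not part of any SDK. Whether a reasoning object that carries only exclude also turns reasoning on is likewise not stated here.

```python
def migrate_include_reasoning(body: dict) -> dict:
    """Return a copy of body with include_reasoning rewritten to reasoning.exclude."""
    out = dict(body)
    if "include_reasoning" not in out:
        return out
    include = out.pop("include_reasoning")
    # Copy the nested object so the caller's body is left untouched.
    reasoning = dict(out.get("reasoning") or {})
    # Assumption: exclude is the inverse of the legacy include flag.
    # An explicit reasoning.exclude already present in the body wins.
    reasoning.setdefault("exclude", not include)
    out["reasoning"] = reasoning
    return out
```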

Example

curl https://api.opentoken.kr/v1/chat/completions \
  -H "Authorization: Bearer $OPENTOKEN_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "google/gemini-3-flash",
    "messages": [{ "role": "user", "content": "Why is the sky blue?" }],
    "reasoning": { "effort": "low" }
  }'

import os

from openai import OpenAI

client = OpenAI(
    base_url="https://api.opentoken.kr/v1",
    api_key=os.environ["OPENTOKEN_API_KEY"],  # sk-optk-...
)

resp = client.chat.completions.create(
    model="google/gemini-3-flash",
    messages=[{"role": "user", "content": "Why is the sky blue?"}],
    extra_body={"reasoning": {"effort": "low"}},
)
print(resp.choices[0].message.content)

Provider behavior

Each adapter translates reasoning to its upstream control:

Gemini 3 (google/gemini-3-flash, google/gemini-3.1-pro) maps effort to the native thinkingLevel. On Gemini 3 models, effort: "xhigh" is treated as high. effort: "minimal" is honored only on the gemini-3-flash family; on models such as gemini-3.1-pro that lack a minimal level, it is clamped up to low.
Gemini 2.5 (google/gemini-2.5-pro) maps to a clamped thinkingBudget between 128 and 32768 tokens. Gemini 2.5 cannot disable thinking via a budget of 0; OpenToken handles effort: "none" by omitting the thinking config entirely, so the model falls back to its own default thinking behavior. To minimize (not eliminate) thinking on Gemini 2.5, pass a low effort or a small reasoning.max_tokens, which is clamped to the 128-32768 budget range.
Claude (anthropic/claude-*) maps reasoning.effort or reasoning.max_tokens to an absolute thinking budget_tokens, clamped to at least 1024 and strictly below the request's max_tokens; effort: "none" leaves extended thinking off. See Models for the registered Claude model ids.
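
The clamping rules above can be sketched as three small functions. This is an illustration of the stated rules, not OpenToken's adapter code. The function names, the substring check for the flash family, the pass-through of low/medium/high, and the error raised for a too-small max_tokens are all assumptions made for this sketch.

```python
def gemini3_thinking_level(model: str, effort: str) -> str:
    """Map an effort string to a Gemini 3 thinkingLevel, per the rules above."""
    if effort == "xhigh":
        return "high"  # xhigh is treated as high
    if effort == "minimal":
        # Assumption: the flash family is recognized by a substring match.
        return "minimal" if "gemini-3-flash" in model else "low"
    if effort in ("low", "medium", "high"):
        return effort  # assumption: these pass through unchanged
    raise ValueError(f"effort {effort!r} is not covered by this sketch")


def gemini25_thinking_budget(requested: int) -> int:
    """Clamp a requested budget into the 128..32768 token range."""
    return max(128, min(requested, 32768))


def claude_budget_tokens(requested: int, max_tokens: int) -> int:
    """Clamp to at least 1024 and strictly below the request's max_tokens."""
    if max_tokens <= 1024:
        # The text above does not cover this case; raising is this sketch's choice.
        raise ValueError("max_tokens must exceed 1024 to fit a thinking budget")
    return max(1024, min(requested, max_tokens - 1))
```

For example, under these assumptions a request for 8000 thinking tokens with max_tokens of 4096 would end up with a Claude budget of 4095, and a Gemini 2.5 request for 0 would be raised to 128.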

A small max_tokens on the request (the overall completion limit, not reasoning.max_tokens) can be entirely consumed by thinking. When that happens, the response returns empty content with finish_reason: "length". Give the model enough headroom for both thinking and the visible answer.
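
One way to detect this case on the client side, as a sketch: the helper below checks a response that has already been parsed into a dictionary. `thinking_used_all_tokens` is a hypothetical name, and the check relies only on the two fields named above.

```python
def thinking_used_all_tokens(response: dict) -> bool:
    """True when the first choice stopped on length with no visible content."""
    choice = response["choices"][0]
    content = (choice.get("message") or {}).get("content")
    # Empty string and None both count as "no visible answer".
    return choice.get("finish_reason") == "length" and not content
```

When this returns True, a reasonable response is to retry with a larger max_tokens or a lower effort. With the openai Python client, `resp.model_dump()` should produce a dictionary of this shape.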

Reading usage

Thinking tokens are reported separately under usage.completion_tokens_details.reasoning_tokens and are also counted in completion_tokens.

{
  "usage": {
    "prompt_tokens": 14,
    "completion_tokens": 256,
    "total_tokens": 270,
    "completion_tokens_details": {
      "reasoning_tokens": 192
    }
  }
}
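
Because reasoning tokens are included in completion_tokens, the size of the visible answer can be recovered by subtraction. A small sketch using the sample payload above; the variable names are illustrative, and the fallback to 0 when the details object is missing is an assumption.

```python
import json

payload = json.loads("""
{
  "usage": {
    "prompt_tokens": 14,
    "completion_tokens": 256,
    "total_tokens": 270,
    "completion_tokens_details": { "reasoning_tokens": 192 }
  }
}
""")

usage = payload["usage"]
# Assumption: treat a missing details object as zero reasoning tokens.
details = usage.get("completion_tokens_details") or {}
reasoning_tokens = details.get("reasoning_tokens", 0)

# completion_tokens already contains the reasoning tokens, so subtract them.
visible_tokens = usage["completion_tokens"] - reasoning_tokens
print(visible_tokens)  # 64
```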

See also

Create chat completion

Models
