Reasoning
Control model thinking with the reasoning field and the reasoning_effort alias.
OpenToken exposes a provider-agnostic reasoning field on POST /v1/chat/completions. It lets a model spend extra tokens thinking before it answers, and OpenToken maps it to each model's native thinking controls (Gemini thinkingConfig, Anthropic extended thinking). Reasoning is off unless you request it; if you request it without setting effort or max_tokens, the effort defaults to medium.
Parameters
reasoning is an object in the request body. reasoning_effort is a top-level string alias for reasoning.effort.
The legacy include_reasoning flag is deprecated. Use reasoning.exclude instead.
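Because reasoning_effort is only an alias, the two request bodies built below should ask for the same thinking level. This is a minimal sketch that builds both shapes as plain dictionaries, suitable as the JSON body in the curl example. The helper names are hypothetical, and the migration from include_reasoning to reasoning.exclude assumes that include_reasoning: false means exclude: true; that inversion is inferred from the flag names, not documented behavior.

```python
from typing import Optional


def reasoning_body(effort: str, use_alias: bool = False) -> dict:
    """Build the reasoning part of a chat-completions request body.

    Hypothetical helper: reasoning_effort is documented as a top-level alias
    for reasoning.effort, so both shapes should request the same effort.
    """
    if use_alias:
        return {"reasoning_effort": effort}
    return {"reasoning": {"effort": effort}}


def migrate_include_reasoning(body: dict) -> dict:
    """Replace the deprecated include_reasoning flag with reasoning.exclude.

    ASSUMPTION: include_reasoning=False is taken to mean exclude=True (and
    True to mean exclude=False); this is inferred, not documented.
    """
    out = dict(body)
    include: Optional[bool] = out.pop("include_reasoning", None)
    if include is not None:
        reasoning = dict(out.get("reasoning", {}))  # copy; don't mutate input
        reasoning["exclude"] = not include
        out["reasoning"] = reasoning
    return out


base = {
    "model": "google/gemini-3-flash",
    "messages": [{"role": "user", "content": "Why is the sky blue?"}],
}
nested = {**base, **reasoning_body("low")}
aliased = {**base, **reasoning_body("low", use_alias=True)}
print(nested["reasoning"])  # {'effort': 'low'}
print(aliased["reasoning_effort"])  # low
```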
Example
```bash
curl https://api.opentoken.kr/v1/chat/completions \
  -H "Authorization: Bearer $OPENTOKEN_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "google/gemini-3-flash",
    "messages": [{ "role": "user", "content": "Why is the sky blue?" }],
    "reasoning": { "effort": "low" }
  }'
```

```python
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://api.opentoken.kr/v1",
    api_key=os.environ["OPENTOKEN_API_KEY"],  # sk-optk-...
)

resp = client.chat.completions.create(
    model="google/gemini-3-flash",
    messages=[{"role": "user", "content": "Why is the sky blue?"}],
    # reasoning is not a named parameter of the OpenAI SDK, so send it via extra_body
    extra_body={"reasoning": {"effort": "low"}},
)
print(resp.choices[0].message.content)
```
Provider behavior
Each adapter translates reasoning to its upstream control:
- Gemini 3 (google/gemini-3-flash, google/gemini-3.1-pro) maps effort to the native thinkingLevel. On Gemini 3 models, effort: "xhigh" is treated as high, and effort: "minimal" is only honored on the gemini-3-flash family; on models such as gemini-3.1-pro that lack a minimal level, it is clamped up to low.
- Gemini 2.5 (google/gemini-2.5-pro) maps reasoning to a thinkingBudget clamped between 128 and 32768 tokens. Gemini 2.5 cannot disable thinking with a budget of 0, so OpenToken handles effort: "none" by omitting the thinking config entirely, and the model falls back to its own default thinking behavior. To minimize (not eliminate) thinking on Gemini 2.5, pass a low effort or a small reasoning.max_tokens, which is clamped to the 128-32768 budget range.
- Claude (anthropic/claude-*) maps reasoning.effort or reasoning.max_tokens to an absolute thinking budget (budget_tokens), clamped to at least 1024 and strictly below the request's max_tokens; effort: "none" leaves extended thinking off. See Models for the registered Claude model IDs.
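The clamping rules in the list above can be restated as small pure functions, which helps predict what a request will turn into upstream. This is a client-side sketch of the documented rules, not OpenToken's adapter code, and the function names are hypothetical. It takes an already-chosen token budget as input because the effort-to-budget table is not given here, passes Gemini 3 effort values the list does not mention through unchanged (an assumption), does not model the Gemini 2.5 effort: "none" case, and raises an error when Claude's max_tokens leaves no room for the 1024-token minimum, since that case is not documented.

```python
GEMINI_25_MIN_BUDGET = 128
GEMINI_25_MAX_BUDGET = 32768
CLAUDE_MIN_BUDGET = 1024


def gemini3_thinking_level(effort: str, model: str) -> str:
    """Mirror the documented Gemini 3 effort -> thinkingLevel rules."""
    if effort == "xhigh":
        return "high"  # xhigh is treated as high on Gemini 3
    if effort == "minimal" and "gemini-3-flash" not in model:
        return "low"  # minimal is clamped up outside the gemini-3-flash family
    return effort  # ASSUMPTION: other effort values pass through unchanged


def gemini25_thinking_budget(requested: int) -> int:
    """Clamp a requested token budget into Gemini 2.5's 128-32768 range."""
    return max(GEMINI_25_MIN_BUDGET, min(requested, GEMINI_25_MAX_BUDGET))


def claude_budget_tokens(requested: int, request_max_tokens: int) -> int:
    """Clamp a thinking budget to at least 1024 and strictly below max_tokens."""
    upper = request_max_tokens - 1  # budget_tokens must be < max_tokens
    if upper < CLAUDE_MIN_BUDGET:
        # Behavior here is not documented; this sketch refuses instead of guessing.
        raise ValueError("max_tokens leaves no room for a 1024-token thinking budget")
    return max(CLAUDE_MIN_BUDGET, min(requested, upper))


print(gemini3_thinking_level("minimal", "google/gemini-3.1-pro"))  # low
print(gemini25_thinking_budget(64))  # 128
print(claude_budget_tokens(8000, 4096))  # 4095
```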
A small max_tokens on the request can be consumed entirely by thinking. When that happens, the response returns empty content with finish_reason: "length". Leave enough headroom in max_tokens for both the thinking and the visible answer.
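The symptom described above can be detected from two fields of the response: empty content and finish_reason "length". This minimal sketch checks a response already parsed into a dictionary (for example the JSON body returned by curl, or resp.model_dump() from the OpenAI client). The helper name is hypothetical, and treating a null content the same as an empty string is an assumption.

```python
def thinking_exhausted_budget(response: dict) -> bool:
    """Return True if the first choice looks like thinking consumed max_tokens.

    Hypothetical helper: checks for empty visible content together with
    finish_reason == "length".
    """
    choice = response["choices"][0]
    content = choice.get("message", {}).get("content") or ""  # None -> ""
    return choice.get("finish_reason") == "length" and content.strip() == ""


truncated = {"choices": [{"message": {"role": "assistant", "content": ""}, "finish_reason": "length"}]}
answered = {"choices": [{"message": {"role": "assistant", "content": "Rayleigh scattering."}, "finish_reason": "stop"}]}
print(thinking_exhausted_budget(truncated))  # True
print(thinking_exhausted_budget(answered))  # False
```

When the check returns True, retrying with a larger max_tokens gives the model the headroom recommended above.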
Reading usage
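Thinking tokens are reported separately under usage.completion_tokens_details.reasoning_tokens. They are also counted in completion_tokens, so the visible answer accounts for the remainder (64 of the 256 completion tokens in the example below).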
```json
{
  "usage": {
    "prompt_tokens": 14,
    "completion_tokens": 256,
    "total_tokens": 270,
    "completion_tokens_details": {
      "reasoning_tokens": 192
    }
  }
}
```
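Because reasoning_tokens is counted inside completion_tokens, the tokens spent on the visible answer are the difference between the two. A minimal sketch over the usage object above; the function name is hypothetical, and treating a missing completion_tokens_details as zero reasoning tokens is an assumption.

```python
from typing import Tuple


def split_completion_tokens(usage: dict) -> Tuple[int, int]:
    """Return (reasoning_tokens, visible_answer_tokens) from a usage object."""
    details = usage.get("completion_tokens_details") or {}
    reasoning = details.get("reasoning_tokens") or 0  # ASSUMPTION: absent means 0
    return reasoning, usage["completion_tokens"] - reasoning


usage = {
    "prompt_tokens": 14,
    "completion_tokens": 256,
    "total_tokens": 270,
    "completion_tokens_details": {"reasoning_tokens": 192},
}
reasoning, visible = split_completion_tokens(usage)
print(reasoning, visible)  # 192 64
```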