Limits
How rate limits, per-key spend limits, and balance checks gate requests.
OpenToken enforces two independent gates before a request reaches a provider: a per-key spend limit and a credit-balance pre-check. Each gate maps to a distinct error so you can react to it programmatically. OpenToken does not run its own request-rate limiter today.
Provider rate limits
OpenToken does not rate-limit your requests itself. If the upstream provider throttles a request, that response surfaces through the gateway as type upstream_error with code upstream_error, preserving the provider's HTTP status (typically 429). Back off and retry the request after a short delay.
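One common way to back off is a capped exponential delay with random jitter. The sketch below is illustrative only: `UpstreamThrottled` and `retry_with_backoff` are hypothetical names, and the attempt count and delays are assumptions rather than values OpenToken recommends.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class UpstreamThrottled(Exception):
    """Hypothetical stand-in for a 429 response with code upstream_error."""


def retry_with_backoff(
    call: Callable[[], T],
    max_attempts: int = 4,      # assumed value, tune for your workload
    base_delay: float = 0.5,    # assumed value, in seconds
    max_delay: float = 8.0,     # assumed cap, in seconds
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run call(), retrying on UpstreamThrottled with capped exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return call()
        except UpstreamThrottled:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            # The ceiling doubles each attempt (0.5s, 1s, 2s, ...) up to max_delay.
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            # Full jitter: sleep a random fraction of the ceiling so that
            # many clients do not retry in lockstep.
            sleep(random.uniform(0, ceiling))
    raise AssertionError("unreachable")
```

In real code, `call` would wrap the API request and raise `UpstreamThrottled` when the response is a 429 with code upstream_error. The throttled response has the following body: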
{
  "error": {
    "message": "...",
    "type": "upstream_error",
    "code": "upstream_error"
  }
}
Per-key spend limits
A key can carry a spend_limit_usd together with a limit_reset period. The gateway tracks USD spend against the key over that window. Once the accumulated spend crosses the limit, further requests on that key return 429 with code spend_limit_exceeded until the window resets.
{
  "error": {
    "message": "api key spend limit exceeded",
    "type": "invalid_request_error",
    "code": "spend_limit_exceeded"
  }
}
spend_limit_exceeded is scoped to a single key. Other keys in the same workspace keep working until they hit their own limits or the shared credit balance runs out.
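To make the windowing concrete, here is a minimal in-memory model of one key's spend gate. It is a sketch, not OpenToken's implementation: `KeyBudget`, the `window_seconds` stand-in for limit_reset, the rolling reset, and the `>=` comparison are all assumptions.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class KeyBudget:
    """Hypothetical model of a single key's spend window."""
    spend_limit_usd: float
    window_seconds: float   # assumed stand-in for limit_reset
    window_start: float = 0.0
    spent_usd: float = 0.0

    def check(self, now: float) -> Optional[Tuple[int, str]]:
        """Return (status, code) if the key is blocked, else None."""
        if now - self.window_start >= self.window_seconds:
            # Window elapsed: start a new one with zero spend.
            self.window_start = now
            self.spent_usd = 0.0
        # Assumption: the key is blocked once spend reaches the limit.
        if self.spent_usd >= self.spend_limit_usd:
            return (429, "spend_limit_exceeded")
        return None

    def record(self, cost_usd: float) -> None:
        """Add the cost of a completed request to the current window."""
        self.spent_usd += cost_usd


# Two keys in the same workspace, each with its own budget.
key_a = KeyBudget(spend_limit_usd=1.00, window_seconds=3600)
key_b = KeyBudget(spend_limit_usd=5.00, window_seconds=3600)
key_a.record(1.20)
print(key_a.check(now=10))    # (429, 'spend_limit_exceeded')
print(key_b.check(now=10))    # None: key_b is unaffected
print(key_a.check(now=3600))  # None: the window has reset
```

Each key holds its own counter, which is why exhausting one key leaves the others usable.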
Balance pre-check
Before dispatching a request, the gateway does a balance pre-check against your workspace credit. When there is no credit left, the request is rejected with 402 and code insufficient_credit — the provider is never called, so you are not billed for the attempt. This pre-check applies to all billed endpoints, including chat completions and embeddings. The per-key spend limit above is currently enforced on chat completions.
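The short-circuit can be pictured with a small sketch. `dispatch` and `fake_provider` are hypothetical names, and treating a balance of zero or less as "no credit left" is an assumption. The point is only that the provider callable is never invoked on rejection.

```python
from typing import Callable, Dict, List, Tuple

Response = Tuple[int, Dict]


def dispatch(balance_usd: float, call_provider: Callable[[], Response]) -> Response:
    """Reject with 402 before any provider call when no credit is left."""
    if balance_usd <= 0:  # assumed threshold for "no credit left"
        return 402, {"error": {
            "message": "insufficient credit balance",
            "type": "invalid_request_error",
            "code": "insufficient_credit",
        }}
    return call_provider()


provider_calls: List[int] = []


def fake_provider() -> Response:
    """Stand-in for the upstream call; records that it was invoked."""
    provider_calls.append(1)
    return 200, {"ok": True}


status, body = dispatch(0.0, fake_provider)
print(status, body["error"]["code"], len(provider_calls))  # 402 insufficient_credit 0
```

The rejection carries the following body: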
{
  "error": {
    "message": "insufficient credit balance",
    "type": "invalid_request_error",
    "code": "insufficient_credit"
  }
}
Handling limits in code
Inspect error.code to decide how to respond: back off and retry on upstream_error, wait for the window to reset or raise the limit on spend_limit_exceeded, and top up credit on insufficient_credit.
import os, time
from openai import OpenAI, RateLimitError, APIStatusError

client = OpenAI(base_url="https://api.opentoken.kr/v1", api_key=os.environ["OPENTOKEN_API_KEY"])

# Try up to three times; only transient upstream rate limits are retried.
for attempt in range(3):
    try:
        resp = client.chat.completions.create(
            model="google/gemini-3-flash",
            messages=[{"role": "user", "content": "Summarize today's standup."}],
        )
        print(resp.choices[0].message.content)
        break
    except RateLimitError as e:
        if e.code == "spend_limit_exceeded":
            raise  # wait for the window or raise the limit
        time.sleep(2)  # transient upstream rate limit, the loop retries
    except APIStatusError as e:
        if e.status_code == 402 and e.code == "insufficient_credit":
            raise  # top up credit
        raise
else:
    raise RuntimeError("still rate limited after 3 attempts")

import OpenAI from "openai";
const client = new OpenAI({
  baseURL: "https://api.opentoken.kr/v1",
  apiKey: process.env.OPENTOKEN_API_KEY,
});
try {
  const resp = await client.chat.completions.create({
    model: "google/gemini-3-flash",
    messages: [{ role: "user", content: "Summarize today's standup." }],
  });
  console.log(resp.choices[0].message.content);
} catch (e: any) {
  if (e.code === "insufficient_credit") throw e; // top up credit
  if (e.code === "spend_limit_exceeded") throw e; // wait for the window or raise the limit
  // otherwise back off and retry
}

curl https://api.opentoken.kr/v1/chat/completions \
  -H "Authorization: Bearer $OPENTOKEN_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "google/gemini-3-flash",
    "messages": [{"role": "user", "content": "Summarize today'\''s standup."}]
  }'
Every successful request writes an immutable usage record and a USD credit-ledger debit. See Usage and billing to read your spend and confirm what counts against each limit.