OpenToken Docs

Limits

How rate limits, per-key spend limits, and balance checks gate requests.

OpenToken enforces two independent gates before a request reaches a provider: a per-key spend limit and a credit-balance pre-check. Each gate maps to a distinct error so you can react to it programmatically. OpenToken does not run its own request-rate limiter today.

Provider rate limits

OpenToken does not rate-limit your requests itself. If the upstream provider throttles a request, that response surfaces through the gateway as type upstream_error with code upstream_error, preserving the provider's HTTP status (typically 429). Back off and retry the request after a short delay.

{
  "error": {
    "message": "...",
    "type": "upstream_error",
    "code": "upstream_error"
  }
}
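
The "back off and retry" advice can be sketched as exponential backoff with full jitter. The helper name `backoff_delays`, the 1-second base, and the 30-second cap are assumptions chosen for illustration; OpenToken does not document specific retry intervals here.

```python
import random
from typing import List, Optional


def backoff_delays(
    attempts: int,
    base: float = 1.0,   # assumed starting delay in seconds
    cap: float = 30.0,   # assumed upper bound in seconds
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Return one sleep duration per retry attempt, using full jitter."""
    rng = rng or random.Random()
    delays = []
    for attempt in range(attempts):
        # The ceiling doubles each attempt until it reaches the cap.
        ceiling = min(cap, base * (2 ** attempt))
        # Full jitter: pick uniformly between 0 and the ceiling so that
        # many clients throttled at once do not retry in lockstep.
        delays.append(rng.uniform(0, ceiling))
    return delays
```

Sleep for each returned value in turn before re-sending the request, and stop retrying once the list is exhausted.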

Per-key spend limits

A key can carry a spend_limit_usd together with a limit_reset period. The gateway tracks USD spend against the key over that window. Once the accumulated spend crosses the limit, further requests on that key return 429 with code spend_limit_exceeded until the window resets.

{
  "error": {
    "message": "api key spend limit exceeded",
    "type": "invalid_request_error",
    "code": "spend_limit_exceeded"
  }
}

spend_limit_exceeded is scoped to a single key. Other keys in the same workspace keep working until they hit their own limits or the shared credit balance runs out.
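
The windowed accounting described above can be pictured with a toy model. This is a sketch for reasoning about the behavior, not the gateway's implementation: the class `KeySpendWindow`, the use of a `timedelta` to stand in for `limit_reset`, and the choice to block once spend is greater than or equal to the limit are all assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from decimal import Decimal


@dataclass
class KeySpendWindow:
    spend_limit_usd: Decimal
    window: timedelta            # hypothetical stand-in for limit_reset
    window_start: datetime
    spent_usd: Decimal = Decimal("0")

    def _roll(self, now: datetime) -> None:
        # Start a fresh window once the current one has elapsed.
        if now - self.window_start >= self.window:
            self.window_start = now
            self.spent_usd = Decimal("0")

    def allows_request(self, now: datetime) -> bool:
        # Assumption: requests are blocked once spend reaches the limit.
        self._roll(now)
        return self.spent_usd < self.spend_limit_usd

    def record(self, cost_usd: Decimal, now: datetime) -> None:
        # Add the cost of a completed request to the current window.
        self._roll(now)
        self.spent_usd += cost_usd
```

In this model the request that pushes spend past the limit still completes; only later requests in the same window are refused, which matches "further requests on that key return 429".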

Balance pre-check

Before dispatching a request, the gateway runs a balance pre-check against your workspace credit. When there is no credit left, the request is rejected with 402 and code insufficient_credit. The provider is never called, so you are not billed for the attempt. This pre-check applies to all billed endpoints, including chat completions and embeddings. The per-key spend limit above is currently enforced on chat completions.

{
  "error": {
    "message": "insufficient credit balance",
    "type": "invalid_request_error",
    "code": "insufficient_credit"
  }
}

Handling limits in code

Inspect error.code to decide how to respond: retry after a delay for upstream_error, wait for the window to reset or raise the limit for spend_limit_exceeded, and top up credit for insufficient_credit.

import os, time
from openai import OpenAI, RateLimitError, APIStatusError

client = OpenAI(base_url="https://api.opentoken.kr/v1", api_key=os.environ["OPENTOKEN_API_KEY"])

for attempt in range(3):
    try:
        resp = client.chat.completions.create(
            model="google/gemini-3-flash",
            messages=[{"role": "user", "content": "Summarize today's standup."}],
        )
        print(resp.choices[0].message.content)
        break
    except RateLimitError as e:
        if e.code == "spend_limit_exceeded":
            raise  # wait for the window or raise the limit
        if attempt == 2:
            raise  # out of retries
        time.sleep(2 * (attempt + 1))  # transient upstream rate limit: back off, then retry
    except APIStatusError as e:
        if e.status_code == 402 and e.code == "insufficient_credit":
            raise  # top up credit
        raise

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.opentoken.kr/v1",
  apiKey: process.env.OPENTOKEN_API_KEY,
});

try {
  const resp = await client.chat.completions.create({
    model: "google/gemini-3-flash",
    messages: [{ role: "user", content: "Summarize today's standup." }],
  });
  console.log(resp.choices[0].message.content);
} catch (e: any) {
  if (e.code === "insufficient_credit") throw e; // top up credit
  if (e.code === "spend_limit_exceeded") throw e; // wait for the window or raise the limit
  // otherwise back off and retry
}

curl https://api.opentoken.kr/v1/chat/completions \
  -H "Authorization: Bearer $OPENTOKEN_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "google/gemini-3-flash",
    "messages": [{"role": "user", "content": "Summarize today'\''s standup."}]
  }'
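
When you call the API without an SDK, as in the curl example, the same decision can be made from the raw JSON error body. The sketch below keeps the mapping in one table. The function name `action_for` and the action labels are hypothetical; only the three error codes come from this page.

```python
import json

# Error code -> suggested reaction (labels are illustrative, not an API).
ACTIONS = {
    "upstream_error": "retry_after_delay",
    "spend_limit_exceeded": "wait_for_reset_or_raise_limit",
    "insufficient_credit": "top_up_credit",
}


def action_for(body: str) -> str:
    """Map a gateway error body (JSON text) to a suggested action."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return "unknown"
    # Expect the shape {"error": {"code": "..."}}; tolerate anything else.
    error = payload.get("error") if isinstance(payload, dict) else None
    code = error.get("code") if isinstance(error, dict) else None
    if not isinstance(code, str):
        return "unknown"
    return ACTIONS.get(code, "unknown")
```

Any code that is not in the table falls through to "unknown", so newly introduced error codes are not silently treated as retryable.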

Every successful request writes an immutable usage record and a USD credit-ledger debit. See Usage and billing to read your spend and confirm what counts against each limit.