Limits

OpenToken enforces two independent gates before a request reaches a provider: a per-key spend limit and a credit-balance pre-check. Each gate maps to a distinct error so you can react to it programmatically. OpenToken does not run its own request-rate limiter today.

Provider rate limits

Because OpenToken has no rate limiter of its own, any throttling you see comes from the upstream provider. The gateway surfaces it as an error with type upstream_error and code upstream_error, preserving the provider's HTTP status (typically 429). Back off and retry the request after a short delay.

{
  "error": {
    "message": "...",
    "type": "upstream_error",
    "code": "upstream_error"
  }
}
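
The back-off-and-retry advice above can be sketched as a small helper. This is a minimal sketch, not an OpenToken API: call_with_backoff, UpstreamThrottled, and the delay values are hypothetical names and choices made for illustration. In real code the exception would be whatever your HTTP client raises for an upstream_error response.

```python
import random
import time


class UpstreamThrottled(Exception):
    """Hypothetical stand-in for an upstream_error response (typically HTTP 429)."""


def call_with_backoff(send, max_attempts=4, base_delay=0.5, sleep=time.sleep):
    """Call send() until it succeeds, backing off exponentially between tries.

    The delay doubles on each attempt (base_delay, 2 * base_delay, ...) and gets
    up to 10% random jitter so that many clients do not retry in lockstep.
    """
    for attempt in range(max_attempts):
        try:
            return send()
        except UpstreamThrottled:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay * 0.1))
```

Passing sleep as a parameter keeps the helper easy to test without real waiting. The attempt cap matters because a provider that keeps throttling should eventually produce a visible failure rather than an endless loop.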

Per-key spend limits

A key can carry a spend_limit_usd together with a limit_reset period. The gateway tracks USD spend against the key over that window. Once the accumulated spend crosses the limit, further requests on that key return 429 with code spend_limit_exceeded until the window resets.

{
  "error": {
    "message": "api key spend limit exceeded",
    "type": "invalid_request_error",
    "code": "spend_limit_exceeded"
  }
}

spend_limit_exceeded is scoped to a single key. Other keys in the same workspace keep working until they hit their own limits or the shared credit balance runs out.

Balance pre-check

Before dispatching a request, the gateway checks your workspace credit balance. When there is no credit left, the request is rejected with 402 and code insufficient_credit. The provider is never called, so you are not billed for the attempt. This pre-check applies to all billed endpoints, including chat completions and embeddings. The per-key spend limit described above, by contrast, is currently enforced on chat completions.

{
  "error": {
    "message": "insufficient credit balance",
    "type": "invalid_request_error",
    "code": "insufficient_credit"
  }
}
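
To make the two gates concrete, here is a toy model of the checks described in this section. It is not the gateway's implementation: check_gates and its parameters are hypothetical, and the order in which the gates run and the exact boundary comparisons are assumptions made for illustration.

```python
from typing import Optional, Tuple


def check_gates(
    key_spend_usd: float,
    spend_limit_usd: Optional[float],
    workspace_credit_usd: float,
) -> Optional[Tuple[int, str]]:
    """Return (http_status, error_code) if a gate rejects the request, else None.

    Assumptions for illustration: the spend-limit gate runs first, a key with
    no limit (None) is never blocked by it, and "no credit left" means <= 0.
    """
    if spend_limit_usd is not None and key_spend_usd >= spend_limit_usd:
        return (429, "spend_limit_exceeded")
    if workspace_credit_usd <= 0:
        return (402, "insufficient_credit")
    return None  # both gates pass; the request would go on to the provider
```

The point of the model is that the two gates are independent: a key under its own limit can still be rejected when the shared workspace credit is gone, and a workspace with plenty of credit can still have one key blocked.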

Handling limits in code

Inspect error.code to decide how to respond: for upstream_error, retry after a delay; for spend_limit_exceeded, wait for the window to reset or raise the limit; for insufficient_credit, top up credit.

import os, time
from openai import OpenAI, RateLimitError, APIStatusError

client = OpenAI(base_url="https://api.opentoken.kr/v1", api_key=os.environ["OPENTOKEN_API_KEY"])

max_attempts = 3
for attempt in range(max_attempts):
    try:
        resp = client.chat.completions.create(
            model="google/gemini-3-flash",
            messages=[{"role": "user", "content": "Summarize today's standup."}],
        )
        print(resp.choices[0].message.content)
        break  # success, stop retrying
    except RateLimitError as e:
        if e.code == "spend_limit_exceeded":
            raise  # wait for the window or raise the limit
        if attempt == max_attempts - 1:
            raise  # still throttled upstream after the last attempt
        time.sleep(2 * (attempt + 1))  # transient upstream rate limit: back off, then retry
    except APIStatusError as e:
        if e.status_code == 402 and e.code == "insufficient_credit":
            raise  # top up credit
        raise  # any other API error is not retried here

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.opentoken.kr/v1",
  apiKey: process.env.OPENTOKEN_API_KEY,
});

try {
  const resp = await client.chat.completions.create({
    model: "google/gemini-3-flash",
    messages: [{ role: "user", content: "Summarize today's standup." }],
  });
  console.log(resp.choices[0].message.content);
} catch (e) {
  if (!(e instanceof OpenAI.APIError)) throw e; // not an API error response
  if (e.code === "insufficient_credit") throw e; // top up credit
  if (e.code === "spend_limit_exceeded") throw e; // wait for the window or raise the limit
  // otherwise (for example upstream_error): back off and retry
}

curl https://api.opentoken.kr/v1/chat/completions \
  -H "Authorization: Bearer $OPENTOKEN_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "google/gemini-3-flash",
    "messages": [{"role": "user", "content": "Summarize today'\''s standup."}]
  }'
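
For clients that work with raw HTTP responses, such as the curl call above, the same decision can be made by parsing the JSON error body directly. The sketch below is illustrative only: action_for, ACTIONS, and the action labels are hypothetical names, and the mapping simply restates the guidance in this section.

```python
import json

# Suggested reaction per error code, taken from the guidance in this section.
ACTIONS = {
    "upstream_error": "retry_after_delay",
    "spend_limit_exceeded": "wait_for_reset_or_raise_limit",
    "insufficient_credit": "top_up_credit",
}


def action_for(response_body: str) -> str:
    """Map a JSON error body to a suggested action; anything unknown maps to "fail"."""
    try:
        payload = json.loads(response_body)
    except json.JSONDecodeError:
        return "fail"
    error = payload.get("error") if isinstance(payload, dict) else None
    code = error.get("code") if isinstance(error, dict) else None
    if not isinstance(code, str):
        return "fail"
    return ACTIONS.get(code, "fail")
```

Keying on error.code rather than the HTTP status is deliberate: both a provider throttle and a spend limit can arrive as 429, and only the code tells them apart.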

Every successful request writes an immutable usage record and a USD credit-ledger debit. See Usage and billing to read your spend and confirm what counts against each limit.

Usage and billing

Read usage records and the credit ledger.

Create chat completion

The endpoint these limits gate.