OpenToken Docs

Usage and billing

Read token usage from every response and understand how OpenToken meters and bills each request.

Every chat completion returns a usage object that reports token counts for the request. OpenToken uses the same counts to write an immutable usage record and a USD credit-ledger debit, so what you read in the response is what you are billed for.

The usage object

Non-streaming responses always include usage. Streaming responses always end with a final chunk whose choices is [] and whose usage is populated. OpenToken emits usage on every stream automatically, so you do not need to (and cannot) request it via stream_options.

{
  "usage": {
    "prompt_tokens": 1532,
    "completion_tokens": 418,
    "total_tokens": 1950,
    "prompt_tokens_details": {
      "cached_tokens": 1280,
      "cache_creation_tokens": 0
    },
    "completion_tokens_details": {
      "reasoning_tokens": 192
    }
  }
}
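
As a minimal sketch of reading usage from a stream, the helper below scans already-decoded chunks and returns the usage object from the final chunk, the one whose choices is empty. The function name usage_from_stream, the sample chunks, and the delta shape of the content chunks are assumptions made for illustration; only the shape of the final chunk comes from the description above.

```python
from typing import Iterable, Optional


def usage_from_stream(chunks: Iterable[dict]) -> Optional[dict]:
    """Return the usage object carried by the final chunk, or None if absent."""
    usage = None
    for chunk in chunks:
        # Content chunks carry non-empty choices; the final chunk has
        # choices == [] and a populated usage object.
        if not chunk.get("choices") and chunk.get("usage"):
            usage = chunk["usage"]
    return usage


# Hypothetical, already-decoded chunks (the delta shape is an assumption).
example_chunks = [
    {"choices": [{"delta": {"content": "Hel"}}]},
    {"choices": [{"delta": {"content": "lo"}}]},
    {
        "choices": [],
        "usage": {
            "prompt_tokens": 1532,
            "completion_tokens": 418,
            "total_tokens": 1950,
        },
    },
]

final_usage = usage_from_stream(example_chunks)
```

Because the helper only inspects the choices and usage keys, it does not depend on how the content chunks themselves are structured.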

Fields

Reasoning tokens are part of completion_tokens, and cached tokens are part of prompt_tokens. The detail objects break the totals down; they do not add to them. See reasoning and prompt caching for how to control these.
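
To make the subset relationship concrete, here is a small sketch that splits the totals using the detail objects. The function name breakdown and the output key names are hypothetical; the arithmetic uses the sample usage object shown earlier.

```python
def breakdown(usage: dict) -> dict:
    """Split prompt and completion totals using the detail objects."""
    prompt_details = usage.get("prompt_tokens_details") or {}
    completion_details = usage.get("completion_tokens_details") or {}
    cached = prompt_details.get("cached_tokens", 0)
    reasoning = completion_details.get("reasoning_tokens", 0)
    return {
        # Cached tokens are a subset of prompt_tokens, so subtract them.
        "uncached_prompt_tokens": usage["prompt_tokens"] - cached,
        "cached_prompt_tokens": cached,
        # Reasoning tokens are a subset of completion_tokens.
        "non_reasoning_completion_tokens": usage["completion_tokens"] - reasoning,
        "reasoning_tokens": reasoning,
    }


sample_usage = {
    "prompt_tokens": 1532,
    "completion_tokens": 418,
    "total_tokens": 1950,
    "prompt_tokens_details": {"cached_tokens": 1280, "cache_creation_tokens": 0},
    "completion_tokens_details": {"reasoning_tokens": 192},
}

parts = breakdown(sample_usage)
```

For the sample, 252 of the 1532 prompt tokens were not read from cache and 226 of the 418 completion tokens were not reasoning tokens. total_tokens stays 1532 + 418 = 1950 because the details are not added on top.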

How billing works

Each billed request writes two records:

An immutable usage record

The token counts above are stored exactly as returned. Usage records are append-only — they are never edited or back-dated, so your history is an audit trail.

A USD credit-ledger debit

The same usage is priced per model and written as a debit, in USD, against your workspace credit ledger. Cached tokens are billed at the cache-read rate, and cache creation is metered once via cache_creation_tokens.
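
The pricing step might be approximated as follows. This is a sketch under stated assumptions, not OpenToken's actual pricing code: the rates are invented, the function name estimate_debit is hypothetical, and it assumes that cache-creation tokens (like cached tokens) are counted inside prompt_tokens and that reasoning tokens are billed at the output rate as part of completion_tokens.

```python
from decimal import Decimal

# Hypothetical per-million-token rates in USD; real rates are set per model.
RATES = {
    "input": Decimal("3.00"),
    "cache_read": Decimal("0.30"),
    "cache_write": Decimal("3.75"),
    "output": Decimal("15.00"),
}
MILLION = Decimal(1_000_000)


def estimate_debit(usage: dict, rates: dict = RATES) -> Decimal:
    """Estimate the USD debit for one request from its usage object."""
    details = usage.get("prompt_tokens_details") or {}
    cached = details.get("cached_tokens", 0)
    created = details.get("cache_creation_tokens", 0)
    # Assumption: both cached and cache-creation tokens are included in
    # prompt_tokens, so the remainder is billed at the normal input rate.
    regular = usage["prompt_tokens"] - cached - created
    total = (
        regular * rates["input"]
        + cached * rates["cache_read"]
        + created * rates["cache_write"]
        + usage["completion_tokens"] * rates["output"]
    )
    return total / MILLION


debit = estimate_debit(
    {
        "prompt_tokens": 1532,
        "completion_tokens": 418,
        "prompt_tokens_details": {"cached_tokens": 1280, "cache_creation_tokens": 0},
    }
)
```

Decimal is used instead of float so that small per-request amounts add up without binary rounding error. With the invented rates above, the sample request comes to 0.00741 USD.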

Your balance is a signed sum of the ledger: credits added are positive entries and request debits are negative entries. The current balance is the total of all entries, so there is no separately stored counter that can drift from the records.
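
A minimal illustration of the signed-sum idea, using invented ledger entries; the entry values and the function name balance are hypothetical, and a real ledger entry would carry more than an amount.

```python
from decimal import Decimal

# Hypothetical ledger: positive entries are credits, negative are debits.
ledger = [
    Decimal("10.00"),     # credit added
    Decimal("-0.00741"),  # request debit
    Decimal("-0.01200"),  # request debit
]


def balance(entries: list) -> Decimal:
    """The balance is derived by summing every entry, never stored."""
    return sum(entries, Decimal("0"))


current = balance(ledger)
```

Because the balance is recomputed from the entries, adding a record is the only way to change it.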

When the balance reaches zero, requests return 402 with error.type invalid_request_error and error.code insufficient_credit. A request that exceeds a per-key spend ceiling returns 429, also with error.type invalid_request_error, and with error.code spend_limit_exceeded. Because both errors share the same error.type, inspect error.code to tell them apart. See limits.
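
One way a client might branch on these two errors is sketched below. It assumes the error body is a JSON object with a top-level error key holding type and code, as the error.type and error.code paths suggest; the function name and the returned labels are hypothetical.

```python
def classify_billing_error(status: int, body: dict) -> str:
    """Map an HTTP status and parsed error body to a billing outcome."""
    error = body.get("error") or {}
    code = error.get("code")
    # Both cases share error.type == "invalid_request_error", so the
    # decision is made on error.code together with the status.
    if status == 402 and code == "insufficient_credit":
        return "workspace_out_of_credit"
    if status == 429 and code == "spend_limit_exceeded":
        return "key_spend_limit_reached"
    return "other"


out_of_credit = classify_billing_error(
    402,
    {"error": {"type": "invalid_request_error", "code": "insufficient_credit"}},
)
key_limited = classify_billing_error(
    429,
    {"error": {"type": "invalid_request_error", "code": "spend_limit_exceeded"}},
)
```

Any other 429 falls through to "other", which leaves room to handle it separately from a spend ceiling.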
