Create chat completion

POST /v1/chat/completions

Creates a model response for a chat conversation. By default the endpoint returns a single buffered completion object; when stream is true, it returns a stream of Server-Sent Events instead.

Response

{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "created": 1700000000,
  "model": "google/gemini-3-flash",
  "choices": [
    {
      "index": 0,
      "message": { "role": "assistant", "content": "Hello" },
      "finish_reason": "stop"
    }
  ],
  "usage": { "prompt_tokens": 9, "completion_tokens": 12, "total_tokens": 21 }
}

The usage object reports prompt_tokens, completion_tokens, and total_tokens. Cached prompt tokens are counted under prompt_tokens_details.cached_tokens, and thinking (reasoning) tokens under completion_tokens_details.reasoning_tokens.
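As a rough illustration of reading these fields, the sketch below flattens the usage object of a buffered response into a single dictionary. The helper name summarize_usage is hypothetical, and the code assumes the detail objects may be absent (as in the sample response above), in which case it falls back to 0.

```python
from typing import Any, Dict


def summarize_usage(response: Dict[str, Any]) -> Dict[str, int]:
    """Flatten the usage object of a chat completion into simple counters.

    Assumption: prompt_tokens_details and completion_tokens_details are
    optional, so missing values are reported as 0.
    """
    usage = response.get("usage") or {}
    prompt_details = usage.get("prompt_tokens_details") or {}
    completion_details = usage.get("completion_tokens_details") or {}
    return {
        "prompt_tokens": usage.get("prompt_tokens", 0),
        "completion_tokens": usage.get("completion_tokens", 0),
        "total_tokens": usage.get("total_tokens", 0),
        "cached_tokens": prompt_details.get("cached_tokens", 0),
        "reasoning_tokens": completion_details.get("reasoning_tokens", 0),
    }


# The sample response from this page, reduced to the fields used here.
example_response = {
    "object": "chat.completion",
    "usage": {"prompt_tokens": 9, "completion_tokens": 12, "total_tokens": 21},
}
summary = summarize_usage(example_response)
```

Defaulting missing detail fields to 0 keeps downstream accounting code free of None checks; callers that need to distinguish "not reported" from "zero" would want to keep the raw objects instead.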

Streamed responses always end with a final chunk that has choices: [] and a populated usage object, immediately before data: [DONE]. For the full streaming frame format, see Streaming.
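A minimal sketch of consuming such a stream follows. It works on an iterable of already-decoded text lines, so it is independent of any HTTP client. The helper name collect_stream is hypothetical, and the code assumes that content chunks carry their text in choices[].delta.content, which this page does not specify; see Streaming for the actual frame format.

```python
import json
from typing import Any, Dict, Iterable, List, Optional, Tuple


def collect_stream(lines: Iterable[str]) -> Tuple[str, Optional[Dict[str, Any]]]:
    """Accumulate streamed text and capture the usage object from the final chunk.

    Assumption: each event is a single "data: <json>" line, and text deltas
    live under choices[].delta.content.
    """
    text_parts: List[str] = []
    usage: Optional[Dict[str, Any]] = None
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and anything that is not a data line
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        if chunk.get("usage"):
            usage = chunk["usage"]  # populated on the final chunk, where choices is []
        for choice in chunk.get("choices", []):
            delta = choice.get("delta") or {}
            text_parts.append(delta.get("content") or "")
    return "".join(text_parts), usage


# Hand-written frames for illustration only, not captured from the API.
sample_lines = [
    'data: {"choices": [{"index": 0, "delta": {"content": "Hel"}}]}',
    "",
    'data: {"choices": [{"index": 0, "delta": {"content": "lo"}}]}',
    "",
    'data: {"choices": [], "usage": {"prompt_tokens": 9, "completion_tokens": 12, "total_tokens": 21}}',
    "",
    "data: [DONE]",
]
streamed_text, streamed_usage = collect_stream(sample_lines)
```

Because the usage chunk has an empty choices array, the loop over choices is simply a no-op for it, so no special case is needed beyond reading the usage field.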
