Create chat completion
OpenAI-compatible chat completion endpoint — POST /v1/chat/completions.
POST /v1/chat/completions
Creates a model response for a chat conversation. Returns a buffered completion, or a stream of Server-Sent Events when stream is true.
Request body
Prop
Type
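As a rough illustration of a request to this endpoint, the sketch below builds the POST with Python's standard library and does not send it. Only stream is named on this page. The model and messages fields are assumed from the OpenAI convention the endpoint says it follows, and BASE_URL, API_KEY, and the Bearer authorization header are hypothetical placeholders.

```python
import json
import urllib.request

# Hypothetical placeholders: neither the host nor the key comes from this page.
BASE_URL = "https://api.example.com"
API_KEY = "YOUR_API_KEY"


def build_chat_request(messages, model="google/gemini-3-flash", stream=False):
    """Build (but do not send) a POST /v1/chat/completions request."""
    # "model" and "messages" are assumed from the OpenAI-style convention;
    # "stream" is the one field this page names explicitly.
    body = {"model": model, "messages": messages, "stream": stream}
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Bearer auth is an assumption borrowed from OpenAI-style APIs.
            "Authorization": f"Bearer {API_KEY}",
        },
        method="POST",
    )


request = build_chat_request([{"role": "user", "content": "Say hello"}])
# To send it: urllib.request.urlopen(request), then json.load() the response.
```

Building the request separately from sending it makes the payload easy to inspect or log before any network call.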
Response
{
"id": "chatcmpl-...",
"object": "chat.completion",
"created": 1700000000,
"model": "google/gemini-3-flash",
"choices": [
{
"index": 0,
"message": { "role": "assistant", "content": "Hello" },
"finish_reason": "stop"
}
],
"usage": { "prompt_tokens": 9, "completion_tokens": 12, "total_tokens": 21 }
}
The usage object reports prompt_tokens, completion_tokens, and total_tokens. Cache hits appear under prompt_tokens_details.cached_tokens and thinking tokens under completion_tokens_details.reasoning_tokens.
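The usage fields described above could be read as in the following sketch. It assumes the two detail objects may be missing, as they are in the sample response, and counts them as zero in that case. The summarize_usage helper is a hypothetical name, not part of the API.

```python
# The sample response from this page, as a Python dict.
sample_response = {
    "id": "chatcmpl-...",
    "object": "chat.completion",
    "created": 1700000000,
    "model": "google/gemini-3-flash",
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "Hello"},
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 9, "completion_tokens": 12, "total_tokens": 21},
}


def summarize_usage(response):
    """Flatten the usage object, treating absent detail objects as zero."""
    usage = response.get("usage") or {}
    prompt_details = usage.get("prompt_tokens_details") or {}
    completion_details = usage.get("completion_tokens_details") or {}
    return {
        "prompt": usage.get("prompt_tokens", 0),
        "completion": usage.get("completion_tokens", 0),
        "total": usage.get("total_tokens", 0),
        "cached": prompt_details.get("cached_tokens", 0),
        "reasoning": completion_details.get("reasoning_tokens", 0),
    }


summary = summarize_usage(sample_response)
```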
Streamed responses always end with a final chunk that has choices: [] and a
populated usage object, immediately before data: [DONE]. For the full
streaming frame format, see Streaming.
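A minimal sketch of consuming such a stream follows. It assumes each event arrives as a line of the form data: <json>, and that text arrives in choices[].delta.content as in OpenAI-style streams. The delta shape is not stated on this page, and the sample frames are hand-written for illustration, not captured output. The read_stream helper is a hypothetical name.

```python
import json


def read_stream(lines):
    """Collect streamed text and the usage object from SSE data lines."""
    text_parts = []
    usage = None
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and non-data fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        if chunk.get("usage"):
            usage = chunk["usage"]  # populated on the final chunk
        for choice in chunk.get("choices", []):
            # Assumed OpenAI-style delta shape; the final chunk has no choices.
            content = (choice.get("delta") or {}).get("content")
            if content:
                text_parts.append(content)
    return "".join(text_parts), usage


# Hand-written frames for illustration only.
sample_lines = [
    'data: {"choices": [{"index": 0, "delta": {"content": "Hel"}}]}',
    "",
    'data: {"choices": [{"index": 0, "delta": {"content": "lo"}}]}',
    "",
    'data: {"choices": [], "usage": {"prompt_tokens": 9, "completion_tokens": 12, "total_tokens": 21}}',
    "",
    "data: [DONE]",
]

text, usage = read_stream(sample_lines)
```

Because the usage chunk carries an empty choices list, the loop over choices skips it naturally and only the usage assignment fires.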