Create embeddings

POST /v1/embeddings

Creates an embedding vector for each input text. Returns a buffered, OpenAI-compatible embeddings object; this endpoint does not support streaming.

The catalog currently exposes one embedding model, google/text-embedding-004 (768 dimensions by default). Send a single string or a batch of strings and receive one vector per input, in order.

Request

curl https://api.opentoken.kr/v1/embeddings \
  -H "Authorization: Bearer $OPENTOKEN_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "google/text-embedding-004",
    "input": "The quick brown fox jumps over the lazy dog."
  }'

import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.opentoken.kr/v1",
    api_key=os.environ["OPENTOKEN_API_KEY"],
)

response = client.embeddings.create(
    model="google/text-embedding-004",
    input="The quick brown fox jumps over the lazy dog.",
)

print(response.data[0].embedding[:8])

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.opentoken.kr/v1",
  apiKey: process.env.OPENTOKEN_API_KEY,
});

const response = await client.embeddings.create({
  model: "google/text-embedding-004",
  input: "The quick brown fox jumps over the lazy dog.",
});

console.log(response.data[0].embedding.slice(0, 8));

task_type and dimensions are OpenToken extensions to the OpenAI embeddings surface. OpenAI has no task_type; Google embedding models use it to tune the vector for retrieval, classification, or similarity. For a corpus-plus-query retrieval setup, embed stored documents with RETRIEVAL_DOCUMENT and search queries with RETRIEVAL_QUERY.
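As a rough illustration of the document/query split described above, the sketch below builds two request bodies, one per task type. It assumes task_type and dimensions are sent as top-level fields of the JSON body, which this page does not spell out; build_embedding_request is a hypothetical helper, not part of any SDK.

```python
import json

# Task-type values named in the paragraph above.
DOCUMENT_TASK = "RETRIEVAL_DOCUMENT"
QUERY_TASK = "RETRIEVAL_QUERY"


def build_embedding_request(texts, task_type, dimensions=None):
    """Build a JSON-serializable body for POST /v1/embeddings.

    Assumption: task_type and dimensions are top-level body fields.
    """
    body = {
        "model": "google/text-embedding-004",
        "input": texts,
        "task_type": task_type,
    }
    # Only send dimensions when the caller overrides the model default.
    if dimensions is not None:
        body["dimensions"] = dimensions
    return body


# Stored documents and search queries use different task types.
doc_request = build_embedding_request(
    ["Foxes are omnivorous mammals.", "Dogs sleep most of the day."],
    DOCUMENT_TASK,
)
query_request = build_embedding_request("what do foxes eat?", QUERY_TASK)

print(json.dumps(query_request, indent=2))
```

With the OpenAI Python SDK, fields the SDK does not model can usually be passed through the extra_body argument, for example extra_body={"task_type": "RETRIEVAL_QUERY"}, again assuming the field placement above.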

Response

The response is a list object whose data array holds one embedding entry per input, each with its zero-based index. The usage object reports input-only token accounting: embeddings produce no completion tokens, so total_tokens equals prompt_tokens.

{
  "object": "list",
  "model": "google/text-embedding-004",
  "data": [
    {
      "object": "embedding",
      "index": 0,
      "embedding": [0.0023064255, -0.009327292, 0.015797347, "..."]
    }
  ],
  "usage": {
    "prompt_tokens": 11,
    "total_tokens": 11
  }
}

When you pass an array of strings, data contains one entry per string with matching index values (0, 1, …) in request order.
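A minimal sketch of pairing a batch of inputs with their vectors by index. The response dict below is a hand-written stand-in with made-up three-element vectors, and its entries are deliberately shuffled; the text above says entries arrive in request order, so keying on index is only a defensive habit. pair_inputs_with_vectors is a hypothetical helper.

```python
def pair_inputs_with_vectors(inputs, response):
    """Return (input_text, vector) pairs aligned by each entry's index."""
    entries = sorted(response["data"], key=lambda entry: entry["index"])
    if len(entries) != len(inputs):
        raise ValueError("expected one embedding entry per input")
    return [(inputs[entry["index"]], entry["embedding"]) for entry in entries]


inputs = ["first text", "second text"]

# Placeholder response: the vectors are invented and far shorter than
# the 768 dimensions the model returns by default.
response = {
    "object": "list",
    "model": "google/text-embedding-004",
    "data": [
        {"object": "embedding", "index": 1, "embedding": [0.4, 0.5, 0.6]},
        {"object": "embedding", "index": 0, "embedding": [0.1, 0.2, 0.3]},
    ],
}

pairs = pair_inputs_with_vectors(inputs, response)
for text, vector in pairs:
    print(text, vector)
```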

Embeddings are billed on input tokens only, at the model's input rate (google/text-embedding-004 is $0.025 / 1M tokens). The OpenAI SDK surfaces a completion_tokens of 0 for embedding usage, so total_tokens always equals prompt_tokens. See List models for live pricing.
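To make the arithmetic concrete, the sketch below estimates the charge for one request from its usage object, using the $0.025 / 1M rate quoted above. The rate is hard-coded for illustration only and may differ from live pricing; estimate_cost_usd is a hypothetical helper.

```python
# Input rate quoted on this page for google/text-embedding-004.
# Hard-coded for illustration; live pricing may differ.
INPUT_RATE_USD_PER_MILLION = 0.025


def estimate_cost_usd(usage, rate_per_million=INPUT_RATE_USD_PER_MILLION):
    """Estimate the cost of one embeddings request from its usage object."""
    # Embeddings are input-only, so prompt_tokens is the billable count.
    return usage["prompt_tokens"] * rate_per_million / 1_000_000


# Usage object from the sample response in this section.
usage = {"prompt_tokens": 11, "total_tokens": 11}

cost = estimate_cost_usd(usage)
print(f"{cost:.10f} USD for {usage['prompt_tokens']} tokens")
```

At this rate, the 11-token sample costs 11 × 0.025 / 1,000,000 = 0.000000275 USD, and a full million input tokens costs $0.025.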

Errors

The endpoint uses the standard OpenAI-compatible error envelope. Beyond the shared auth and balance errors, two cases are specific to model selection:

A registered model whose provider has no embeddings support returns 400 with code embeddings_unsupported and message model "<id>" does not support embeddings.
An unregistered model id returns 400 with code model_not_found.

{
  "error": {
    "message": "model \"anthropic/claude-sonnet-4-6\" does not support embeddings",
    "type": "invalid_request_error",
    "code": "embeddings_unsupported"
  }
}

A missing or invalid key returns 401 with type authentication_error, and an exhausted balance returns 402 with code insufficient_credit. See Errors for the full list.
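One way a client might branch on these errors is sketched below. It parses the error envelope shown above and maps the codes and statuses named in this section to a suggested action. The action strings and the suggest_action helper are illustrative, not part of the API.

```python
import json

# Error codes named in this section, mapped to an illustrative action.
ACTIONS_BY_CODE = {
    "embeddings_unsupported": "choose a model that supports embeddings",
    "model_not_found": "check the model id against List models",
    "insufficient_credit": "top up the account balance",
}


def suggest_action(status, body_text):
    """Map an HTTP status and error-envelope body to a suggested action."""
    error = json.loads(body_text).get("error", {})
    code = error.get("code")
    if code in ACTIONS_BY_CODE:
        return ACTIONS_BY_CODE[code]
    if status == 401 or error.get("type") == "authentication_error":
        return "check the API key"
    return "see Errors for the full list"


# The sample error body from this section.
sample_body = json.dumps({
    "error": {
        "message": 'model "anthropic/claude-sonnet-4-6" does not support embeddings',
        "type": "invalid_request_error",
        "code": "embeddings_unsupported",
    }
})

print(suggest_action(400, sample_body))
```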

Next steps

Models

API reference overview
