Use the Embeddings API when you need numeric vector representations of text for semantic search, retrieval, ranking, recommendation, or clustering.
If you need generated text, tools, or multimodal conversation flows, use the Responses API instead.
Best Fit
- RAG indexing pipelines
- semantic search over documents or tickets
- clustering or classification workflows
Request Model
Most requests need:
- model
- input
- optionally dimensions
- optionally encoding_format
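The request fields above can be sketched as a small payload builder. This is an illustration only: the helper name `build_embedding_payload`, the placeholder model name, and the choice to drop unset optional fields are assumptions, not part of the API itself.

```python
import json
from typing import Optional, Sequence, Union


def build_embedding_payload(
    model: str,
    text: Union[str, Sequence[str]],
    dimensions: Optional[int] = None,
    encoding_format: Optional[str] = None,
) -> dict:
    """Assemble a request body, leaving out optional fields that were not set."""
    payload = {"model": model, "input": text if isinstance(text, str) else list(text)}
    if dimensions is not None:
        payload["dimensions"] = dimensions
    if encoding_format is not None:
        payload["encoding_format"] = encoding_format
    return payload


# "example-embedding-model" is a placeholder, not a real model name.
minimal = build_embedding_payload("example-embedding-model", "How do I reset my password?")
full = build_embedding_payload(
    "example-embedding-model",
    ["first chunk", "second chunk"],
    dimensions=256,
    encoding_format="base64",
)
print(json.dumps(minimal))
print(json.dumps(full, indent=2))
```

Omitting the optional keys, rather than sending them as null, keeps the minimal request down to the two required fields.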
Quick Example
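A minimal sketch of a single embedding request, using only the Python standard library. The endpoint URL, the model name, the bearer-token header, and the `OPENAI_API_KEY` environment variable are assumptions based on an OpenAI-style deployment; substitute the values from your provider's reference. The helper names `make_request` and `embed` are hypothetical.

```python
import json
import os
import urllib.request

# Assumptions: endpoint, model name, and auth scheme are illustrative defaults.
EMBEDDINGS_URL = "https://api.openai.com/v1/embeddings"
MODEL = "text-embedding-3-small"


def make_request(api_key: str, text: str) -> urllib.request.Request:
    """Build (but do not send) a POST request that embeds one input string."""
    body = json.dumps({"model": MODEL, "input": text}).encode("utf-8")
    return urllib.request.Request(
        EMBEDDINGS_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def embed(text: str) -> list:
    """Send the request and return the first embedding vector from the response."""
    request = make_request(os.environ["OPENAI_API_KEY"], text)
    with urllib.request.urlopen(request) as response:
        payload = json.load(response)
    # Assumed response layout: a "data" list whose items carry an "embedding".
    return payload["data"][0]["embedding"]
```

Calling `embed("How do I reset my password?")` needs network access and a valid key. Nothing is sent when the module is only imported.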
Response Anatomy
The usage object contains prompt_tokens and total_tokens. Because embeddings do not generate new text, there are no completion tokens to bill for.
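As a rough illustration, the sketch below parses a hand-written sample response. Only the usage fields come from the description above. The `data` list with `index` and `embedding` keys is an assumed layout, and every value is made up.

```python
import json

# Hand-written sample; values are invented and the "data" layout is an assumption.
SAMPLE_RESPONSE = """
{
  "object": "list",
  "data": [
    {"object": "embedding", "index": 1, "embedding": [0.5, -0.25, 0.0]},
    {"object": "embedding", "index": 0, "embedding": [0.125, 0.75, -1.0]}
  ],
  "model": "example-embedding-model",
  "usage": {"prompt_tokens": 9, "total_tokens": 9}
}
"""


def parse_embedding_response(raw: str):
    """Return (vectors ordered by input index, usage dict)."""
    body = json.loads(raw)
    items = sorted(body["data"], key=lambda item: item["index"])
    vectors = [item["embedding"] for item in items]
    return vectors, body["usage"]


vectors, usage = parse_embedding_response(SAMPLE_RESPONSE)
print(len(vectors), "vectors of length", len(vectors[0]))
print("billed tokens:", usage["total_tokens"])
```

Sorting by `index` is a defensive choice so that each vector lines up with the input it was computed from, whatever order the items arrive in.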
Common Patterns
- embed one query string for retrieval at request time
- embed batches of document chunks during indexing
- lower transport size with encoding_format="base64" when needed
- request dimensions only if your selected model supports it and your vector store expects it
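The patterns above can be sketched with a few standard-library helpers. All helper names are hypothetical. The base64 decoder assumes the payload is a packed array of little-endian float32 values, which should be checked against your provider's documentation. Cosine similarity is one common choice for ranking, not the only one.

```python
import base64
import math
import struct
from typing import Iterator, List, Sequence


def batched(chunks: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive groups of at most `size` chunks for indexing requests."""
    for start in range(0, len(chunks), size):
        yield list(chunks[start:start + size])


def decode_base64_embedding(encoded: str) -> List[float]:
    """Decode a base64 string assumed to hold little-endian float32 values."""
    raw = base64.b64decode(encoded)
    count = len(raw) // 4  # four bytes per float32
    return list(struct.unpack(f"<{count}f", raw))


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank(query_vector: Sequence[float], document_vectors: Sequence[Sequence[float]]) -> List[int]:
    """Return document indices ordered from most to least similar to the query."""
    scores = [cosine_similarity(query_vector, doc) for doc in document_vectors]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


# Round trip with made-up values that float32 represents exactly.
encoded = base64.b64encode(struct.pack("<3f", 0.5, -1.25, 2.0)).decode("ascii")
decoded = decode_base64_embedding(encoded)
print(decoded)

# Five chunks split into requests of at most two inputs each.
batches = list(batched(["a", "b", "c", "d", "e"], 2))
print(batches)

# Toy two-dimensional vectors standing in for real embeddings.
order = rank([1.0, 0.0], [[0.0, 1.0], [1.0, 1.0], [1.0, 0.0]])
print(order)
```

In a real pipeline, each batch would go out as one request during indexing, and `rank` would typically be replaced by the vector store's own nearest-neighbour query.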