Use the Embeddings API when you want retrieval, semantic search, ranking, or a RAG pipeline.
Typical retrieval flow

1. Chunk the source material: split documents into chunks that are small enough for retrieval but still semantically coherent.
2. Store vectors and metadata: keep the embeddings together with metadata such as document ID, title, or section so you can filter and cite later.
3. Embed the user query: use the same embedding model for the query that you used for the indexed chunks.
4. Filter or rerank if needed: narrow the candidate set before generation when your pipeline needs better precision.
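The four steps above can be sketched end to end. The `embed` function here is a stand-in: a toy bag-of-words vector so the example runs without a network call. In a real pipeline you would replace it with a call to your embedding model, and the surrounding flow stays the same.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    # Stand-in for a real embedding model: a bag-of-words vector.
    # Swap in your embedding API call; the rest of the flow is unchanged.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0

# 1. Chunk the source material (here: one chunk per FAQ entry).
chunks = [
    {"doc_id": "faq", "section": "returns",
     "text": "Items can be returned within 30 days."},
    {"doc_id": "faq", "section": "shipping",
     "text": "Orders ship within two business days."},
]

# 2. Store each vector together with its metadata.
index = [{**chunk, "vector": embed(chunk["text"])} for chunk in chunks]

# 3. Embed the user query with the same model used for the chunks.
query_vector = embed("can items be returned")

# 4. Rank candidates; filter or rerank here if you need better precision.
ranked = sorted(index, key=lambda e: cosine(query_vector, e["vector"]),
                reverse=True)
top = ranked[0]
print(top["doc_id"], top["section"])
```

The metadata carried in step 2 is what lets the generation step cite the matching section rather than just echo raw text.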
Use the Responses API for the generation step and the Embeddings API for the retrieval step.
Good defaults
- Keep chunks semantically coherent instead of embedding entire long documents.
- Store metadata with each vector so you can filter, cite, or deduplicate later.
- Keep a stable embedding model per index version.
- Retrieve a small candidate set before you generate the final answer.
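The first two defaults can be combined in a small chunker: split on paragraph boundaries so chunks stay semantically coherent, merge short paragraphs up to a size budget, and attach metadata to every chunk. The `doc_id` field and the character limit below are illustrative choices, not fixed requirements.

```python
def chunk_paragraphs(doc_id: str, text: str, max_chars: int = 500) -> list[dict]:
    """Split on blank lines, merging adjacent short paragraphs up to
    max_chars, so each chunk stays semantically coherent."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks, current = [], ""
    for p in paragraphs:
        if current and len(current) + len(p) + 2 > max_chars:
            chunks.append(current)
            current = p
        else:
            current = f"{current}\n\n{p}" if current else p
    if current:
        chunks.append(current)
    # Carry metadata with every chunk so you can filter and cite later.
    return [
        {"doc_id": doc_id, "chunk_index": i, "text": c}
        for i, c in enumerate(chunks)
    ]

doc = "First paragraph about returns.\n\nSecond paragraph about shipping."
for chunk in chunk_paragraphs("policy", doc, max_chars=40):
    print(chunk["chunk_index"], chunk["text"])
```

With a small budget the two paragraphs land in separate chunks; with the default budget they merge into one, which is the trade-off between chunk size and coherence the defaults describe.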
Common pitfalls
- Mixing embeddings from different models in one index.
- Not versioning your embedding model choice.
- Embedding documents and queries with different, incompatible models.
- Indexing chunks that are too large or too small for your retrieval goals.
- Skipping evaluation, so retrieval quality degrades without being noticed.
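The first three pitfalls can be caught mechanically by recording the embedding model name in the index and refusing any vector produced by a different model. This is a minimal sketch; the class, field names, and the model identifier `text-embedder-v1` are all hypothetical.

```python
class VectorIndex:
    """Toy index that pins the embedding model it was built with."""

    def __init__(self, embedding_model: str):
        self.embedding_model = embedding_model
        self.entries = []

    def add(self, vector, metadata, embedding_model: str):
        # Reject vectors from any other model: mixing models in one
        # index silently breaks similarity comparisons.
        if embedding_model != self.embedding_model:
            raise ValueError(
                f"index built with {self.embedding_model!r}, "
                f"got vector from {embedding_model!r}"
            )
        self.entries.append({"vector": vector, **metadata})

    def query(self, vector, embedding_model: str):
        # Queries must use the same model as the indexed documents.
        if embedding_model != self.embedding_model:
            raise ValueError("query must use the index's embedding model")
        return self.entries  # ranking elided in this sketch

index = VectorIndex("text-embedder-v1")
index.add([0.1, 0.2], {"doc_id": "a"}, embedding_model="text-embedder-v1")
try:
    index.add([0.3, 0.4], {"doc_id": "b"}, embedding_model="text-embedder-v2")
except ValueError as exc:
    print(exc)
```

Because the model name is part of the index itself, changing models forces a new index version and a full re-embed rather than a silent mix.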
Retrieval checklist
- Choose one embedding model for both documents and queries.
- Re-embed the corpus when you change embedding models.
- Measure retrieval quality on a small set of known-good queries.
- Keep generation and retrieval concerns separate.
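One way to act on the third checklist item: keep a handful of known-good query-to-expected-chunk pairs and compute recall@k over them whenever the index or model changes. The `retrieve` function below is a canned stub standing in for your real vector search; the chunk IDs are invented for the example.

```python
def recall_at_k(eval_set, retrieve, k: int = 5) -> float:
    """Fraction of queries whose expected chunk appears in the top-k results.

    eval_set: list of (query, expected_chunk_id) pairs.
    retrieve: function mapping a query to a ranked list of chunk IDs.
    """
    hits = sum(
        1 for query, expected in eval_set
        if expected in retrieve(query)[:k]
    )
    return hits / len(eval_set)

# Canned stub standing in for a real vector search.
def retrieve(query: str) -> list[str]:
    canned = {
        "return policy": ["faq-returns", "faq-shipping"],
        "delivery time": ["faq-pricing", "faq-shipping"],
    }
    return canned.get(query, [])

eval_set = [
    ("return policy", "faq-returns"),
    ("delivery time", "faq-shipping"),
]
print(recall_at_k(eval_set, retrieve, k=1))
print(recall_at_k(eval_set, retrieve, k=2))
```

Running this after every model or chunking change turns "retrieval quality degrades without being noticed" into a number you can watch.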