Embeddings
Theory
An embedding is a vector, a list of floats, produced by an encoder model from text. The encoder is trained so that semantically similar inputs produce vectors pointing in similar directions. Similarity is typically measured with cosine similarity, which ranges from −1 (opposite) to +1 (near-identical); unrelated texts land near 0.
text "puppy" → encoder (embed) → vector [0.21, -0.07, …]
- puppy ↔ dog (animals): cos ≈ 0.9
- database ↔ server (tech): cos ≈ 0.9
- puppy ↔ server: cos ≈ 0.1
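The similarities above can be sketched with a plain cosine function. The 3-dimensional vectors below are made up for illustration; real encoders emit hundreds of dimensions, so treat the exact numbers as toy values, not model output.

```python
import math

# Made-up 3-D vectors chosen so related words point in similar directions.
VECTORS = {
    "puppy":    [0.9, 0.4, 0.1],
    "dog":      [0.8, 0.5, 0.2],
    "database": [0.1, 0.2, 0.9],
    "server":   [0.2, 0.1, 0.8],
}

def cosine(a, b):
    """Cosine similarity: dot(a, b) / (|a| * |b|), in [-1, 1]."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

print(cosine(VECTORS["puppy"], VECTORS["dog"]))       # high: same topic
print(cosine(VECTORS["database"], VECTORS["server"])) # high: same topic
print(cosine(VECTORS["puppy"], VECTORS["server"]))    # low: unrelated topics
```

The function itself is exactly what production vector stores compute; only the vectors here are fabricated.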
Keyword search
"store data" only matches documents containing those exact words; paraphrases are missed.

Embedding search
Finds documents about databases, caching, and indexing: same meaning, different words.
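The contrast can be demonstrated on a toy corpus. Everything below is hypothetical: the documents, the hand-assigned 2-D "embeddings", and the query vector standing in for `encoder("store data")` are all invented for illustration.

```python
import math

# Hypothetical mini-corpus; the first two documents are paraphrases of
# "store data", the third is off-topic.
DOCS = [
    "PostgreSQL persists records to disk",
    "Redis keeps a cache in memory",
    "The hiking trail closes at dusk",
]

# Made-up 2-D vectors: axis 0 = data-storage-ness, axis 1 = outdoors-ness.
EMBEDDINGS = {
    DOCS[0]: [0.9, 0.1],
    DOCS[1]: [0.8, 0.2],
    DOCS[2]: [0.1, 0.9],
}
QUERY_VEC = [0.95, 0.05]  # pretend output of encoder("store data")

def keyword_search(query, docs):
    """Return docs containing every query word verbatim."""
    words = set(query.lower().split())
    return [d for d in docs if words <= set(d.lower().split())]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def embedding_search(query_vec, k=2):
    """Return the k docs whose vectors point most nearly the same way."""
    ranked = sorted(EMBEDDINGS, key=lambda d: cosine(query_vec, EMBEDDINGS[d]),
                    reverse=True)
    return ranked[:k]

print(keyword_search("store data", DOCS))  # -> [] : no doc has both words
print(embedding_search(QUERY_VEC))         # both storage docs, hiking excluded
```

The keyword pass returns nothing because no document literally contains "store" and "data", while the vector pass surfaces both storage documents.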
| Model | Dim | Best For |
|---|---|---|
| all-MiniLM-L6-v2 | 384 | Local / offline RAG |
| nomic-embed-text | 768 | Balanced quality |
| text-embedding-ada-002 | 1536 | OpenAI-hosted apps |