Embeddings
Theory
An embedding is a vector, a list of floats, produced by an encoder model from text. The encoder is trained so that semantically similar inputs produce vectors pointing in similar directions. Similarity is typically measured with cosine similarity, which ranges from −1 (opposite) to +1 (near-identical); unrelated texts land near 0.
text "puppy" → encoder (embed) → vector [0.21, -0.07, …]
- puppy ↔ dog (animals): cos ≈ 0.9
- database ↔ server (tech): cos ≈ 0.9
- puppy ↔ server: cos ≈ 0.1
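The similarities above can be sketched with a plain cosine function. The 3-dimensional vectors below are made up for illustration; real encoders emit hundreds of dimensions, so treat the exact numbers as toy values, not model output.

```python
import math

# Made-up 3-D vectors chosen so related words point in similar directions.
VECTORS = {
    "puppy":    [0.9, 0.4, 0.1],
    "dog":      [0.8, 0.5, 0.2],
    "database": [0.1, 0.2, 0.9],
    "server":   [0.2, 0.1, 0.8],
}

def cosine(a, b):
    """Cosine similarity: dot(a, b) / (|a| * |b|), in [-1, 1]."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

print(cosine(VECTORS["puppy"], VECTORS["dog"]))       # high: same topic
print(cosine(VECTORS["database"], VECTORS["server"])) # high: same topic
print(cosine(VECTORS["puppy"], VECTORS["server"]))    # low: unrelated topics
```

The function itself is exactly what production vector stores compute; only the vectors here are fabricated.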
Keyword search
"store data" only matches documents containing those exact words; paraphrases are missed.

Embedding search
Finds documents about databases, caching, and indexing: same meaning, different words.
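The contrast can be demonstrated on a toy corpus. Everything below is hypothetical: the documents, the hand-assigned 2-D "embeddings", and the query vector standing in for `encoder("store data")` are all invented for illustration.

```python
import math

# Hypothetical mini-corpus; the first two documents are paraphrases of
# "store data", the third is off-topic.
DOCS = [
    "PostgreSQL persists records to disk",
    "Redis keeps a cache in memory",
    "The hiking trail closes at dusk",
]

# Made-up 2-D vectors: axis 0 = data-storage-ness, axis 1 = outdoors-ness.
EMBEDDINGS = {
    DOCS[0]: [0.9, 0.1],
    DOCS[1]: [0.8, 0.2],
    DOCS[2]: [0.1, 0.9],
}
QUERY_VEC = [0.95, 0.05]  # pretend output of encoder("store data")

def keyword_search(query, docs):
    """Return docs containing every query word verbatim."""
    words = set(query.lower().split())
    return [d for d in docs if words <= set(d.lower().split())]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def embedding_search(query_vec, k=2):
    """Return the k docs whose vectors point most nearly the same way."""
    ranked = sorted(EMBEDDINGS, key=lambda d: cosine(query_vec, EMBEDDINGS[d]),
                    reverse=True)
    return ranked[:k]

print(keyword_search("store data", DOCS))  # -> [] : no doc has both words
print(embedding_search(QUERY_VEC))         # both storage docs, hiking excluded
```

The keyword pass returns nothing because no document literally contains "store" and "data", while the vector pass surfaces both storage documents.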
| Model | Dim | Best For |
|---|---|---|
| all-MiniLM-L6-v2 | 384 | Local / offline RAG |
| nomic-embed-text | 768 | Balanced quality |
| text-embedding-ada-002 | 1536 | OpenAI-hosted apps |