Act 3

Application


Vector Search & Similarity

Act 3 · ~4 min

Theory

After chunks are embedded, they land in a vector index where retrieval means finding the K vectors closest to the query vector.

Similarity metrics:

| Metric | Measures | Best for |
| --- | --- | --- |
| Cosine | Angle between vectors (ignores magnitude) | Text embeddings — most common |
| Dot product | Cosine × magnitude | Pre-normalized embeddings |
| Euclidean | Straight-line distance | Image / numeric embeddings |
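The key difference between the three metrics shows up when two vectors point the same way but differ in length. A minimal pure-Python sketch (the toy vectors are invented for illustration):

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    return math.sqrt(dot(a, a))

def cosine(a, b):
    # Normalizes away magnitude, keeping only the angle between vectors.
    return dot(a, b) / (norm(a) * norm(b))

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

q = [1.0, 2.0, 3.0]
d = [2.0, 4.0, 6.0]  # same direction as q, twice the magnitude

print(cosine(q, d))     # 1.0: identical direction, magnitude ignored
print(dot(q, d))        # 28.0: grows with magnitude
print(euclidean(q, d))  # ~3.74: treats the scaled copy as distant
```

This is why cosine is the usual default for text embeddings: it ranks by direction alone, while dot product and Euclidean distance both let vector length influence the score.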

Brute-force search is O(N·D): scoring every one of the N vectors across all D dimensions. At scale this is impractical. Approximate Nearest Neighbor (ANN) search trades exactness for speed, returning results that are very likely — but not guaranteed — to be the true nearest neighbors. Three common algorithms: HNSW (layered graph — fast, the default in Qdrant), IVF (partitions the space into cells and searches only the nearest ones), PQ (compresses vectors to save memory).
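The brute-force baseline that ANN indexes replace can be sketched in a few lines; the random corpus and planted match below are purely illustrative:

```python
import math
import random

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def brute_force_top_k(query, vectors, k):
    # Exact search: one similarity score per stored vector,
    # so the cost is O(N * D) for N vectors of dimension D.
    scored = sorted(
        ((cosine(query, v), i) for i, v in enumerate(vectors)),
        reverse=True,
    )
    return scored[:k]

random.seed(0)
corpus = [[random.gauss(0, 1) for _ in range(8)] for _ in range(1000)]
query = corpus[123]  # plant an exact match so the expected answer is known

hits = brute_force_top_k(query, corpus, k=5)
print(hits[0][1])  # 123: the planted match ranks first with score 1.0
```

HNSW, IVF, and PQ all exist to avoid that full scan: they touch only a small, carefully chosen fraction of the corpus per query, which is where the approximation comes from.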

Vector stores manage indexing and ANN automatically:

| Type | Examples | Trade-off |
| --- | --- | --- |
| Purpose-built | Qdrant, Pinecone, Weaviate | Metadata filters, scaling |
| DB extension | pgvector | Simpler ops, slower at scale |

Top-K — typically 5–20 — controls how many chunks return per query. Too low drops relevant results; too high floods the LLM context. When vector similarity alone misses keyword matches, hybrid search fills the gap.
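One common way to combine vector and keyword results is reciprocal rank fusion (RRF); the source does not specify a fusion method, so this is just one sketch, with invented document IDs:

```python
def rrf(rankings, k=60):
    # Reciprocal Rank Fusion: each ranked list contributes 1 / (k + rank)
    # per document; summing rewards docs that rank well in either list.
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

vector_hits = ["d3", "d1", "d7"]   # ranked by cosine similarity
keyword_hits = ["d7", "d3", "d9"]  # ranked by keyword match (e.g. BM25)

print(rrf([vector_hits, keyword_hits]))  # ['d3', 'd7', 'd1', 'd9']
```

Documents appearing in both lists (d3, d7) rise to the top, which is exactly the behavior hybrid search is after: a chunk that is both semantically close and a keyword match beats one that is only one or the other.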