Embeddings & semantic search
Turn text into vectors so you can search by meaning instead of exact keywords.
Prerequisites
- Calling an LLM API
You will learn
- Explain what an embedding is and what the numbers represent
- Compute similarity between two pieces of text
- Build a tiny semantic search over a set of documents
Keyword search fails when people use different words for the same idea. Someone searches "two-wheeler" and misses every document that says "bike". Embeddings fix this by representing meaning as numbers, so "bike" and "two-wheeler" land close together even though they share no keywords. This is the foundation of retrieval-augmented generation (RAG), which you build next week.
Overview
An embedding is a list of numbers (a vector) that captures the meaning of a piece of text. An embedding model reads your text and outputs a fixed-length vector — say 1,536 numbers. Texts with similar meaning produce vectors that point in similar directions. To search by meaning, you embed everything once, then compare vectors.
Key ideas
Meaning as direction
You do not interpret the individual numbers. What matters is the relationship between vectors. The standard comparison is cosine similarity: the cosine of the angle between two vectors. It returns a score from -1 to 1, where higher means more similar in meaning, and it depends only on direction, not on vector length.
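To make the score range concrete, here is a minimal sketch using hand-made 2-D vectors. These are not real embeddings; they are chosen only so the angles are easy to see, and `cosine` is a hypothetical helper name used for this illustration.

```python
import numpy as np

def cosine(a, b):
    """Cosine of the angle between vectors a and b (illustrative helper)."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hand-made 2-D vectors, NOT real embeddings: picked only to show the score range.
east = [1.0, 0.0]
far_east = [5.0, 0.0]    # same direction as east, larger magnitude
north = [0.0, 1.0]       # perpendicular to east
west = [-1.0, 0.0]       # opposite direction to east
north_east = [1.0, 1.0]  # 45 degrees from east

same = cosine(east, far_east)        # same direction
perpendicular = cosine(east, north)  # 90 degrees apart
opposite = cosine(east, west)        # 180 degrees apart
diagonal = cosine(east, north_east)  # 45 degrees apart

print(same, perpendicular, opposite, round(diagonal, 3))
```

The first three scores come out at 1, 0, and -1, and the 45-degree pair lands at roughly 0.707. `far_east` is five times longer than `east` yet still scores 1, which shows that the comparison ignores magnitude and looks only at direction.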
import numpy as np

def cosine_similarity(a, b):
    a, b = np.array(a), np.array(b)
    # Dot product divided by the product of the two vector lengths.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

Embed your documents
Call an embedding model to convert each document into a vector. This is a different, cheaper model than the chat model.
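from openai import OpenAI

client = OpenAI()  # reads the API key from the OPENAI_API_KEY environment variable

def embed(text):
    # Request an embedding for one string and return it as a list of floats.
    result = client.embeddings.create(
        model="text-embedding-3-small",
        input=text,
    )
    return result.data[0].embedding

A minimal semantic search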
Embed your documents once, embed the query, then rank documents by similarity.
docs = [
    "We deliver groceries to your door in 30 minutes.",
    "Our app helps you book a two-wheeler ride across the city.",
    "Order fresh vegetables and fruits online.",
]
# Embed every document once, up front.
doc_vectors = [embed(d) for d in docs]

query = "I need a bike taxi"
query_vector = embed(query)

# Sort (document, vector) pairs by similarity to the query, best match first.
ranked = sorted(
    zip(docs, doc_vectors),
    key=lambda pair: cosine_similarity(query_vector, pair[1]),
    reverse=True,
)
print(ranked[0][0])  # expected: the two-wheeler doc, despite no shared keywords

Scaling beyond a list
Looping over every vector works for a few hundred documents. Beyond that, use a vector database (such as pgvector, Pinecone, or Chroma) that indexes vectors for fast nearest-neighbour search. The concept is identical — you are just letting a purpose-built store do the comparison at scale.
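The comparison a vector store performs can be sketched in memory with NumPy. The following is an exact brute-force scan over every row; vector databases typically use approximate indexes instead, so treat this as an illustration of the idea rather than how any particular store works. The name `top_k` and the 3-D vectors are made up for the example.

```python
import numpy as np

def top_k(query_vector, doc_vectors, k=3):
    """Return (index, score) pairs for the k most similar rows, best first."""
    matrix = np.asarray(doc_vectors, dtype=float)  # shape: (n_docs, dim)
    query = np.asarray(query_vector, dtype=float)  # shape: (dim,)
    # Normalise to unit length so a plain dot product equals cosine similarity.
    matrix = matrix / np.linalg.norm(matrix, axis=1, keepdims=True)
    query = query / np.linalg.norm(query)
    scores = matrix @ query                        # one score per document
    best = np.argsort(-scores)[:k]                 # indices by descending score
    return [(int(i), float(scores[i])) for i in best]

# Made-up 3-D vectors standing in for real embeddings.
toy_vectors = [
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.2],
    [0.8, 0.3, 0.1],
]
results = top_k([1.0, 0.0, 0.0], toy_vectors, k=2)
print(results)
```

Computing all scores with one matrix product avoids a Python-level loop, which can stretch the in-memory approach further before a dedicated store becomes necessary. The query and every document are still compared one by one underneath, so the cost grows with the number of documents.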
Quick recap
- An embedding is a vector that encodes meaning; similar meanings point in similar directions.
- Cosine similarity scores how close two vectors are.
- Embed documents once, embed the query, rank by similarity to search by meaning.
- Use a vector database once you outgrow a simple in-memory loop.
- Always embed queries and documents with the same model; vectors from different models are not comparable.