Embeddings & semantic search

Keyword search fails when people use different words for the same idea. Someone searches "two-wheeler" and misses every document that says "bike". Embeddings fix this by representing meaning as numbers, so "bike" and "two-wheeler" land close together even though they share no words. This is the foundation of retrieval-augmented generation (RAG), which you build next week.

Overview

An embedding is a list of numbers (a vector) that captures the meaning of a piece of text. An embedding model reads your text and outputs a fixed-length vector, say 1,536 numbers, regardless of how long the text is. Texts with similar meaning produce vectors that point in similar directions. To search by meaning, you embed every document once, embed each query when it arrives, then compare the query vector against the document vectors.

Key ideas

Meaning as direction

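You do not interpret the individual numbers. What matters is the relationship between vectors. The standard comparison is cosine similarity: it measures the angle between two vectors and returns a score from -1 to 1, where higher means more similar in meaning. A score of 1 means the vectors point in the same direction, 0 means they are perpendicular, and -1 means they point in opposite directions. Because only the angle counts, the length of each vector does not affect the score.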

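import numpy as np

def cosine_similarity(a, b):
    """Return the cosine of the angle between vectors a and b (from -1 to 1)."""
    a, b = np.array(a), np.array(b)
    # Dot product divided by the product of the two vector lengths (L2 norms).
    # Undefined for an all-zero vector, because its norm is 0.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))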
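
To make the score concrete, here is a small worked example on hand-made two-dimensional vectors. It is only a sketch: real embeddings have hundreds or thousands of dimensions, the vectors below are invented for illustration, and `cosine` is a dependency-free stand-in that spells out the same calculation step by step.

```python
import math

def cosine(a, b):
    """Cosine similarity written out step by step, without NumPy."""
    dot = sum(x * y for x, y in zip(a, b))      # how much the vectors overlap
    norm_a = math.sqrt(sum(x * x for x in a))   # length of a
    norm_b = math.sqrt(sum(y * y for y in b))   # length of b
    return dot / (norm_a * norm_b)

# Toy 2-D vectors, invented for illustration.
east = [1.0, 0.0]
far_east = [5.0, 0.0]   # same direction as east, five times longer
north = [0.0, 1.0]      # perpendicular to east
west = [-1.0, 0.0]      # opposite to east

same_direction = cosine(east, far_east)  # 1.0: length is ignored
perpendicular = cosine(east, north)      # 0.0: perpendicular directions
opposite = cosine(east, west)            # -1.0: opposite directions
```

The first result is the useful one to remember: a long vector and a short vector score 1.0 as long as they point the same way.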

Embed your documents

Call an embedding model to convert each document into a vector. Embedding models are separate from chat models: they return a vector instead of generated text, and they typically cost far less per token.

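from openai import OpenAI

client = OpenAI()  # reads the API key from the OPENAI_API_KEY environment variable

def embed(text):
    """Return the embedding vector (a list of floats) for one piece of text."""
    result = client.embeddings.create(
        model="text-embedding-3-small",
        input=text,
    )
    # A single input string yields a single item in result.data.
    return result.data[0].embedding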
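
Calling the API once per document gets slow as the collection grows. The embeddings endpoint also accepts a list of strings, so one request can embed several texts. The helper below is a minimal sketch, not the only way to do it: `embed_batch` is a hypothetical name, the client is passed in as a parameter, and the code assumes the whole list fits within the API's per-request limits.

```python
def embed_batch(client, texts, model="text-embedding-3-small"):
    """Embed several texts in one request and return one vector per text.

    `client` is expected to be an OpenAI client, for example OpenAI().
    """
    result = client.embeddings.create(model=model, input=list(texts))
    # Each returned item carries the index of the input it belongs to;
    # sorting by it keeps the vectors aligned with `texts`.
    ordered = sorted(result.data, key=lambda item: item.index)
    return [item.embedding for item in ordered]
```

With a real client this would be called as `embed_batch(client, docs)`, giving a list of vectors in the same order as the input texts.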

A minimal semantic search

Embed your documents once, embed the query, then rank documents by similarity.

docs = [
    "We deliver groceries to your door in 30 minutes.",
    "Our app helps you book a two-wheeler ride across the city.",
    "Order fresh vegetables and fruits online.",
]
doc_vectors = [embed(d) for d in docs]
 
query = "I need a bike taxi"
query_vector = embed(query)
 
ranked = sorted(
    zip(docs, doc_vectors),
    key=lambda pair: cosine_similarity(query_vector, pair[1]),
    reverse=True,
)
print(ranked[0][0])  # the two-wheeler doc, despite no shared keywords
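
Embedding the documents "once" only holds if the vectors survive between runs. One way to do that, sketched here under assumptions, is to cache vectors in a JSON file keyed by model name and text. The function name `cached_embed`, the cache file name, and the `embed_fn` parameter are all hypothetical; `embed_fn` stands for any function that maps a string to a list of floats.

```python
import hashlib
import json
from pathlib import Path

def cached_embed(text, embed_fn, model_name, cache_path="embeddings_cache.json"):
    """Return the vector for `text`, calling `embed_fn` only on a cache miss."""
    path = Path(cache_path)
    cache = json.loads(path.read_text()) if path.exists() else {}
    # The model name is part of the key, so vectors produced by
    # different models are never mixed up.
    key = hashlib.sha256(f"{model_name}\n{text}".encode("utf-8")).hexdigest()
    if key not in cache:
        cache[key] = list(embed_fn(text))
        path.write_text(json.dumps(cache))
    return cache[key]
```

This version rereads the file on every call, which keeps the sketch short but only suits small collections; a larger project would load the cache once or use a proper store.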

Scaling beyond a list

Looping over every vector works for a few hundred documents. Beyond that, every query still has to be compared against every stored vector, so search time grows with the size of the collection. At that point, use a vector database (such as pgvector, Pinecone, or Chroma) that indexes vectors for fast nearest-neighbour search. The concept is identical: you are just letting a purpose-built store do the comparison at scale.
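
Before reaching for a vector database, it can help to see the comparison it takes over. The sketch below does exact nearest-neighbour search over a matrix of vectors with NumPy. The function name `top_k` and the toy vectors are hypothetical, and real vector databases typically rely on approximate indexes rather than this brute-force scan.

```python
import numpy as np

def top_k(query_vector, doc_matrix, k=3):
    """Return (index, score) pairs for the k rows most similar to the query."""
    docs = np.asarray(doc_matrix, dtype=float)
    query = np.asarray(query_vector, dtype=float)
    # Normalise to unit length so a plain dot product equals cosine similarity.
    docs = docs / np.linalg.norm(docs, axis=1, keepdims=True)
    query = query / np.linalg.norm(query)
    scores = docs @ query                  # one score per document
    best = np.argsort(scores)[::-1][:k]    # indices of the highest scores first
    return [(int(i), float(scores[i])) for i in best]

# Toy 2-D vectors standing in for real embeddings.
toy_docs = [[1.0, 0.0], [0.0, 1.0], [3.0, 3.0]]
results = top_k([2.0, 0.0], toy_docs, k=2)
# results lists document 0 first (same direction as the query), then document 2.
```

Scoring all documents with one matrix product is much faster than a Python loop, but it still touches every vector on every query, which is the cost a vector index is designed to avoid.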

Quick recap

An embedding is a vector that encodes meaning; similar meanings point in similar directions.
Cosine similarity scores how close two vectors are.
Embed documents once, embed the query, rank by similarity to search by meaning.
Use a vector database once you outgrow a simple in-memory loop.
Always embed queries and documents with the same model; vectors produced by different models are not comparable.
