
Session 2: Embeddings and Vector Databases

Synopsis

Introduces semantic embeddings, similarity search, and the role of vector stores in knowledge retrieval. Learners gain the conceptual basis needed to represent documents and query them effectively.

Session Content

Session Overview

In this session, you will learn how embeddings turn text into numeric vectors, why vector similarity matters for GenAI applications, and how vector databases help you retrieve semantically relevant information at scale. You will also build a small semantic search system in Python using the OpenAI Python SDK and the Responses API for generation.

Learning Objectives

By the end of this session, you should be able to:

  • Explain what embeddings are and why they are useful
  • Describe how semantic similarity differs from keyword matching
  • Understand the role of vector databases in retrieval systems
  • Generate embeddings for text using Python
  • Store and search vectors in a simple in-memory index
  • Combine retrieval with generation using the OpenAI Responses API

Prerequisites

  • Basic Python knowledge
  • Familiarity with API usage and environment variables
  • OpenAI API key configured in your environment

Agenda for a 45-Minute Session

  • 0–10 min: Theory: embeddings and semantic similarity
  • 10–20 min: Theory: vector databases and retrieval workflows
  • 20–30 min: Hands-on: create embeddings and compute similarity
  • 30–40 min: Hands-on: build semantic search over documents
  • 40–45 min: Wrap-up, discussion, and next steps

1. What Are Embeddings?

Embeddings are dense numerical representations of data such as text, code, or images. For text, an embedding maps a sentence or document to a list of floating-point numbers. These numbers capture semantic meaning.

Intuition

Texts with similar meaning tend to have embeddings that are close together in vector space.

For example:

  • "How do I reset my password?"
  • "I forgot my account password, how can I change it?"

These may use different words, but they express a very similar intent. Their embeddings should therefore be close.

Why Embeddings Matter

Embeddings are useful for:

  • Semantic search
  • Recommendation systems
  • Retrieval-augmented generation (RAG)
  • Clustering documents by meaning
  • Deduplication and similarity detection
  • Classification and routing

Keyword search looks for exact words or phrase matches.

Semantic search looks for meaning.

Example:

Query: "ways to lower cloud costs"

A keyword search may miss a document titled:

  • "Reducing infrastructure spending in distributed systems"

An embedding-based search can still find it if the meanings are close.


2. Embeddings as Vectors

An embedding is a vector:

[0.012, -0.334, 0.918, ...]

You do not usually interpret each individual number directly. What matters is the relationship between vectors.

Similarity Metrics

To compare embeddings, we often use:

  • Cosine similarity
  • Dot product
  • Euclidean distance

For most educational examples, cosine similarity is a great starting point.

Cosine Similarity

Cosine similarity measures how aligned two vectors are.

  • 1.0 means the vectors point in the same direction (very similar)
  • 0.0 means the vectors are orthogonal (unrelated)
  • -1.0 means the vectors point in opposite directions

Formula:

cosine_similarity(a, b) = dot(a, b) / (||a|| * ||b||)
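
As a quick sanity check, the formula can be evaluated directly with NumPy. The toy 2-D vectors below are made up purely to build intuition about the three reference values:

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity: dot product divided by the product of magnitudes."""
    a, b = np.array(a), np.array(b)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy 2-D vectors to build intuition.
print(cosine_similarity([1, 0], [1, 0]))   # same direction -> 1.0
print(cosine_similarity([1, 0], [0, 1]))   # orthogonal -> 0.0
print(cosine_similarity([1, 0], [-1, 0]))  # opposite -> -1.0
```

Real embeddings live in hundreds or thousands of dimensions, but the geometry is exactly the same.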

3. What Is a Vector Database?

A vector database stores embeddings efficiently and supports similarity search over them.

Why Not Just Use a Normal Database?

Traditional databases are excellent at:

  • Exact matching
  • Filtering structured data
  • Transactions

But vector search requires efficient nearest-neighbor lookup in high-dimensional space, which conventional indexes such as B-trees are not designed for.

What a Vector Database Typically Provides

  • Storage for embeddings and metadata
  • Similarity search
  • Filtering by metadata
  • Indexing for fast approximate nearest neighbor search
  • Updates and deletions
  • Scalability for large collections

Common Workflow

  1. Split source content into chunks
  2. Generate embeddings for each chunk
  3. Store vectors and metadata
  4. Embed the user query
  5. Find the nearest chunks
  6. Return the chunks directly or pass them to an LLM

This is the core retrieval pipeline behind many RAG systems.
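
The six steps above can be sketched in a few lines of plain Python. The helper names here are hypothetical, and `embed` is a stand-in for a real embedding call (the character-based vector it returns carries no semantic meaning; it only makes the structure runnable):

```python
def embed(text: str) -> list[float]:
    """Placeholder for a real embedding call (e.g., an API request)."""
    # Hypothetical stand-in: a fixed-size vector derived from characters.
    return [float(ord(c) % 7) for c in text[:8].ljust(8)]

def build_index(chunks: list[str]) -> list[dict]:
    # Steps 1-3: chunk, embed, and store vectors alongside metadata.
    return [{"text": chunk, "vector": embed(chunk)} for chunk in chunks]

def search(index: list[dict], query: str, top_k: int = 2) -> list[str]:
    # Steps 4-6: embed the query, rank chunks by similarity, return the nearest.
    q = embed(query)
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))
    ranked = sorted(index, key=lambda item: dot(q, item["vector"]), reverse=True)
    return [item["text"] for item in ranked[:top_k]]

index = build_index(["reset your password", "download an invoice"])
print(search(index, "password help"))
```

Exercises 2 and 3 below fill in this skeleton with real embeddings and real similarity scores.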


4. Embeddings in GenAI Systems

Embeddings are central to retrieval-based architectures.

Example Use Cases

A. FAQ Assistant

You embed all FAQs and retrieve the closest answer to a user question.

B. Internal Documentation Assistant

You embed docs, policies, and runbooks, then retrieve the most relevant sections before asking an LLM to answer.

C. Code Search

You embed code snippets or function descriptions and search by natural language.

Important Design Choice: Chunking

Long documents are usually split into smaller chunks before embedding.

Why?

  • Smaller chunks often represent one clear idea
  • Retrieval is more precise
  • RAG/answer generation gets cleaner context

Chunking strategies include:

  • Fixed-size chunks
  • Sentence-based chunks
  • Paragraph-based chunks
  • Sliding windows with overlap
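
Two of these strategies, fixed-size and sliding-window chunking, can be sketched in a few lines. Splitting on words is an assumption made for readability; production systems usually split on tokens or sentences:

```python
def fixed_size_chunks(text: str, size: int) -> list[str]:
    """Split text into consecutive chunks of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def sliding_window_chunks(text: str, size: int, overlap: int) -> list[str]:
    """Split into overlapping chunks so ideas spanning a boundary survive."""
    words = text.split()
    step = size - overlap
    return [" ".join(words[i:i + size])
            for i in range(0, max(len(words) - overlap, 1), step)]

doc = "one two three four five six seven eight"
print(fixed_size_chunks(doc, 3))         # ['one two three', 'four five six', 'seven eight']
print(sliding_window_chunks(doc, 4, 2))  # windows of 4 words, overlapping by 2
```

The overlap in the sliding-window variant trades some storage for robustness: a sentence cut in half at a fixed boundary still appears whole in the neighboring window.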

5. Hands-On Exercise 1: Generate Embeddings and Compare Similarity

Goal

Generate embeddings for a small set of sentences and compute cosine similarity in Python.

Setup

Install dependencies:

pip install openai numpy python-dotenv

Create a .env file:

OPENAI_API_KEY=your_api_key_here

Code: Embedding Similarity Demo

"""
Session 2 - Exercise 1
Generate embeddings for text and compare semantic similarity.

Requirements:
    pip install openai numpy python-dotenv

Environment:
    OPENAI_API_KEY must be set in your environment or .env file.
"""

from dotenv import load_dotenv
from openai import OpenAI
import numpy as np

# Load environment variables from .env if present.
load_dotenv()

# Create the OpenAI client.
client = OpenAI()

# A small set of sentences to compare.
sentences = [
    "How do I reset my password?",
    "I forgot my password and need to change it.",
    "What is the capital of France?",
    "How can I update my account login credentials?",
]

def get_embedding(text: str):
    """
    Request an embedding vector for the provided text.

    Args:
        text: Input text to embed.

    Returns:
        A list of floats representing the embedding.
    """
    response = client.embeddings.create(
        model="text-embedding-3-small",
        input=text
    )
    return response.data[0].embedding

def cosine_similarity(vec_a, vec_b) -> float:
    """
    Compute cosine similarity between two vectors.

    Args:
        vec_a: First vector
        vec_b: Second vector

    Returns:
        Cosine similarity as a float
    """
    a = np.array(vec_a)
    b = np.array(vec_b)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Generate embeddings for all sentences.
embeddings = {sentence: get_embedding(sentence) for sentence in sentences}

# Compare all pairs.
print("Pairwise cosine similarities:\n")

for i, sentence_a in enumerate(sentences):
    for j, sentence_b in enumerate(sentences):
        if i < j:
            score = cosine_similarity(
                embeddings[sentence_a],
                embeddings[sentence_b]
            )
            print(f"{sentence_a!r}")
            print(f"{sentence_b!r}")
            print(f"Similarity: {score:.4f}\n")

Example Output

Pairwise cosine similarities:

'How do I reset my password?'
'I forgot my password and need to change it.'
Similarity: 0.8452

'How do I reset my password?'
'What is the capital of France?'
Similarity: 0.2217

'How do I reset my password?'
'How can I update my account login credentials?'
Similarity: 0.7314

Discussion

Notice:

  • Password-related queries are more similar to each other
  • The France question is less similar to account-related questions
  • Semantic similarity can connect related phrases even when the wording changes

6. Hands-On Exercise 2: Build a Simple Semantic Search Engine

Goal

Create a tiny semantic search system over a set of documents using embeddings and cosine similarity.

Dataset

We will use a small set of support knowledge base articles.

"""
Session 2 - Exercise 2
Build a simple semantic search system using embeddings.

Requirements:
    pip install openai numpy python-dotenv

Environment:
    OPENAI_API_KEY must be set in your environment or .env file.
"""

from dotenv import load_dotenv
from openai import OpenAI
import numpy as np

load_dotenv()
client = OpenAI()

# Example knowledge base documents.
documents = [
    {
        "id": "doc1",
        "title": "Password Reset Instructions",
        "content": "To reset your password, go to account settings and click 'Forgot Password'. "
                   "A reset link will be sent to your email."
    },
    {
        "id": "doc2",
        "title": "Two-Factor Authentication Setup",
        "content": "Enable two-factor authentication from the security page to add extra protection "
                   "to your account."
    },
    {
        "id": "doc3",
        "title": "Billing and Invoice Downloads",
        "content": "You can download invoices from the billing dashboard under the payment history section."
    },
    {
        "id": "doc4",
        "title": "Changing Email Address",
        "content": "To change your account email, open profile settings and update your contact email."
    },
]

def get_embedding(text: str):
    """
    Generate an embedding for a piece of text.

    Args:
        text: Text to embed.

    Returns:
        A list of floats representing the embedding.
    """
    response = client.embeddings.create(
        model="text-embedding-3-small",
        input=text
    )
    return response.data[0].embedding

def cosine_similarity(vec_a, vec_b) -> float:
    """
    Compute cosine similarity between two vectors.

    Args:
        vec_a: First vector
        vec_b: Second vector

    Returns:
        Cosine similarity score
    """
    a = np.array(vec_a)
    b = np.array(vec_b)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Precompute embeddings for each document.
for doc in documents:
    combined_text = f"{doc['title']}\n{doc['content']}"
    doc["embedding"] = get_embedding(combined_text)

def semantic_search(query: str, top_k: int = 2):
    """
    Search documents by semantic similarity.

    Args:
        query: User search query
        top_k: Number of top results to return

    Returns:
        A list of top-k matching documents with similarity scores
    """
    query_embedding = get_embedding(query)

    scored_docs = []
    for doc in documents:
        score = cosine_similarity(query_embedding, doc["embedding"])
        scored_docs.append({
            "id": doc["id"],
            "title": doc["title"],
            "content": doc["content"],
            "score": score
        })

    scored_docs.sort(key=lambda item: item["score"], reverse=True)
    return scored_docs[:top_k]

# Test queries.
queries = [
    "I forgot my password",
    "Where do I get my invoices?",
    "How can I make my account more secure?"
]

for query in queries:
    print(f"\nQuery: {query}")
    results = semantic_search(query, top_k=2)

    for rank, result in enumerate(results, start=1):
        print(f"  {rank}. {result['title']} (score={result['score']:.4f})")
        print(f"     {result['content']}")

Example Output

Query: I forgot my password
  1. Password Reset Instructions (score=0.8721)
     To reset your password, go to account settings and click 'Forgot Password'. A reset link will be sent to your email.
  2. Changing Email Address (score=0.5118)
     To change your account email, open profile settings and update your contact email.

Query: Where do I get my invoices?
  1. Billing and Invoice Downloads (score=0.8327)
     You can download invoices from the billing dashboard under the payment history section.
  2. Changing Email Address (score=0.4210)
     To change your account email, open profile settings and update your contact email.

What You Learned

This is the basic structure of a vector retrieval system:

  • Embed your documents once
  • Embed incoming queries
  • Compute similarity
  • Return the nearest results

In production, an actual vector database would replace the in-memory list and linear scan.
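
Before reaching for a database, one intermediate optimization is to vectorize the linear scan itself: stack all document embeddings into a single NumPy matrix and score every document with one matrix-vector product. The sketch below uses random vectors standing in for real embeddings:

```python
import numpy as np

rng = np.random.default_rng(0)

# Pretend these are precomputed document embeddings (one per row) and a query.
doc_matrix = rng.normal(size=(1000, 64))
query = rng.normal(size=64)

# Normalize rows and the query so dot products equal cosine similarities.
doc_norms = doc_matrix / np.linalg.norm(doc_matrix, axis=1, keepdims=True)
query_norm = query / np.linalg.norm(query)

scores = doc_norms @ query_norm  # one matmul scores every document

top_k = 3
top_idx = np.argpartition(scores, -top_k)[-top_k:]     # unordered top-k
top_idx = top_idx[np.argsort(scores[top_idx])[::-1]]   # sort top-k by score
print(top_idx, scores[top_idx])
```

This is still a brute-force scan, but a vectorized one; real vector databases go further with approximate nearest-neighbor indexes that avoid touching every vector at all.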


7. Hands-On Exercise 3: Retrieval + Generation with the Responses API

Goal

Retrieve relevant documents using embeddings, then ask a model to answer the user using only the retrieved context.

This is a small RAG-style workflow.

Why Use the Responses API Here?

The Responses API is the recommended API for generating outputs with OpenAI models. In this exercise:

  • Embeddings are used for retrieval
  • The Responses API is used for answer generation

Code: Simple Retrieval-Augmented QA

"""
Session 2 - Exercise 3
Use embedding-based retrieval plus the OpenAI Responses API for answer generation.

Requirements:
    pip install openai numpy python-dotenv

Environment:
    OPENAI_API_KEY must be set in your environment or .env file.
"""

from dotenv import load_dotenv
from openai import OpenAI
import numpy as np

load_dotenv()
client = OpenAI()

documents = [
    {
        "id": "doc1",
        "title": "Password Reset Instructions",
        "content": "To reset your password, go to account settings and click 'Forgot Password'. "
                   "A reset link will be sent to your email."
    },
    {
        "id": "doc2",
        "title": "Two-Factor Authentication Setup",
        "content": "Enable two-factor authentication from the security page to add extra protection "
                   "to your account."
    },
    {
        "id": "doc3",
        "title": "Billing and Invoice Downloads",
        "content": "You can download invoices from the billing dashboard under the payment history section."
    },
    {
        "id": "doc4",
        "title": "Changing Email Address",
        "content": "To change your account email, open profile settings and update your contact email."
    },
]

def get_embedding(text: str):
    """
    Generate an embedding vector for the provided text.
    """
    response = client.embeddings.create(
        model="text-embedding-3-small",
        input=text
    )
    return response.data[0].embedding

def cosine_similarity(vec_a, vec_b) -> float:
    """
    Compute cosine similarity between two vectors.
    """
    a = np.array(vec_a)
    b = np.array(vec_b)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Precompute document embeddings.
for doc in documents:
    combined_text = f"{doc['title']}\n{doc['content']}"
    doc["embedding"] = get_embedding(combined_text)

def retrieve(query: str, top_k: int = 2):
    """
    Retrieve top-k documents for a query based on embedding similarity.
    """
    query_embedding = get_embedding(query)
    scored = []

    for doc in documents:
        score = cosine_similarity(query_embedding, doc["embedding"])
        scored.append({**doc, "score": score})

    scored.sort(key=lambda item: item["score"], reverse=True)
    return scored[:top_k]

def answer_with_retrieval(query: str) -> str:
    """
    Retrieve relevant context and generate an answer using the Responses API.

    Args:
        query: User question

    Returns:
        Model-generated answer as text
    """
    top_docs = retrieve(query, top_k=2)

    context_blocks = []
    for i, doc in enumerate(top_docs, start=1):
        context_blocks.append(
            f"Document {i}: {doc['title']}\n{doc['content']}"
        )

    context = "\n\n".join(context_blocks)

    prompt = f"""
You are a helpful support assistant.
Answer the user's question using only the provided context.
If the answer is not in the context, say that you do not have enough information.

User question:
{query}

Context:
{context}
""".strip()

    response = client.responses.create(
        model="gpt-4o-mini",
        input=prompt
    )

    return response.output_text

# Demo
user_question = "How do I secure my account and reset my password?"
answer = answer_with_retrieval(user_question)

print("User question:")
print(user_question)
print("\nAnswer:")
print(answer)

Example Output

User question:
How do I secure my account and reset my password?

Answer:
To secure your account, enable two-factor authentication from the security page. To reset your password, go to account settings and click "Forgot Password." A reset link will be sent to your email.

Discussion

This is the core idea behind many modern knowledge assistants:

  • Embeddings find relevant context
  • The LLM synthesizes an answer from that context

Without retrieval, the model may answer less reliably for domain-specific information.


8. From In-Memory Search to Real Vector Databases

Our examples used a Python list and a brute-force loop. That is fine for learning, but not for large-scale systems.

Production systems typically rely on a dedicated vector database instead. Examples include:

  • Pinecone
  • Weaviate
  • Milvus
  • Qdrant
  • Chroma
  • pgvector with PostgreSQL

Features You Will Commonly See

  • Approximate nearest neighbor indexing
  • Metadata filtering
  • Hybrid search
  • Multi-tenant storage
  • Persistence and replication
  • API integrations with application stacks

When to Use a Real Vector Database

Use one when you need:

  • Thousands to millions of vectors
  • Fast search latency
  • Durable storage
  • Filtering and indexing
  • Production-grade retrieval pipelines

9. Best Practices

1. Embed Consistently

Use the same embedding model for:

  • your stored documents
  • incoming user queries

Mixing embedding spaces usually gives poor results.

2. Chunk Carefully

Chunks that are too small may lose context.
Chunks that are too large may reduce retrieval precision.

3. Store Metadata

Keep useful metadata such as:

  • source
  • title
  • author
  • timestamp
  • tags
  • permissions

This helps with filtering and traceability.

4. Evaluate Retrieval Quality

Do not assume top results are always good. Test with representative queries.

Questions to ask:

  • Are the retrieved chunks relevant?
  • Is important information missing?
  • Are the chunks too broad or too narrow?
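
A lightweight way to act on these questions is a tiny evaluation set: pairs of representative queries and the document id you expect back, scored with hit-rate@k. The ids and the stand-in search function below are illustrative assumptions, not part of the earlier exercises:

```python
def hit_rate_at_k(eval_set, search_fn, k=2):
    """Fraction of queries whose expected document appears in the top-k results."""
    hits = 0
    for query, expected_id in eval_set:
        result_ids = [doc_id for doc_id, _score in search_fn(query)[:k]]
        hits += expected_id in result_ids
    return hits / len(eval_set)

# Illustrative eval set: (query, id of the document that should be retrieved).
eval_set = [
    ("I forgot my password", "doc1"),
    ("Where do I get my invoices?", "doc3"),
]

def fake_search(query):
    # Stand-in for a real semantic search; returns (doc_id, score) pairs.
    if "password" in query:
        return [("doc1", 0.87), ("doc4", 0.51)]
    return [("doc3", 0.83), ("doc4", 0.42)]

print(hit_rate_at_k(eval_set, fake_search, k=2))
```

Even a dozen such query/answer pairs makes retrieval changes (new chunking, new embedding model) measurable instead of anecdotal.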

5. Use Retrieval Before Generation

If your application depends on external knowledge, retrieve first and then generate. This improves factual grounding.


10. Common Pitfalls

Poor Chunking

Large or messy chunks can reduce retrieval quality.

No Evaluation Set

If you do not test retrieval with known queries, you cannot measure whether your search is improving.

Ignoring Metadata

Pure semantic search is useful, but filtering by source, date, or user access can be essential.

Treating Similarity Scores as Absolute Truth

A high similarity score is a signal, not a guarantee of correctness.

Over-Retrieving

Passing too many chunks to an LLM can add noise and reduce answer quality.


11. Mini Challenge

Try extending Exercise 2 or 3 with one or more of the following:

  • Add 10 more documents and test retrieval quality
  • Store category metadata and filter results before similarity scoring
  • Split longer documents into chunks
  • Return top 3 documents instead of top 2
  • Print similarity scores to inspect ranking behavior
  • Build a command-line semantic search tool

Suggested Prompt for Experimentation

Try asking:

  • "How do I download my bill?"
  • "I want to improve login security"
  • "Can I update the email on my profile?"

Then inspect whether the top-ranked documents are what you expected.


12. Recap

In this session, you learned:

  • Embeddings represent text as vectors
  • Similar meaning leads to nearby vectors
  • Cosine similarity helps compare vectors
  • Vector databases enable scalable semantic retrieval
  • Embedding-based retrieval is a key building block of RAG systems
  • The OpenAI Responses API can be combined with retrieved context for grounded answers

Useful Resources

  • OpenAI Embeddings Guide: https://platform.openai.com/docs/guides/embeddings
  • OpenAI Responses API Guide: https://developers.openai.com/api/docs/guides/migrate-to-responses
  • OpenAI Python SDK: https://github.com/openai/openai-python
  • NumPy Documentation: https://numpy.org/doc/
  • pgvector: https://github.com/pgvector/pgvector
  • Qdrant Documentation: https://qdrant.tech/documentation/
  • Weaviate Documentation: https://weaviate.io/developers/weaviate
  • Pinecone Documentation: https://docs.pinecone.io/

Suggested Homework

  1. Build a semantic search script for your own notes or small documentation set.
  2. Add chunking for documents longer than a paragraph.
  3. Compare keyword search results with embedding-based search results.
  4. Use the Responses API to summarize the top retrieved chunks before answering.
  5. Reflect on when semantic search succeeds and when it fails.

End-of-Session Reflection Questions

  • What kinds of problems are embeddings especially good at?
  • Why is semantic search better than keyword search for many user queries?
  • What limitations did you notice in the small in-memory implementation?
  • How would a vector database improve this design?
  • Why is retrieval often useful before generation in GenAI systems?
