Session 2: Embeddings and Vector Databases
Synopsis
Introduces semantic embeddings, similarity search, and the role of vector stores in knowledge retrieval. Learners gain the conceptual basis needed to represent documents and query them effectively.
Session Content
Session Overview
In this session, you will learn how embeddings turn text into numeric vectors, why vector similarity matters for GenAI applications, and how vector databases help you retrieve semantically relevant information at scale. You will also build a small semantic search system in Python using the OpenAI Python SDK and the Responses API for generation.
Learning Objectives
By the end of this session, you should be able to:
- Explain what embeddings are and why they are useful
- Describe how semantic similarity differs from keyword matching
- Understand the role of vector databases in retrieval systems
- Generate embeddings for text using Python
- Store and search vectors in a simple in-memory index
- Combine retrieval with generation using the OpenAI Responses API
Prerequisites
- Basic Python knowledge
- Familiarity with API usage and environment variables
- OpenAI API key configured in your environment
Agenda for a 45-Minute Session
- 0–10 min: Theory: embeddings and semantic similarity
- 10–20 min: Theory: vector databases and retrieval workflows
- 20–30 min: Hands-on: create embeddings and compute similarity
- 30–40 min: Hands-on: build semantic search over documents
- 40–45 min: Wrap-up, discussion, and next steps
1. What Are Embeddings?
Embeddings are dense numerical representations of data such as text, code, or images. For text, an embedding maps a sentence or document to a list of floating-point numbers. These numbers capture semantic meaning.
Intuition
Texts with similar meaning tend to have embeddings that are close together in vector space.
For example:
"How do I reset my password?""I forgot my account password, how can I change it?"
These may use different words, but they express a very similar intent. Their embeddings should therefore be close.
Why Embeddings Matter
Embeddings are useful for:
- Semantic search
- Recommendation systems
- Retrieval-augmented generation (RAG)
- Clustering documents by meaning
- Deduplication and similarity detection
- Classification and routing
Keyword Search vs Semantic Search
Keyword search looks for exact word or phrase matches.
Semantic search looks for meaning.
Example:
Query: "ways to lower cloud costs"
A keyword search may miss a document titled:
"Reducing infrastructure spending in distributed systems"
An embedding-based search can still find it if the meanings are close.
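To make the miss concrete, here is a toy keyword matcher (illustrative only; real search engines tokenize and rank far more carefully):

```python
# Toy keyword matcher: True only if every query word appears in the document.
def keyword_match(query: str, document: str) -> bool:
    doc_words = set(document.lower().split())
    return all(word in doc_words for word in query.lower().split())

query = "ways to lower cloud costs"
title = "Reducing infrastructure spending in distributed systems"

# No query word appears in the title, so keyword matching fails
# even though the two texts are about the same topic.
print(keyword_match(query, title))  # False
```

An embedding-based comparison of the same pair would place the two texts close together, which is exactly the gap semantic search fills.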
2. Embeddings as Vectors
An embedding is a vector:
[0.012, -0.334, 0.918, ...]
You do not usually interpret each individual number directly. What matters is the relationship between vectors.
Similarity Metrics
To compare embeddings, we often use:
- Cosine similarity
- Dot product
- Euclidean distance
For most educational examples, cosine similarity is a great starting point.
Cosine Similarity
Cosine similarity measures how aligned two vectors are.
- 1.0 means very similar direction
- 0.0 means unrelated
- -1.0 means opposite direction
Formula:
cosine_similarity(a, b) = dot(a, b) / (||a|| * ||b||)
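This formula is a few lines of NumPy. The toy vectors below reproduce the three reference values (1.0, 0.0, -1.0):

```python
import numpy as np

def cosine_similarity(a, b) -> float:
    """Implement dot(a, b) / (||a|| * ||b||) as given above."""
    a = np.array(a, dtype=float)
    b = np.array(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine_similarity([1, 2], [2, 4]))   # 1.0  (same direction)
print(cosine_similarity([1, 0], [0, 1]))   # 0.0  (orthogonal: unrelated)
print(cosine_similarity([1, 0], [-1, 0]))  # -1.0 (opposite direction)
```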
3. What Is a Vector Database?
A vector database stores embeddings efficiently and supports similarity search over them.
Why Not Just Use a Normal Database?
Traditional databases are excellent at:
- Exact matching
- Filtering structured data
- Transactions
But vector search requires efficient nearest-neighbor lookup in high-dimensional space.
What a Vector Database Typically Provides
- Storage for embeddings and metadata
- Similarity search
- Filtering by metadata
- Indexing for fast approximate nearest neighbor search
- Updates and deletions
- Scalability for large collections
Common Workflow
- Split source content into chunks
- Generate embeddings for each chunk
- Store vectors and metadata
- Embed the user query
- Find the nearest chunks
- Return the chunks directly or pass them to an LLM
This is the core retrieval pipeline behind many RAG systems.
4. Embeddings in GenAI Systems
Embeddings are central to retrieval-based architectures.
Example Use Cases
A. FAQ Search
You embed all FAQs and retrieve the closest answer to a user question.
B. Internal Documentation Assistant
You embed docs, policies, and runbooks, then retrieve the most relevant sections before asking an LLM to answer.
C. Code Search
You embed code snippets or function descriptions and search by natural language.
Important Design Choice: Chunking
Long documents are usually split into smaller chunks before embedding.
Why?
- Smaller chunks often represent one clear idea
- Retrieval is more precise
- RAG/answer generation gets cleaner context
Chunking strategies include:
- Fixed-size chunks
- Sentence-based chunks
- Paragraph-based chunks
- Sliding windows with overlap
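As a sketch, a word-level sliding-window chunker might look like this (the chunk_size and overlap values are arbitrary starting points, not recommendations):

```python
def chunk_words(text: str, chunk_size: int = 50, overlap: int = 10):
    """Split text into fixed-size word chunks with a sliding-window overlap."""
    words = text.split()
    step = chunk_size - overlap  # how far the window advances each iteration
    chunks = []
    for start in range(0, len(words), step):
        chunk = words[start:start + chunk_size]
        if chunk:
            chunks.append(" ".join(chunk))
        if start + chunk_size >= len(words):
            break  # last window already reached the end of the text
    return chunks

# 120 placeholder words produce three overlapping chunks:
# words 0-49, 40-89, and 80-119.
text = " ".join(f"word{i}" for i in range(120))
chunks = chunk_words(text, chunk_size=50, overlap=10)
print(len(chunks))  # 3
```

The overlap ensures that an idea straddling a chunk boundary still appears whole in at least one chunk.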
5. Hands-On Exercise 1: Generate Embeddings and Compare Similarity
Goal
Generate embeddings for a small set of sentences and compute cosine similarity in Python.
Setup
Install dependencies:
pip install openai numpy python-dotenv
Create a .env file:
OPENAI_API_KEY=your_api_key_here
Code: Embedding Similarity Demo
"""
Session 2 - Exercise 1
Generate embeddings for text and compare semantic similarity.
Requirements:
pip install openai numpy python-dotenv
Environment:
OPENAI_API_KEY must be set in your environment or .env file.
"""
from dotenv import load_dotenv
from openai import OpenAI
import numpy as np
# Load environment variables from .env if present.
load_dotenv()
# Create the OpenAI client.
client = OpenAI()
# A small set of sentences to compare.
sentences = [
"How do I reset my password?",
"I forgot my password and need to change it.",
"What is the capital of France?",
"How can I update my account login credentials?",
]
def get_embedding(text: str):
"""
Request an embedding vector for the provided text.
Args:
text: Input text to embed.
Returns:
A list of floats representing the embedding.
"""
response = client.embeddings.create(
model="text-embedding-3-small",
input=text
)
return response.data[0].embedding
def cosine_similarity(vec_a, vec_b) -> float:
"""
Compute cosine similarity between two vectors.
Args:
vec_a: First vector
vec_b: Second vector
Returns:
Cosine similarity as a float
"""
a = np.array(vec_a)
b = np.array(vec_b)
return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
# Generate embeddings for all sentences.
embeddings = {sentence: get_embedding(sentence) for sentence in sentences}
# Compare all pairs.
print("Pairwise cosine similarities:\n")
for i, sentence_a in enumerate(sentences):
for j, sentence_b in enumerate(sentences):
if i < j:
score = cosine_similarity(
embeddings[sentence_a],
embeddings[sentence_b]
)
print(f"{sentence_a!r}")
print(f"{sentence_b!r}")
print(f"Similarity: {score:.4f}\n")
Example Output
Pairwise cosine similarities:
'How do I reset my password?'
'I forgot my password and need to change it.'
Similarity: 0.8452
'How do I reset my password?'
'What is the capital of France?'
Similarity: 0.2217
'How do I reset my password?'
'How can I update my account login credentials?'
Similarity: 0.7314
Discussion
Notice:
- Password-related queries are more similar to each other
- The France question is less similar to account-related questions
- Semantic similarity can connect related phrases even when the wording changes
6. Hands-On Exercise 2: Build a Simple Semantic Search Engine
Goal
Create a tiny semantic search system over a set of documents using embeddings and cosine similarity.
Dataset
We will use a small set of support knowledge base articles.
Code: In-Memory Vector Search
"""
Session 2 - Exercise 2
Build a simple semantic search system using embeddings.
Requirements:
pip install openai numpy python-dotenv
Environment:
OPENAI_API_KEY must be set in your environment or .env file.
"""
from dotenv import load_dotenv
from openai import OpenAI
import numpy as np
load_dotenv()
client = OpenAI()
# Example knowledge base documents.
documents = [
{
"id": "doc1",
"title": "Password Reset Instructions",
"content": "To reset your password, go to account settings and click 'Forgot Password'. "
"A reset link will be sent to your email."
},
{
"id": "doc2",
"title": "Two-Factor Authentication Setup",
"content": "Enable two-factor authentication from the security page to add extra protection "
"to your account."
},
{
"id": "doc3",
"title": "Billing and Invoice Downloads",
"content": "You can download invoices from the billing dashboard under the payment history section."
},
{
"id": "doc4",
"title": "Changing Email Address",
"content": "To change your account email, open profile settings and update your contact email."
},
]
def get_embedding(text: str):
"""
Generate an embedding for a piece of text.
Args:
text: Text to embed.
Returns:
A list of floats representing the embedding.
"""
response = client.embeddings.create(
model="text-embedding-3-small",
input=text
)
return response.data[0].embedding
def cosine_similarity(vec_a, vec_b) -> float:
"""
Compute cosine similarity between two vectors.
Args:
vec_a: First vector
vec_b: Second vector
Returns:
Cosine similarity score
"""
a = np.array(vec_a)
b = np.array(vec_b)
return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
# Precompute embeddings for each document.
for doc in documents:
combined_text = f"{doc['title']}\n{doc['content']}"
doc["embedding"] = get_embedding(combined_text)
def semantic_search(query: str, top_k: int = 2):
"""
Search documents by semantic similarity.
Args:
query: User search query
top_k: Number of top results to return
Returns:
A list of top-k matching documents with similarity scores
"""
query_embedding = get_embedding(query)
scored_docs = []
for doc in documents:
score = cosine_similarity(query_embedding, doc["embedding"])
scored_docs.append({
"id": doc["id"],
"title": doc["title"],
"content": doc["content"],
"score": score
})
scored_docs.sort(key=lambda item: item["score"], reverse=True)
return scored_docs[:top_k]
# Test queries.
queries = [
"I forgot my password",
"Where do I get my invoices?",
"How can I make my account more secure?"
]
for query in queries:
print(f"\nQuery: {query}")
results = semantic_search(query, top_k=2)
for rank, result in enumerate(results, start=1):
print(f" {rank}. {result['title']} (score={result['score']:.4f})")
print(f" {result['content']}")
Example Output
Query: I forgot my password
1. Password Reset Instructions (score=0.8721)
To reset your password, go to account settings and click 'Forgot Password'. A reset link will be sent to your email.
2. Changing Email Address (score=0.5118)
To change your account email, open profile settings and update your contact email.
Query: Where do I get my invoices?
1. Billing and Invoice Downloads (score=0.8327)
You can download invoices from the billing dashboard under the payment history section.
2. Changing Email Address (score=0.4210)
To change your account email, open profile settings and update your contact email.
What You Learned
This is the basic structure of a vector retrieval system:
- Embed your documents once
- Embed incoming queries
- Compute similarity
- Return the nearest results
In production, an actual vector database would replace the in-memory list and linear scan.
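Even before switching to a real database, the per-document Python loop can be replaced by a single vectorized matrix product. The sketch below uses random vectors in place of real embeddings to keep it runnable without an API key:

```python
import numpy as np

# Stand-ins for real data: a (num_docs, dim) matrix of document embeddings
# and a single (dim,) query embedding.
rng = np.random.default_rng(0)
doc_embeddings = rng.normal(size=(1000, 64))
query_embedding = rng.normal(size=64)

# Normalize every vector to unit length so a dot product equals cosine similarity.
doc_norms = doc_embeddings / np.linalg.norm(doc_embeddings, axis=1, keepdims=True)
query_norm = query_embedding / np.linalg.norm(query_embedding)

# One matrix-vector product scores all 1000 documents at once.
scores = doc_norms @ query_norm

# Indices of the top-3 documents, best first.
top_k = np.argsort(scores)[::-1][:3]
print(top_k, scores[top_k])
```

Vector databases push this same idea further with approximate nearest-neighbor indexes, so they avoid even touching every vector per query.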
7. Hands-On Exercise 3: Retrieval + Generation with the Responses API
Goal
Retrieve relevant documents using embeddings, then ask a model to answer the user using only the retrieved context.
This is a small RAG-style workflow.
Why Use the Responses API Here?
The Responses API is the recommended API for generating outputs with OpenAI models. In this exercise:
- embeddings are used for retrieval
- the Responses API is used for answer generation
Code: Simple Retrieval-Augmented QA
"""
Session 2 - Exercise 3
Use embedding-based retrieval plus the OpenAI Responses API for answer generation.
Requirements:
pip install openai numpy python-dotenv
Environment:
OPENAI_API_KEY must be set in your environment or .env file.
"""
from dotenv import load_dotenv
from openai import OpenAI
import numpy as np
load_dotenv()
client = OpenAI()
documents = [
{
"id": "doc1",
"title": "Password Reset Instructions",
"content": "To reset your password, go to account settings and click 'Forgot Password'. "
"A reset link will be sent to your email."
},
{
"id": "doc2",
"title": "Two-Factor Authentication Setup",
"content": "Enable two-factor authentication from the security page to add extra protection "
"to your account."
},
{
"id": "doc3",
"title": "Billing and Invoice Downloads",
"content": "You can download invoices from the billing dashboard under the payment history section."
},
{
"id": "doc4",
"title": "Changing Email Address",
"content": "To change your account email, open profile settings and update your contact email."
},
]
def get_embedding(text: str):
"""
Generate an embedding vector for the provided text.
"""
response = client.embeddings.create(
model="text-embedding-3-small",
input=text
)
return response.data[0].embedding
def cosine_similarity(vec_a, vec_b) -> float:
"""
Compute cosine similarity between two vectors.
"""
a = np.array(vec_a)
b = np.array(vec_b)
return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
# Precompute document embeddings.
for doc in documents:
combined_text = f"{doc['title']}\n{doc['content']}"
doc["embedding"] = get_embedding(combined_text)
def retrieve(query: str, top_k: int = 2):
"""
Retrieve top-k documents for a query based on embedding similarity.
"""
query_embedding = get_embedding(query)
scored = []
for doc in documents:
score = cosine_similarity(query_embedding, doc["embedding"])
scored.append({**doc, "score": score})
scored.sort(key=lambda item: item["score"], reverse=True)
return scored[:top_k]
def answer_with_retrieval(query: str) -> str:
"""
Retrieve relevant context and generate an answer using the Responses API.
Args:
query: User question
Returns:
Model-generated answer as text
"""
top_docs = retrieve(query, top_k=2)
context_blocks = []
for i, doc in enumerate(top_docs, start=1):
context_blocks.append(
f"Document {i}: {doc['title']}\n{doc['content']}"
)
context = "\n\n".join(context_blocks)
prompt = f"""
You are a helpful support assistant.
Answer the user's question using only the provided context.
If the answer is not in the context, say that you do not have enough information.
User question:
{query}
Context:
{context}
""".strip()
response = client.responses.create(
model="gpt-5.4-mini",
input=prompt
)
return response.output_text
# Demo
user_question = "How do I secure my account and reset my password?"
answer = answer_with_retrieval(user_question)
print("User question:")
print(user_question)
print("\nAnswer:")
print(answer)
Example Output
User question:
How do I secure my account and reset my password?
Answer:
To secure your account, enable two-factor authentication from the security page. To reset your password, go to account settings and click "Forgot Password." A reset link will be sent to your email.
Discussion
This is the core idea behind many modern knowledge assistants:
- embeddings find relevant context
- the LLM synthesizes an answer from that context
Without retrieval, the model may answer less reliably for domain-specific information.
8. From In-Memory Search to Real Vector Databases
Our examples used a Python list and a brute-force loop. That is fine for learning, but not for large-scale systems.
Popular Vector Database Options
Examples include:
- Pinecone
- Weaviate
- Milvus
- Qdrant
- Chroma
- pgvector with PostgreSQL
Features You Will Commonly See
- Approximate nearest neighbor indexing
- Metadata filtering
- Hybrid search
- Multi-tenant storage
- Persistence and replication
- API integrations with application stacks
When to Use a Real Vector Database
Use one when you need:
- Thousands to millions of vectors
- Fast search latency
- Durable storage
- Filtering and indexing
- Production-grade retrieval pipelines
9. Best Practices
1. Embed Consistently
Use the same embedding model for:
- your stored documents
- incoming user queries
Mixing embedding spaces usually gives poor results.
2. Chunk Carefully
Chunks that are too small may lose context.
Chunks that are too large may reduce retrieval precision.
3. Store Metadata
Keep useful metadata such as:
- source
- title
- author
- timestamp
- tags
- permissions
This helps with filtering and traceability.
4. Evaluate Retrieval Quality
Do not assume top results are always good. Test with representative queries.
Questions to ask:
- Are the retrieved chunks relevant?
- Is important information missing?
- Are the chunks too broad or too narrow?
5. Use Retrieval Before Generation
If your application depends on external knowledge, retrieve first and then generate. This improves factual grounding.
10. Common Pitfalls
Poor Chunking
Large or messy chunks can reduce retrieval quality.
No Evaluation Set
If you do not test retrieval with known queries, you cannot measure whether your search is improving.
Ignoring Metadata
Pure semantic search is useful, but filtering by source, date, or user access can be essential.
Treating Similarity Scores as Absolute Truth
A high similarity score is a signal, not a guarantee of correctness.
Over-Retrieving
Passing too many chunks to an LLM can add noise and reduce answer quality.
11. Mini Challenge
Try extending Exercise 2 or 3 with one or more of the following:
- Add 10 more documents and test retrieval quality
- Store category metadata and filter results before similarity scoring
- Split longer documents into chunks
- Return top 3 documents instead of top 2
- Print similarity scores to inspect ranking behavior
- Build a command-line semantic search tool
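For the metadata-filtering idea, one possible shape is sketched below. The `category` field is hypothetical (Exercise 2's documents do not have one), and a placeholder scorer stands in for embedding similarity so the example runs without an API call:

```python
# Hypothetical documents with an added `category` metadata field.
documents = [
    {"id": "doc1", "title": "Password Reset Instructions", "category": "account"},
    {"id": "doc3", "title": "Billing and Invoice Downloads", "category": "billing"},
    {"id": "doc4", "title": "Changing Email Address", "category": "account"},
]

def filtered_search(query: str, category: str, score_fn, top_k: int = 2):
    """Keep only documents in `category`, then rank the survivors by score_fn."""
    candidates = [doc for doc in documents if doc["category"] == category]
    scored = [{**doc, "score": score_fn(query, doc)} for doc in candidates]
    scored.sort(key=lambda item: item["score"], reverse=True)
    return scored[:top_k]

# Placeholder scorer: counts shared words between query and title. In the
# real exercise this would be cosine similarity between embeddings.
def fake_score(query, doc):
    return len(set(query.lower().split()) & set(doc["title"].lower().split()))

results = filtered_search("change my email address", "account", fake_score)
print([doc["id"] for doc in results])  # ['doc4', 'doc1'] — billing doc never scored
```

Filtering before scoring both narrows the candidate set and guarantees that out-of-scope documents can never appear in the results.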
Suggested Prompt for Experimentation
Try asking:
"How do I download my bill?""I want to improve login security""Can I update the email on my profile?"
Then inspect whether the top-ranked documents are what you expected.
12. Recap
In this session, you learned:
- Embeddings represent text as vectors
- Similar meaning leads to nearby vectors
- Cosine similarity helps compare vectors
- Vector databases enable scalable semantic retrieval
- Embedding-based retrieval is a key building block of RAG systems
- The OpenAI Responses API can be combined with retrieved context for grounded answers
Useful Resources
- OpenAI Embeddings Guide: https://platform.openai.com/docs/guides/embeddings
- OpenAI Responses API Guide: https://developers.openai.com/api/docs/guides/migrate-to-responses
- OpenAI Python SDK: https://github.com/openai/openai-python
- NumPy Documentation: https://numpy.org/doc/
- pgvector: https://github.com/pgvector/pgvector
- Qdrant Documentation: https://qdrant.tech/documentation/
- Weaviate Documentation: https://weaviate.io/developers/weaviate
- Pinecone Documentation: https://docs.pinecone.io/
Suggested Homework
- Build a semantic search script for your own notes or small documentation set.
- Add chunking for documents longer than a paragraph.
- Compare keyword search results with embedding-based search results.
- Use the Responses API to summarize the top retrieved chunks before answering.
- Reflect on when semantic search succeeds and when it fails.
End-of-Session Reflection Questions
- What kinds of problems are embeddings especially good at?
- Why is semantic search better than keyword search for many user queries?
- What limitations did you notice in the small in-memory implementation?
- How would a vector database improve this design?
- Why is retrieval often useful before generation in GenAI systems?