Session 1: Short-Term and Long-Term Memory in AI Systems
Synopsis
Explains the difference between context-window memory, stored conversation history, user profiles, and persistent knowledge stores. Learners understand how different memory types serve different application needs.
Session Content
Session 1: Short-Term and Long-Term Memory in AI Systems
Session Overview
Duration: ~45 minutes
Audience: Python developers with basic programming knowledge, beginning GenAI and agentic development
Goal: Understand how AI systems use short-term and long-term memory, why memory matters in agentic applications, and how to implement simple memory patterns using the OpenAI Responses API and Python.
Learning Objectives
By the end of this session, learners will be able to:
- Explain the difference between short-term and long-term memory in AI systems.
- Describe how memory improves multi-turn conversations and agentic workflows.
- Recognize common memory design patterns in GenAI systems.
- Build a simple short-term conversational memory system in Python.
- Build a basic long-term memory store using local persistence.
- Reason about when to store, retrieve, summarize, or forget information.
Agenda
- Why memory matters in AI systems
- Short-term memory concepts
- Long-term memory concepts
- Memory design patterns for agents
- Hands-on Exercise 1: Short-term memory with conversation history
- Hands-on Exercise 2: Long-term memory with a local JSON memory store
- Best practices and pitfalls
- Wrap-up
1. Why Memory Matters in AI Systems
Large language models generate fluent responses, but they do not maintain persistent, user-specific memory across interactions unless the application explicitly provides it.
Without memory, AI systems:
- Forget prior turns in a conversation
- Lose user preferences
- Repeat questions
- Struggle with long-running tasks
- Fail to adapt over time
With memory, AI systems can:
- Continue multi-turn conversations coherently
- Personalize responses
- Track goals, constraints, and preferences
- Support long-running workflows
- Act more like useful assistants or agents
Examples
No memory
User: “I’m vegetarian.”
Later: “Can you suggest dinner?”
Assistant: “How about grilled chicken?”
With memory
User: “I’m vegetarian.”
Later: “Can you suggest dinner?”
Assistant: “Sure — how about a lentil curry, veggie tacos, or mushroom pasta?”
2. Short-Term Memory Concepts
Short-term memory refers to the information an AI system uses during the current interaction or session.
Typical short-term memory includes:
- Recent user messages
- Recent assistant responses
- Current task state
- Temporary goals or constraints
- Scratchpad-style context built by the application
Characteristics
- Session-scoped
- Usually small and recent
- Often passed directly in each model call
- May be summarized when it becomes too large
Common forms
A. Raw conversation history
Store the recent turns exactly as they occurred.
B. Rolling window
Keep only the last N messages or last N turns.
C. Summarized history
Replace older details with a compact summary.
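The rolling-window form (B) can be sketched with `collections.deque`, which discards the oldest entries automatically; the limit of 6 messages is an arbitrary illustrative choice:

```python
from collections import deque

# A rolling window: deque(maxlen=N) discards the oldest entry automatically.
MAX_MESSAGES = 6  # illustrative limit: three user/assistant turns

history: deque = deque(maxlen=MAX_MESSAGES)

def add_message(role: str, content: str) -> None:
    """Append a message; the oldest one falls off once the window is full."""
    history.append({"role": role, "content": content})

# Simulate five turns (ten messages); only the last six survive.
for i in range(5):
    add_message("user", f"question {i}")
    add_message("assistant", f"answer {i}")

print(len(history))           # 6
print(history[0]["content"])  # question 2
```

The deque handles trimming for you; the exercises below do the same thing manually with list slicing so the mechanism stays visible.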
Why short-term memory matters
It helps maintain continuity in:
- Customer support chats
- Coding copilots
- Research assistants
- Booking flows
- Agent plans and tool usage
3. Long-Term Memory Concepts
Long-term memory is information stored beyond a single interaction and reused in future sessions.
Typical long-term memory includes:
- User preferences
- Project context
- Past decisions
- Important facts learned over time
- Task outcomes
- Saved documents, notes, or embeddings
Characteristics
- Persistent across sessions
- Retrieved selectively
- Usually stored outside the model
- Can be structured, unstructured, or vector-based
Examples
- “User prefers concise explanations”
- “Project uses FastAPI and PostgreSQL”
- “Last week’s trip planning included Kyoto and Osaka”
- “This customer’s product key is linked to Account A”
Long-term memory storage options
- JSON files
- SQLite/PostgreSQL
- Vector databases
- Document stores
- Knowledge graphs
- CRM / application databases
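As a minimal sketch of one storage option from the list above, here is a tiny key-value memory store on SQLite (from the standard library). The table name and schema are illustrative, not a standard:

```python
import sqlite3

def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a tiny key-value memory store."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS memory (key TEXT PRIMARY KEY, value TEXT NOT NULL)"
    )
    return conn

def remember(conn: sqlite3.Connection, key: str, value: str) -> None:
    """Insert or update one remembered fact (upsert on the key)."""
    conn.execute(
        "INSERT INTO memory (key, value) VALUES (?, ?) "
        "ON CONFLICT(key) DO UPDATE SET value = excluded.value",
        (key, value),
    )
    conn.commit()

def recall(conn: sqlite3.Connection, key: str):
    """Return the stored value, or None if nothing is remembered."""
    row = conn.execute("SELECT value FROM memory WHERE key = ?", (key,)).fetchone()
    return row[0] if row else None

conn = open_store()  # ":memory:" here; pass a file path for real persistence
remember(conn, "diet", "vegetarian")
print(recall(conn, "diet"))  # vegetarian
```

Exercise 2 later in this session uses a JSON file instead, which is even simpler; SQLite becomes worthwhile once you need concurrent access or queries.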
4. Memory Design Patterns for Agents
Agentic systems often need more than just chat history. They need memory policies.
Pattern 1: Keep recent context in the prompt
Useful for active short conversations.
Pros:
- Easy to implement
- Reliable
- Transparent

Cons:
- Prompt grows quickly
- Can become expensive
- Context window is limited
Pattern 2: Summarize old context
Compress earlier interaction into a shorter form.
Pros:
- Saves tokens
- Preserves key ideas

Cons:
- Can lose nuance
- Summary quality matters
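Pattern 2 can be sketched as follows, reusing the `client` and model from the exercises later in this session. The summarization prompt and the `keep_last` cutoff are illustrative assumptions, not a fixed recipe:

```python
def build_summary_prompt(history: list[dict]) -> str:
    """Flatten older turns into one block of text for the summarizer call."""
    transcript = "\n".join(f"{m['role']}: {m['content']}" for m in history)
    return (
        "Summarize the conversation below in 2-3 bullet points, "
        "keeping names, preferences, and decisions:\n\n" + transcript
    )

def summarize_old_turns(client, history: list[dict], keep_last: int = 4) -> list[dict]:
    """Replace all but the last `keep_last` messages with a single summary message."""
    old, recent = history[:-keep_last], history[-keep_last:]
    if not old:
        return history  # nothing old enough to compress yet
    response = client.responses.create(
        model="gpt-5.4-mini",  # same model as the exercises in this session
        input=build_summary_prompt(old),
    )
    summary = {
        "role": "assistant",
        "content": "Summary of earlier conversation:\n" + response.output_text,
    }
    return [summary] + recent
```

You would call `summarize_old_turns` whenever the history grows past a threshold, before sending it to the model.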
Pattern 3: Save important facts explicitly
Extract durable facts and store them separately.
Examples:
- Preferred programming language: Python
- Dietary preference: vegetarian
- Tone preference: concise

Pros:
- Efficient retrieval
- Easy personalization

Cons:
- Requires fact extraction logic
- Risk of storing incorrect assumptions
Pattern 4: Retrieve relevant memories on demand
Search stored memory and inject only relevant items into the current prompt.
Pros:
- Scales better
- More targeted

Cons:
- Retrieval quality is critical
- More infrastructure required
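A minimal sketch of on-demand retrieval, using naive keyword overlap as a stand-in for embedding search; a real system would use a vector store and semantic similarity:

```python
def score(query: str, memory_text: str) -> int:
    """Naive relevance: count shared lowercase words (a stand-in for embeddings)."""
    query_words = set(query.lower().split())
    memory_words = set(memory_text.lower().split())
    return len(query_words & memory_words)

def retrieve(query: str, memories: list[str], top_k: int = 2) -> list[str]:
    """Return up to top_k stored memories relevant to the current query."""
    ranked = sorted(memories, key=lambda m: score(query, m), reverse=True)
    return [m for m in ranked[:top_k] if score(query, m) > 0]

memories = [
    "User is vegetarian",
    "Project uses FastAPI and PostgreSQL",
    "User prefers concise explanations",
]
print(retrieve("Can you suggest a vegetarian dinner?", memories))
# ['User is vegetarian']
```

Only the matching memory is injected into the prompt, instead of all stored facts; that is the core of this pattern regardless of how relevance is scored.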
Pattern 5: Forget aggressively
Not everything should be stored.
Do not store by default:
- Sensitive data unless necessary and permitted
- Temporary noise
- Repeated trivial details
- Low-confidence inferences
5. Designing Good Memory Policies
A useful memory system answers these questions:
What should be remembered?
- Stable preferences
- Important constraints
- Long-term goals
- Relevant factual context
What should stay short-term only?
- Temporary decisions
- Intermediate tool results
- One-off clarifications
- Working notes
When should we summarize?
- When prompt size grows too large
- When earlier details matter only broadly
- When moving between workflow stages
When should we retrieve?
- At the start of a new session
- Before taking an action
- When user asks something linked to past context
When should we forget?
- Information is outdated
- User requests deletion
- Memory is irrelevant
- Memory is low confidence
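The policy questions above can be encoded as a simple predicate. The `kind` labels and the confidence threshold below are illustrative assumptions, not a standard taxonomy:

```python
from dataclasses import dataclass

@dataclass
class MemoryCandidate:
    text: str
    kind: str          # e.g. "preference", "goal", "tool_result", "clarification"
    confidence: float  # 0.0-1.0: how sure the system is that this fact is correct

# Kinds considered durable enough for long-term storage (illustrative set).
DURABLE_KINDS = {"preference", "constraint", "goal", "fact"}

def should_store_long_term(c: MemoryCandidate, min_confidence: float = 0.8) -> bool:
    """Store only durable, high-confidence facts; everything else stays short-term."""
    return c.kind in DURABLE_KINDS and c.confidence >= min_confidence

print(should_store_long_term(MemoryCandidate("User is vegetarian", "preference", 0.95)))   # True
print(should_store_long_term(MemoryCandidate("Tool returned 37 rows", "tool_result", 0.99)))  # False
```

Real systems layer retention rules, user deletion requests, and staleness checks on top of a predicate like this.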
6. Hands-on Exercise 1: Build Short-Term Memory with Conversation History
Objective
Create a simple chatbot that keeps recent conversation turns in memory and sends them to the OpenAI Responses API so the model can answer in context.
What learners will practice
- Installing and using the OpenAI Python SDK
- Structuring chat input for the Responses API
- Maintaining a rolling conversation history
- Limiting memory size
Step 1: Install dependencies
```bash
pip install openai python-dotenv
```
Step 2: Set your API key
Create a .env file:
```
OPENAI_API_KEY=your_api_key_here
```
Step 3: Python script for short-term memory
"""
short_term_memory_chat.py
A simple conversational chatbot that keeps short-term memory by storing
recent conversation turns and sending them to the OpenAI Responses API.
Requirements:
pip install openai python-dotenv
Environment:
OPENAI_API_KEY must be set, e.g. in a .env file.
"""
from openai import OpenAI
from dotenv import load_dotenv
import os
# Load environment variables from .env
load_dotenv()
# Create the OpenAI client
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
def build_input_messages(system_prompt: str, history: list[dict]) -> list[dict]:
"""
Convert internal message history into the Responses API input format.
Parameters:
system_prompt: High-level instructions for the assistant.
history: List of messages like:
{"role": "user"|"assistant", "content": "..."}
Returns:
A list of input items compatible with the Responses API.
"""
input_items = [
{
"role": "system",
"content": [{"type": "input_text", "text": system_prompt}],
}
]
for msg in history:
input_items.append(
{
"role": msg["role"],
"content": [{"type": "input_text", "text": msg["content"]}],
}
)
return input_items
def get_assistant_reply(system_prompt: str, history: list[dict]) -> str:
"""
Send the conversation history to the model and return the assistant's reply.
"""
response = client.responses.create(
model="gpt-5.4-mini",
input=build_input_messages(system_prompt, history),
)
return response.output_text
def main() -> None:
"""
Run a terminal chat app with rolling short-term memory.
"""
print("Short-Term Memory Chatbot")
print("Type 'exit' to quit.\n")
system_prompt = (
"You are a helpful AI assistant. Keep responses clear and friendly. "
"Use the conversation history to maintain continuity."
)
# Internal conversation history
history: list[dict] = []
# Keep only the last 6 messages (3 user-assistant turns)
max_messages = 6
while True:
user_input = input("You: ").strip()
if user_input.lower() in {"exit", "quit"}:
print("Goodbye!")
break
# Add the user's new message to memory
history.append({"role": "user", "content": user_input})
# Trim history to a rolling window
history = history[-max_messages:]
# Get assistant response
assistant_reply = get_assistant_reply(system_prompt, history)
print(f"Assistant: {assistant_reply}\n")
# Add assistant response to memory
history.append({"role": "assistant", "content": assistant_reply})
# Trim again after assistant message
history = history[-max_messages:]
if __name__ == "__main__":
main()
Example interaction
```
Short-Term Memory Chatbot
Type 'exit' to quit.

You: My name is Priya and I am learning Python.
Assistant: Nice to meet you, Priya! Python is a great language to learn. What are you focusing on right now?

You: I also like concise explanations.
Assistant: Got it — I’ll keep things concise. What Python topic would you like help with?

You: What do you know about me?
Assistant: You told me your name is Priya, you’re learning Python, and you prefer concise explanations.
```
Discussion
This chatbot has short-term memory because:
- It stores recent messages in `history`
- It sends that history with each model call
- The model can answer using prior context
Limitation
If the program exits, memory is lost. That means this is session memory, not persistent memory.
Mini Exercise
Modify the code so that:
- It remembers the last 10 messages instead of 6
- The assistant responds in bullet points if the user requests “summarize”
- A welcome message explains that memory is temporary
7. Hands-on Exercise 2: Build Long-Term Memory with a Local JSON Store
Objective
Create a simple persistent memory system that stores user preferences in a JSON file and injects them into future prompts.
What learners will practice
- Reading and writing JSON in Python
- Persisting memory across runs
- Loading relevant memory before model calls
- Using stored memory for personalization
Design
We will store memory like this:
```json
{
  "name": "Priya",
  "preferences": {
    "explanation_style": "concise",
    "favorite_language": "Python",
    "diet": "vegetarian"
  }
}
```
This is a lightweight long-term memory system.
Step 1: Python script for long-term memory
"""
long_term_memory_chat.py
A simple chatbot that stores long-term user memory in a JSON file
and uses that memory in future conversations.
Requirements:
pip install openai python-dotenv
Environment:
OPENAI_API_KEY must be set.
"""
from openai import OpenAI
from dotenv import load_dotenv
import json
import os
from pathlib import Path
# Load environment variables from .env
load_dotenv()
# Create OpenAI client
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
# File used for persistent memory
MEMORY_FILE = Path("user_memory.json")
def load_memory() -> dict:
"""
Load persistent memory from disk.
Returns:
A dictionary with stored user memory.
"""
if MEMORY_FILE.exists():
with MEMORY_FILE.open("r", encoding="utf-8") as f:
return json.load(f)
# Default empty memory structure
return {
"name": None,
"preferences": {}
}
def save_memory(memory: dict) -> None:
"""
Save persistent memory to disk.
"""
with MEMORY_FILE.open("w", encoding="utf-8") as f:
json.dump(memory, f, indent=2, ensure_ascii=False)
def update_memory_from_user_input(memory: dict, user_input: str) -> dict:
"""
Very simple rule-based memory extraction.
This is intentionally lightweight for learning purposes.
In real systems, extraction may be model-assisted or schema-driven.
"""
text = user_input.lower().strip()
# Learn the user's name from patterns like "my name is ..."
if "my name is " in text:
name = user_input.strip()[text.index("my name is ") + len("my name is "):].strip(" .!")
if name:
memory["name"] = name
# Learn preference for concise explanations
if "concise" in text:
memory["preferences"]["explanation_style"] = "concise"
# Learn preference for detailed explanations
if "detailed" in text:
memory["preferences"]["explanation_style"] = "detailed"
# Learn favorite language
if "i like python" in text or "favorite language is python" in text:
memory["preferences"]["favorite_language"] = "Python"
# Learn vegetarian preference
if "i am vegetarian" in text or "i'm vegetarian" in text:
memory["preferences"]["diet"] = "vegetarian"
return memory
def memory_to_context(memory: dict) -> str:
"""
Convert stored memory into a textual context block for the model.
"""
lines = ["Known long-term memory about the user:"]
if memory.get("name"):
lines.append(f"- Name: {memory['name']}")
preferences = memory.get("preferences", {})
if preferences:
for key, value in preferences.items():
lines.append(f"- {key}: {value}")
if len(lines) == 1:
lines.append("- No long-term memory stored yet.")
return "\n".join(lines)
def build_input(system_prompt: str, memory_context: str, user_input: str) -> list[dict]:
"""
Build the input payload for the Responses API.
"""
return [
{
"role": "system",
"content": [{"type": "input_text", "text": system_prompt}],
},
{
"role": "system",
"content": [{"type": "input_text", "text": memory_context}],
},
{
"role": "user",
"content": [{"type": "input_text", "text": user_input}],
},
]
def get_reply(system_prompt: str, memory: dict, user_input: str) -> str:
"""
Get an assistant reply using stored long-term memory.
"""
response = client.responses.create(
model="gpt-5.4-mini",
input=build_input(system_prompt, memory_to_context(memory), user_input),
)
return response.output_text
def main() -> None:
"""
Run the chatbot with persistent long-term memory.
"""
print("Long-Term Memory Chatbot")
print("Type 'exit' to quit.")
print("Stored memory is saved in user_memory.json\n")
system_prompt = (
"You are a helpful AI assistant. Personalize responses when appropriate "
"using the stored user memory, but do not invent facts."
)
memory = load_memory()
while True:
user_input = input("You: ").strip()
if user_input.lower() in {"exit", "quit"}:
print("Goodbye!")
break
# Update and persist memory before generating a response
memory = update_memory_from_user_input(memory, user_input)
save_memory(memory)
assistant_reply = get_reply(system_prompt, memory, user_input)
print(f"Assistant: {assistant_reply}\n")
if __name__ == "__main__":
main()
Example interaction: first run
```
Long-Term Memory Chatbot
Type 'exit' to quit.
Stored memory is saved in user_memory.json

You: My name is Priya.
Assistant: Nice to meet you, Priya! How can I help today?

You: I like concise explanations.
Assistant: Understood — I’ll keep my answers concise.

You: I am vegetarian.
Assistant: Thanks for letting me know. I’ll keep that in mind for food-related suggestions.
```
Example user_memory.json
```json
{
  "name": "Priya",
  "preferences": {
    "explanation_style": "concise",
    "diet": "vegetarian"
  }
}
```
Example interaction: second run
```
Long-Term Memory Chatbot
Type 'exit' to quit.
Stored memory is saved in user_memory.json

You: Can you suggest a quick dinner?
Assistant: Since you’re vegetarian, a quick option could be veggie stir-fry, lentil soup, or a chickpea wrap.
```
Discussion
This is long-term memory because:
- Data is saved to disk
- It survives application restarts
- Future prompts include remembered facts
Limitation
This system uses simple keyword rules. Real systems often use:
- Structured extraction with models
- Review or approval steps
- Retrieval layers
- Memory confidence scores
- Deletion and correction flows
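As a sketch of the first item, model-assisted structured extraction asks the model to return JSON and parses it defensively. The prompt wording and fact schema here are illustrative assumptions; `client` and the model name follow the exercises above:

```python
import json

# Illustrative extraction prompt: asks the model for JSON only.
EXTRACTION_PROMPT = (
    "Extract durable user facts from the message below. Reply with JSON only, "
    'e.g. {"facts": [{"key": "diet", "value": "vegetarian"}]}. '
    'If there are no durable facts, reply {"facts": []}.\n\nMessage: '
)

def parse_facts(raw: str) -> list[dict]:
    """Parse the model's JSON reply defensively; malformed output yields no facts."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return []
    facts = data.get("facts", [])
    if not isinstance(facts, list):
        return []
    return [f for f in facts if isinstance(f, dict) and "key" in f and "value" in f]

def extract_facts(client, user_message: str) -> list[dict]:
    """Ask the model for durable facts and return only well-formed ones."""
    response = client.responses.create(
        model="gpt-5.4-mini",  # same model as the exercises
        input=EXTRACTION_PROMPT + user_message,
    )
    return parse_facts(response.output_text)
```

Defensive parsing matters because the model's output is not guaranteed to be valid JSON; a production system would also validate values against a schema before storing them.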
8. Guided Reflection: Short-Term vs Long-Term Memory
| Feature | Short-Term Memory | Long-Term Memory |
|---|---|---|
| Scope | Current session | Across sessions |
| Storage | In app state / prompt | File, DB, vector store, etc. |
| Lifetime | Temporary | Persistent |
| Typical content | Recent turns, active task state | Preferences, facts, decisions |
| Cost concern | Prompt/token growth | Retrieval/storage complexity |
| Risk | Losing context if truncated | Stale or incorrect memory |
Rule of thumb
- Use short-term memory for what the model needs right now.
- Use long-term memory for what the system should remember later.
9. Best Practices and Pitfalls
Best Practices
- Keep memory minimal and relevant
- Store facts explicitly when possible
- Summarize older context
- Let users correct memory
- Separate temporary state from durable facts
- Add timestamps or metadata in real systems
- Use retrieval instead of dumping all memory into prompts
Pitfalls
- Storing too much irrelevant data
- Treating guesses as facts
- Never forgetting stale information
- Injecting all memory into every prompt
- Storing sensitive information carelessly
- Assuming model outputs are always reliable memory extracts
10. Suggested Extension Activities
If learners finish early, try one of these:
Extension 1: Add timestamps
Store when each memory item was added.
Extension 2: Add correction support
Let the user say:
- “Forget my diet preference”
- “My name is actually Anika”
Extension 3: Combine short-term and long-term memory
Use:
- Rolling conversation history for immediate context
- JSON memory for persistent preferences
Extension 4: Store memory entries as records
Instead of one flat JSON object, use:
```json
[
  {
    "type": "preference",
    "key": "diet",
    "value": "vegetarian",
    "source": "user_message",
    "timestamp": "2026-03-22T10:00:00Z"
  }
]
```
This prepares learners for more realistic agent memory systems.
11. Knowledge Check
Quick Questions
- What is short-term memory in an AI system?
- Why is long-term memory usually stored outside the model?
- When should you summarize conversation history?
- Why is it risky to store every user statement as a permanent fact?
- What is the difference between retrieval and raw prompt accumulation?
Expected answers
- Recent contextual information used within the current interaction or session.
- Because persistence, retrieval, and control need to be managed by the application.
- When the context becomes too large or older details only need compact preservation.
- Some statements may be temporary, sensitive, incorrect, or irrelevant.
- Retrieval selects relevant stored memory, while raw accumulation keeps adding full context into the prompt.
12. Wrap-Up
In this session, learners explored:
- Why memory is essential in AI systems
- The distinction between short-term and long-term memory
- Common memory patterns used in agentic applications
- How to implement a short-term rolling history chatbot
- How to persist simple user memory in a JSON store
These ideas are foundational for building useful AI assistants and agents that can maintain context, personalize behavior, and improve over time.
Useful Resources
- OpenAI Responses API migration guide: https://developers.openai.com/api/docs/guides/migrate-to-responses
- OpenAI API docs: https://developers.openai.com/api/
- OpenAI Python SDK: https://github.com/openai/openai-python
- Python `json` module docs: https://docs.python.org/3/library/json.html
- Python `pathlib` docs: https://docs.python.org/3/library/pathlib.html
- python-dotenv: https://pypi.org/project/python-dotenv/
Homework
Build a chatbot that combines both memory types:
- Short-term: keep the last 4 conversation turns
- Long-term: store user preferences in JSON
- Before each model call:
- include the recent conversation
- include the relevant long-term memory
- Add one command: `forget diet`
Stretch goal
Implement a memory summary that compresses older chat history into 2–3 bullet points before trimming it.