Session 1: Short-Term and Long-Term Memory in AI Systems
Synopsis
Explains the difference between context-window memory, stored conversation history, user profiles, and persistent knowledge stores. Learners understand how different memory types serve different application needs.
Session Content
Session 1: Short-Term and Long-Term Memory in AI Systems
Session Overview
Duration: ~45 minutes
Audience: Python developers with basic programming knowledge, beginning GenAI and agentic development
Goal: Understand how AI systems use short-term and long-term memory, why memory matters in agentic applications, and how to implement simple memory patterns using the OpenAI Responses API and Python.
Learning Objectives
By the end of this session, learners will be able to:
- Explain the difference between short-term and long-term memory in AI systems.
- Describe how memory improves multi-turn conversations and agentic workflows.
- Recognize common memory design patterns in GenAI systems.
- Build a simple short-term conversational memory system in Python.
- Build a basic long-term memory store using local persistence.
- Reason about when to store, retrieve, summarize, or forget information.
Agenda
- Why memory matters in AI systems
- Short-term memory concepts
- Long-term memory concepts
- Memory design patterns for agents
- Hands-on Exercise 1: Short-term memory with conversation history
- Hands-on Exercise 2: Long-term memory with a local JSON memory store
- Best practices and pitfalls
- Wrap-up
1. Why Memory Matters in AI Systems
Large language models generate fluent responses, but they do not maintain persistent, user-specific memory across interactions unless the application explicitly provides it.
Without memory, AI systems:
- Forget prior turns in a conversation
- Lose user preferences
- Repeat questions
- Struggle with long-running tasks
- Fail to adapt over time
With memory, AI systems can:
- Continue multi-turn conversations coherently
- Personalize responses
- Track goals, constraints, and preferences
- Support long-running workflows
- Act more like useful assistants or agents
Examples
No memory
User: “I’m vegetarian.”
Later: “Can you suggest dinner?”
Assistant: “How about grilled chicken?”
With memory
User: “I’m vegetarian.”
Later: “Can you suggest dinner?”
Assistant: “Sure — how about a lentil curry, veggie tacos, or mushroom pasta?”
2. Short-Term Memory Concepts
Short-term memory refers to the information an AI system uses during the current interaction or session.
Typical short-term memory includes:
- Recent user messages
- Recent assistant responses
- Current task state
- Temporary goals or constraints
- Scratchpad-style context built by the application
Characteristics
- Session-scoped
- Usually small and recent
- Often passed directly in each model call
- May be summarized when it becomes too large
Common forms
A. Raw conversation history
Store the recent turns exactly as they occurred.
B. Rolling window
Keep only the last N messages or last N turns.
C. Summarized history
Replace older details with a compact summary.
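The rolling-window form (B) can be sketched with `collections.deque`, which discards the oldest entries automatically; the limit of 6 messages is an arbitrary illustrative choice:

```python
from collections import deque

# A rolling window: deque(maxlen=N) discards the oldest entry automatically.
MAX_MESSAGES = 6  # illustrative limit: three user/assistant turns

history: deque = deque(maxlen=MAX_MESSAGES)

def add_message(role: str, content: str) -> None:
    """Append a message; the oldest one falls off once the window is full."""
    history.append({"role": role, "content": content})

# Simulate five turns (ten messages); only the last six survive.
for i in range(5):
    add_message("user", f"question {i}")
    add_message("assistant", f"answer {i}")

print(len(history))           # 6
print(history[0]["content"])  # question 2
```

The deque handles trimming for you; the exercises below do the same thing manually with list slicing so the mechanism stays visible.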
Why short-term memory matters
It helps maintain continuity in:
- Customer support chats
- Coding copilots
- Research assistants
- Booking flows
- Agent plans and tool usage
3. Long-Term Memory Concepts
Long-term memory is information stored beyond a single interaction and reused in future sessions.
Typical long-term memory includes:
- User preferences
- Project context
- Past decisions
- Important facts learned over time
- Task outcomes
- Saved documents, notes, or embeddings
Characteristics
- Persistent across sessions
- Retrieved selectively
- Usually stored outside the model
- Can be structured, unstructured, or vector-based
Examples
- “User prefers concise explanations”
- “Project uses FastAPI and PostgreSQL”
- “Last week’s trip planning included Kyoto and Osaka”
- “This customer’s product key is linked to Account A”
Long-term memory storage options
- JSON files
- SQLite/PostgreSQL
- Vector databases
- Document stores
- Knowledge graphs
- CRM / application databases
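As a minimal sketch of one storage option from the list above, here is a tiny key-value memory store on SQLite (from the standard library). The table name and schema are illustrative, not a standard:

```python
import sqlite3

def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a tiny key-value memory store."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS memory (key TEXT PRIMARY KEY, value TEXT NOT NULL)"
    )
    return conn

def remember(conn: sqlite3.Connection, key: str, value: str) -> None:
    """Insert or update one remembered fact (upsert on the key)."""
    conn.execute(
        "INSERT INTO memory (key, value) VALUES (?, ?) "
        "ON CONFLICT(key) DO UPDATE SET value = excluded.value",
        (key, value),
    )
    conn.commit()

def recall(conn: sqlite3.Connection, key: str):
    """Return the stored value, or None if nothing is remembered."""
    row = conn.execute("SELECT value FROM memory WHERE key = ?", (key,)).fetchone()
    return row[0] if row else None

conn = open_store()  # ":memory:" here; pass a file path for real persistence
remember(conn, "diet", "vegetarian")
print(recall(conn, "diet"))  # vegetarian
```

Exercise 2 later in this session uses a JSON file instead, which is even simpler; SQLite becomes worthwhile once you need concurrent access or queries.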
4. Memory Design Patterns for Agents
Agentic systems often need more than just chat history. They need memory policies.
Pattern 1: Keep recent context in the prompt
Useful for active short conversations.
Pros:
- Easy to implement
- Reliable
- Transparent

Cons:
- Prompt grows quickly
- Can become expensive
- Context window is limited
Pattern 2: Summarize old context
Compress earlier interaction into a shorter form.
Pros:
- Saves tokens
- Preserves key ideas

Cons:
- Can lose nuance
- Summary quality matters
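Pattern 2 can be sketched as follows, reusing the `client` and model from the exercises later in this session. The summarization prompt and the `keep_last` cutoff are illustrative assumptions, not a fixed recipe:

```python
def build_summary_prompt(history: list[dict]) -> str:
    """Flatten older turns into one block of text for the summarizer call."""
    transcript = "\n".join(f"{m['role']}: {m['content']}" for m in history)
    return (
        "Summarize the conversation below in 2-3 bullet points, "
        "keeping names, preferences, and decisions:\n\n" + transcript
    )

def summarize_old_turns(client, history: list[dict], keep_last: int = 4) -> list[dict]:
    """Replace all but the last `keep_last` messages with a single summary message."""
    old, recent = history[:-keep_last], history[-keep_last:]
    if not old:
        return history  # nothing old enough to compress yet
    response = client.responses.create(
        model="gpt-5.4-mini",  # same model as the exercises in this session
        input=build_summary_prompt(old),
    )
    summary = {
        "role": "assistant",
        "content": "Summary of earlier conversation:\n" + response.output_text,
    }
    return [summary] + recent
```

You would call `summarize_old_turns` whenever the history grows past a threshold, before sending it to the model.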
Pattern 3: Save important facts explicitly
Extract durable facts and store them separately.
Examples:
- Preferred programming language: Python
- Dietary preference: vegetarian
- Tone preference: concise

Pros:
- Efficient retrieval
- Easy personalization

Cons:
- Requires fact extraction logic
- Risk of storing incorrect assumptions
Pattern 4: Retrieve relevant memories on demand
Search stored memory and inject only relevant items into the current prompt.
Pros:
- Scales better
- More targeted

Cons:
- Retrieval quality is critical
- More infrastructure required
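A minimal sketch of on-demand retrieval, using naive keyword overlap as a stand-in for embedding search; a real system would use a vector store and semantic similarity:

```python
def score(query: str, memory_text: str) -> int:
    """Naive relevance: count shared lowercase words (a stand-in for embeddings)."""
    query_words = set(query.lower().split())
    memory_words = set(memory_text.lower().split())
    return len(query_words & memory_words)

def retrieve(query: str, memories: list[str], top_k: int = 2) -> list[str]:
    """Return up to top_k stored memories relevant to the current query."""
    ranked = sorted(memories, key=lambda m: score(query, m), reverse=True)
    return [m for m in ranked[:top_k] if score(query, m) > 0]

memories = [
    "User is vegetarian",
    "Project uses FastAPI and PostgreSQL",
    "User prefers concise explanations",
]
print(retrieve("Can you suggest a vegetarian dinner?", memories))
# ['User is vegetarian']
```

Only the matching memory is injected into the prompt, instead of all stored facts; that is the core of this pattern regardless of how relevance is scored.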
Pattern 5: Forget aggressively
Not everything should be stored.
Do not store by default:
- Sensitive data unless necessary and permitted
- Temporary noise
- Repeated trivial details
- Low-confidence inferences
5. Designing Good Memory Policies
A useful memory system answers these questions:
What should be remembered?
- Stable preferences
- Important constraints
- Long-term goals
- Relevant factual context
What should stay short-term only?
- Temporary decisions
- Intermediate tool results
- One-off clarifications
- Working notes
When should we summarize?
- When prompt size grows too large
- When earlier details matter only broadly
- When moving between workflow stages
When should we retrieve?
- At the start of a new session
- Before taking an action
- When user asks something linked to past context
When should we forget?
- Information is outdated
- User requests deletion
- Memory is irrelevant
- Memory is low confidence
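The policy questions above can be encoded as a simple predicate. The `kind` labels and the confidence threshold below are illustrative assumptions, not a standard taxonomy:

```python
from dataclasses import dataclass

@dataclass
class MemoryCandidate:
    text: str
    kind: str          # e.g. "preference", "goal", "tool_result", "clarification"
    confidence: float  # 0.0-1.0: how sure the system is that this fact is correct

# Kinds considered durable enough for long-term storage (illustrative set).
DURABLE_KINDS = {"preference", "constraint", "goal", "fact"}

def should_store_long_term(c: MemoryCandidate, min_confidence: float = 0.8) -> bool:
    """Store only durable, high-confidence facts; everything else stays short-term."""
    return c.kind in DURABLE_KINDS and c.confidence >= min_confidence

print(should_store_long_term(MemoryCandidate("User is vegetarian", "preference", 0.95)))   # True
print(should_store_long_term(MemoryCandidate("Tool returned 37 rows", "tool_result", 0.99)))  # False
```

Real systems layer retention rules, user deletion requests, and staleness checks on top of a predicate like this.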
6. Hands-on Exercise 1: Build Short-Term Memory with Conversation History
Objective
Create a simple chatbot that keeps recent conversation turns in memory and sends them to the OpenAI Responses API so the model can answer in context.
What learners will practice
- Installing and using the OpenAI Python SDK
- Structuring chat input for the Responses API
- Maintaining a rolling conversation history
- Limiting memory size
Step 1: Install dependencies
```bash
pip install openai python-dotenv
```
Step 2: Set your API key
Create a .env file:
```
OPENAI_API_KEY=your_api_key_here
```
Step 3: Python script for short-term memory
"""
short_term_memory_chat.py
A simple conversational chatbot that keeps short-term memory by storing
recent conversation turns and sending them to the OpenAI Responses API.
Requirements:
pip install openai python-dotenv
Environment:
OPENAI_API_KEY must be set, e.g. in a .env file.
"""
from openai import OpenAI
from dotenv import load_dotenv
import os
# Load environment variables from .env
load_dotenv()
# Create the OpenAI client
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
def build_input_messages(system_prompt: str, history: list[dict]) -> list[dict]:
"""
Convert internal message history into the Responses API input format.
Parameters:
system_prompt: High-level instructions for the assistant.
history: List of messages like:
{"role": "user"|"assistant", "content": "..."}
Returns:
A list of input items compatible with the Responses API.
"""
input_items = [
{
"role": "system",
"content": [{"type": "input_text", "text": system_prompt}],
}
]
for msg in history:
input_items.append(
{
"role": msg["role"],
"content": [{"type": "input_text", "text": msg["content"]}],
}
)
return input_items
def get_assistant_reply(system_prompt: str, history: list[dict]) -> str:
"""
Send the conversation history to the model and return the assistant's reply.
"""
response = client.responses.create(
model="gpt-5.4-mini",
input=build_input_messages(system_prompt, history),
)
return response.output_text
def main() -> None:
"""
Run a terminal chat app with rolling short-term memory.
"""
print("Short-Term Memory Chatbot")
print("Type 'exit' to quit.\n")
system_prompt = (
"You are a helpful AI assistant. Keep responses clear and friendly. "
"Use the conversation history to maintain continuity."
)
# Internal conversation history
history: list[dict] = []
# Keep only the last 6 messages (3 user-assistant turns)
max_messages = 6
while True:
user_input = input("You: ").strip()
if user_input.lower() in {"exit", "quit"}:
print("Goodbye!")
break
# Add the user's new message to memory
history.append({"role": "user", "content": user_input})
# Trim history to a rolling window
history = history[-max_messages:]
# Get assistant response
assistant_reply = get_assistant_reply(system_prompt, history)
print(f"Assistant: {assistant_reply}\n")
# Add assistant response to memory
history.append({"role": "assistant", "content": assistant_reply})
# Trim again after assistant message
history = history[-max_messages:]
if __name__ == "__main__":
main()
Example interaction
```
Short-Term Memory Chatbot
Type 'exit' to quit.

You: My name is Priya and I am learning Python.
Assistant: Nice to meet you, Priya! Python is a great language to learn. What are you focusing on right now?

You: I also like concise explanations.
Assistant: Got it — I’ll keep things concise. What Python topic would you like help with?

You: What do you know about me?
Assistant: You told me your name is Priya, you’re learning Python, and you prefer concise explanations.
```
Discussion
This chatbot has short-term memory because:
- It stores recent messages in `history`
- It sends that history with each model call
- The model can answer using prior context
Limitation
If the program exits, memory is lost. That means this is session memory, not persistent memory.
Mini Exercise
Modify the code so that:
- It remembers the last 10 messages instead of 6
- The assistant responds in bullet points if the user requests “summarize”
- A welcome message explains that memory is temporary
7. Hands-on Exercise 2: Build Long-Term Memory with a Local JSON Store
Objective
Create a simple persistent memory system that stores user preferences in a JSON file and injects them into future prompts.
What learners will practice
- Reading and writing JSON in Python
- Persisting memory across runs
- Loading relevant memory before model calls
- Using stored memory for personalization
Design
We will store memory like this:
```json
{
  "name": "Priya",
  "preferences": {
    "explanation_style": "concise",
    "favorite_language": "Python",
    "diet": "vegetarian"
  }
}
```
This is a lightweight long-term memory system.
Step 1: Python script for long-term memory
"""
long_term_memory_chat.py
A simple chatbot that stores long-term user memory in a JSON file
and uses that memory in future conversations.
Requirements:
pip install openai python-dotenv
Environment:
OPENAI_API_KEY must be set.
"""
from openai import OpenAI
from dotenv import load_dotenv
import json
import os
from pathlib import Path
# Load environment variables from .env
load_dotenv()
# Create OpenAI client
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
# File used for persistent memory
MEMORY_FILE = Path("user_memory.json")
def load_memory() -> dict:
"""
Load persistent memory from disk.
Returns:
A dictionary with stored user memory.
"""
if MEMORY_FILE.exists():
with MEMORY_FILE.open("r", encoding="utf-8") as f:
return json.load(f)
# Default empty memory structure
return {
"name": None,
"preferences": {}
}
def save_memory(memory: dict) -> None:
"""
Save persistent memory to disk.
"""
with MEMORY_FILE.open("w", encoding="utf-8") as f:
json.dump(memory, f, indent=2, ensure_ascii=False)
def update_memory_from_user_input(memory: dict, user_input: str) -> dict:
"""
Very simple rule-based memory extraction.
This is intentionally lightweight for learning purposes.
In real systems, extraction may be model-assisted or schema-driven.
"""
text = user_input.lower().strip()
# Learn the user's name from patterns like "my name is ..."
if "my name is " in text:
name = user_input.strip()[text.index("my name is ") + len("my name is "):].strip(" .!")
if name:
memory["name"] = name
# Learn preference for concise explanations
if "concise" in text:
memory["preferences"]["explanation_style"] = "concise"
# Learn preference for detailed explanations
if "detailed" in text:
memory["preferences"]["explanation_style"] = "detailed"
# Learn favorite language
if "i like python" in text or "favorite language is python" in text:
memory["preferences"]["favorite_language"] = "Python"
# Learn vegetarian preference
if "i am vegetarian" in text or "i'm vegetarian" in text:
memory["preferences"]["diet"] = "vegetarian"
return memory
def memory_to_context(memory: dict) -> str:
"""
Convert stored memory into a textual context block for the model.
"""
lines = ["Known long-term memory about the user:"]
if memory.get("name"):
lines.append(f"- Name: {memory['name']}")
preferences = memory.get("preferences", {})
if preferences:
for key, value in preferences.items():
lines.append(f"- {key}: {value}")
if len(lines) == 1:
lines.append("- No long-term memory stored yet.")
return "\n".join(lines)
def build_input(system_prompt: str, memory_context: str, user_input: str) -> list[dict]:
"""
Build the input payload for the Responses API.
"""
return [
{
"role": "system",
"content": [{"type": "input_text", "text": system_prompt}],
},
{
"role": "system",
"content": [{"type": "input_text", "text": memory_context}],
},
{
"role": "user",
"content": [{"type": "input_text", "text": user_input}],
},
]
def get_reply(system_prompt: str, memory: dict, user_input: str) -> str:
"""
Get an assistant reply using stored long-term memory.
"""
response = client.responses.create(
model="gpt-5.4-mini",
input=build_input(system_prompt, memory_to_context(memory), user_input),
)
return response.output_text
def main() -> None:
"""
Run the chatbot with persistent long-term memory.
"""
print("Long-Term Memory Chatbot")
print("Type 'exit' to quit.")
print("Stored memory is saved in user_memory.json\n")
system_prompt = (
"You are a helpful AI assistant. Personalize responses when appropriate "
"using the stored user memory, but do not invent facts."
)
memory = load_memory()
while True:
user_input = input("You: ").strip()
if user_input.lower() in {"exit", "quit"}:
print("Goodbye!")
break
# Update and persist memory before generating a response
memory = update_memory_from_user_input(memory, user_input)
save_memory(memory)
assistant_reply = get_reply(system_prompt, memory, user_input)
print(f"Assistant: {assistant_reply}\n")
if __name__ == "__main__":
main()
Example interaction: first run
```
Long-Term Memory Chatbot
Type 'exit' to quit.
Stored memory is saved in user_memory.json

You: My name is Priya.
Assistant: Nice to meet you, Priya! How can I help today?

You: I like concise explanations.
Assistant: Understood — I’ll keep my answers concise.

You: I am vegetarian.
Assistant: Thanks for letting me know. I’ll keep that in mind for food-related suggestions.
```
Example user_memory.json
```json
{
  "name": "Priya",
  "preferences": {
    "explanation_style": "concise",
    "diet": "vegetarian"
  }
}
```
Example interaction: second run
```
Long-Term Memory Chatbot
Type 'exit' to quit.
Stored memory is saved in user_memory.json

You: Can you suggest a quick dinner?
Assistant: Since you’re vegetarian, a quick option could be veggie stir-fry, lentil soup, or a chickpea wrap.
```
Discussion
This is long-term memory because:
- Data is saved to disk
- It survives application restarts
- Future prompts include remembered facts
Limitation
This system uses simple keyword rules. Real systems often use:
- Structured extraction with models
- Review or approval steps
- Retrieval layers
- Memory confidence scores
- Deletion and correction flows
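As a sketch of the first item, model-assisted structured extraction asks the model to return JSON and parses it defensively. The prompt wording and fact schema here are illustrative assumptions; `client` and the model name follow the exercises above:

```python
import json

# Illustrative extraction prompt: asks the model for JSON only.
EXTRACTION_PROMPT = (
    "Extract durable user facts from the message below. Reply with JSON only, "
    'e.g. {"facts": [{"key": "diet", "value": "vegetarian"}]}. '
    'If there are no durable facts, reply {"facts": []}.\n\nMessage: '
)

def parse_facts(raw: str) -> list[dict]:
    """Parse the model's JSON reply defensively; malformed output yields no facts."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return []
    facts = data.get("facts", [])
    if not isinstance(facts, list):
        return []
    return [f for f in facts if isinstance(f, dict) and "key" in f and "value" in f]

def extract_facts(client, user_message: str) -> list[dict]:
    """Ask the model for durable facts and return only well-formed ones."""
    response = client.responses.create(
        model="gpt-5.4-mini",  # same model as the exercises
        input=EXTRACTION_PROMPT + user_message,
    )
    return parse_facts(response.output_text)
```

Defensive parsing matters because the model's output is not guaranteed to be valid JSON; a production system would also validate values against a schema before storing them.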
8. Guided Reflection: Short-Term vs Long-Term Memory
| Feature | Short-Term Memory | Long-Term Memory |
|---|---|---|
| Scope | Current session | Across sessions |
| Storage | In app state / prompt | File, DB, vector store, etc. |
| Lifetime | Temporary | Persistent |
| Typical content | Recent turns, active task state | Preferences, facts, decisions |
| Cost concern | Prompt/token growth | Retrieval/storage complexity |
| Risk | Losing context if truncated | Stale or incorrect memory |
Rule of thumb
- Use short-term memory for what the model needs right now.
- Use long-term memory for what the system should remember later.
9. Best Practices and Pitfalls
Best Practices
- Keep memory minimal and relevant
- Store facts explicitly when possible
- Summarize older context
- Let users correct memory
- Separate temporary state from durable facts
- Add timestamps or metadata in real systems
- Use retrieval instead of dumping all memory into prompts
Pitfalls
- Storing too much irrelevant data
- Treating guesses as facts
- Never forgetting stale information
- Injecting all memory into every prompt
- Storing sensitive information carelessly
- Assuming model outputs are always reliable memory extracts
10. Suggested Extension Activities
If learners finish early, try one of these:
Extension 1: Add timestamps
Store when each memory item was added.
Extension 2: Add correction support
Let the user say:
- “Forget my diet preference”
- “My name is actually Anika”
Extension 3: Combine short-term and long-term memory
Use:
- Rolling conversation history for immediate context
- JSON memory for persistent preferences
Extension 4: Store memory entries as records
Instead of one flat JSON object, use:
```json
[
  {
    "type": "preference",
    "key": "diet",
    "value": "vegetarian",
    "source": "user_message",
    "timestamp": "2026-03-22T10:00:00Z"
  }
]
```
This prepares learners for more realistic agent memory systems.
11. Knowledge Check
Quick Questions
- What is short-term memory in an AI system?
- Why is long-term memory usually stored outside the model?
- When should you summarize conversation history?
- Why is it risky to store every user statement as a permanent fact?
- What is the difference between retrieval and raw prompt accumulation?
Expected answers
- Recent contextual information used within the current interaction or session.
- Because persistence, retrieval, and control need to be managed by the application.
- When the context becomes too large or older details only need compact preservation.
- Some statements may be temporary, sensitive, incorrect, or irrelevant.
- Retrieval selects relevant stored memory, while raw accumulation keeps adding full context into the prompt.
12. Wrap-Up
In this session, learners explored:
- Why memory is essential in AI systems
- The distinction between short-term and long-term memory
- Common memory patterns used in agentic applications
- How to implement a short-term rolling history chatbot
- How to persist simple user memory in a JSON store
These ideas are foundational for building useful AI assistants and agents that can maintain context, personalize behavior, and improve over time.
Useful Resources
- OpenAI Responses API migration guide: https://developers.openai.com/api/docs/guides/migrate-to-responses
- OpenAI API docs: https://developers.openai.com/api/
- OpenAI Python SDK: https://github.com/openai/openai-python
- Python `json` module docs: https://docs.python.org/3/library/json.html
- Python `pathlib` docs: https://docs.python.org/3/library/pathlib.html
- python-dotenv: https://pypi.org/project/python-dotenv/
Homework
Build a chatbot that combines both memory types:
- Short-term: keep the last 4 conversation turns
- Long-term: store user preferences in JSON
- Before each model call:
- include the recent conversation
- include the relevant long-term memory
- Add one command: `forget diet`
Stretch goal
Implement a memory summary that compresses older chat history into 2–3 bullet points before trimming it.