Session 1: Short-Term and Long-Term Memory in AI Systems

Synopsis

Explains the difference between context-window memory, stored conversation history, user profiles, and persistent knowledge stores. Learners understand how different memory types serve different application needs.

Session Content

Session Overview

Duration: ~45 minutes
Audience: Python developers with basic programming knowledge, beginning GenAI and agentic development
Goal: Understand how AI systems use short-term and long-term memory, why memory matters in agentic applications, and how to implement simple memory patterns using the OpenAI Responses API and Python.


Learning Objectives

By the end of this session, learners will be able to:

  • Explain the difference between short-term and long-term memory in AI systems.
  • Describe how memory improves multi-turn conversations and agentic workflows.
  • Recognize common memory design patterns in GenAI systems.
  • Build a simple short-term conversational memory system in Python.
  • Build a basic long-term memory store using local persistence.
  • Reason about when to store, retrieve, summarize, or forget information.

Agenda

  1. Why memory matters in AI systems
  2. Short-term memory concepts
  3. Long-term memory concepts
  4. Memory design patterns for agents
  5. Designing good memory policies
  6. Hands-on Exercise 1: Short-term memory with conversation history
  7. Hands-on Exercise 2: Long-term memory with a local JSON memory store
  8. Guided reflection: short-term vs long-term memory
  9. Best practices and pitfalls
  10. Suggested extension activities
  11. Knowledge check
  12. Wrap-up

1. Why Memory Matters in AI Systems

Large language models are powerful generators, but they do not maintain persistent, user-specific memory across interactions unless the application explicitly provides it.

Without memory, AI systems:

  • Forget prior turns in a conversation
  • Lose user preferences
  • Repeat questions
  • Struggle with long-running tasks
  • Fail to adapt over time

With memory, AI systems can:

  • Continue multi-turn conversations coherently
  • Personalize responses
  • Track goals, constraints, and preferences
  • Support long-running workflows
  • Act more like useful assistants or agents

Examples

No memory

User: “I’m vegetarian.”
Later: “Can you suggest dinner?”
Assistant: “How about grilled chicken?”

With memory

User: “I’m vegetarian.”
Later: “Can you suggest dinner?”
Assistant: “Sure — how about a lentil curry, veggie tacos, or mushroom pasta?”


2. Short-Term Memory Concepts

Short-term memory refers to the information an AI system uses during the current interaction or session.

Typical short-term memory includes:

  • Recent user messages
  • Recent assistant responses
  • Current task state
  • Temporary goals or constraints
  • Scratchpad-style context built by the application

Characteristics

  • Session-scoped
  • Usually small and recent
  • Often passed directly in each model call
  • May be summarized when it becomes too large

Common forms

A. Raw conversation history

Store the recent turns exactly as they occurred.

B. Rolling window

Keep only the last N messages or last N turns.

C. Summarized history

Replace older details with a compact summary.
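The rolling window (form B) can be sketched in a few lines with `collections.deque`, which discards the oldest entries automatically once `maxlen` is reached:

```python
from collections import deque

# Keep at most the last 6 messages; older ones fall off automatically.
MAX_MESSAGES = 6
history = deque(maxlen=MAX_MESSAGES)

for i in range(10):
    history.append({"role": "user", "content": f"message {i}"})

# Only the 6 most recent messages survive.
print([m["content"] for m in history])
# → ['message 4', 'message 5', 'message 6', 'message 7', 'message 8', 'message 9']
```

Exercise 1 below implements the same idea with plain list slicing, which works equally well.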

Why short-term memory matters

It helps maintain continuity in:

  • Customer support chats
  • Coding copilots
  • Research assistants
  • Booking flows
  • Agent plans and tool usage

3. Long-Term Memory Concepts

Long-term memory is information stored beyond a single interaction and reused in future sessions.

Typical long-term memory includes:

  • User preferences
  • Project context
  • Past decisions
  • Important facts learned over time
  • Task outcomes
  • Saved documents, notes, or embeddings

Characteristics

  • Persistent across sessions
  • Retrieved selectively
  • Usually stored outside the model
  • Can be structured, unstructured, or vector-based

Examples

  • “User prefers concise explanations”
  • “Project uses FastAPI and PostgreSQL”
  • “Last week’s trip planning included Kyoto and Osaka”
  • “This customer’s product key is linked to Account A”

Long-term memory storage options

  • JSON files
  • SQLite/PostgreSQL
  • Vector databases
  • Document stores
  • Knowledge graphs
  • CRM / application databases

4. Memory Design Patterns for Agents

Agentic systems often need more than just chat history. They need memory policies.

Pattern 1: Keep recent context in the prompt

Useful for active short conversations.

Pros:

  • Easy to implement
  • Reliable
  • Transparent

Cons:

  • Prompt grows quickly
  • Can become expensive
  • Context window is limited


Pattern 2: Summarize old context

Compress earlier interaction into a shorter form.

Pros:

  • Saves tokens
  • Preserves key ideas

Cons:

  • Can lose nuance
  • Summary quality matters
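The compaction step can be sketched as follows. The `summarize` function here is a trivial stand-in that only records what was folded away — in a real system you would ask the model to write the summary:

```python
# Pattern 2 sketch: once history exceeds a budget, fold the oldest
# messages into a running summary and keep only the recent tail.

def summarize(messages: list[dict], previous_summary: str) -> str:
    # Stand-in summarizer: notes how many messages were folded in.
    folded = f"{len(messages)} earlier message(s) about: " + "; ".join(
        m["content"][:30] for m in messages
    )
    return (previous_summary + " | " + folded).strip(" |")


def compact_history(history: list[dict], summary: str, keep_recent: int = 4):
    """Fold everything but the last `keep_recent` messages into the summary."""
    if len(history) <= keep_recent:
        return history, summary
    old, recent = history[:-keep_recent], history[-keep_recent:]
    return recent, summarize(old, summary)


history = [{"role": "user", "content": f"turn {i}"} for i in range(10)]
history, summary = compact_history(history, summary="")
# history now holds the last 4 turns; summary covers the first 6.
```

The summary string would then be injected into the prompt alongside the remaining recent turns.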


Pattern 3: Save important facts explicitly

Extract durable facts and store them separately.

Examples:

  • Preferred programming language: Python
  • Dietary preference: vegetarian
  • Tone preference: concise

Pros:

  • Efficient retrieval
  • Easy personalization

Cons:

  • Requires fact extraction logic
  • Risk of storing incorrect assumptions


Pattern 4: Retrieve relevant memories on demand

Search stored memory and inject only relevant items into the current prompt.

Pros:

  • Scales better
  • More targeted

Cons:

  • Retrieval quality is critical
  • More infrastructure required
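As a minimal sketch of this pattern, retrieval can be as simple as keyword-overlap scoring. Production systems typically use embeddings and a vector store, but the shape of the pattern — score, rank, inject only the top matches — is the same:

```python
# Pattern 4 sketch: score stored memories by word overlap with the
# query and keep only the top matches.

def retrieve(memories: list[str], query: str, top_k: int = 2) -> list[str]:
    query_words = set(query.lower().split())

    def score(memory: str) -> int:
        return len(query_words & set(memory.lower().split()))

    ranked = sorted(memories, key=score, reverse=True)
    return [m for m in ranked[:top_k] if score(m) > 0]


memories = [
    "User is vegetarian",
    "Project uses FastAPI and PostgreSQL",
    "User prefers concise explanations",
]

relevant = retrieve(memories, "Can you suggest a vegetarian dinner?")
# → ["User is vegetarian"]
```

Only `relevant` would be added to the prompt, rather than the full memory store.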


Pattern 5: Forget aggressively

Not everything should be stored.

Do not store by default:

  • Sensitive data, unless necessary and permitted
  • Temporary noise
  • Repeated trivial details
  • Low-confidence inferences


5. Designing Good Memory Policies

A useful memory system answers these questions:

What should be remembered?

  • Stable preferences
  • Important constraints
  • Long-term goals
  • Relevant factual context

What should stay short-term only?

  • Temporary decisions
  • Intermediate tool results
  • One-off clarifications
  • Working notes

When should we summarize?

  • When prompt size grows too large
  • When earlier details matter only broadly
  • When moving between workflow stages

When should we retrieve?

  • At the start of a new session
  • Before taking an action
  • When user asks something linked to past context

When should we forget?

  • Information is outdated
  • User requests deletion
  • Memory is irrelevant
  • Memory is low confidence
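These questions can be folded into a small policy function. The categories and threshold below are illustrative assumptions, not fixed rules — the point is that the decision is made explicitly rather than storing everything by default:

```python
# Hypothetical memory policy: decide what to do with a candidate
# memory item before it ever reaches the store.

def memory_policy(item: dict) -> str:
    """Return 'store', 'session_only', or 'discard' for a candidate item."""
    # Never store sensitive data unless the user has permitted it.
    if item.get("sensitive") and not item.get("user_consented"):
        return "discard"
    # Drop low-confidence inferences (threshold is an assumption).
    if item.get("confidence", 1.0) < 0.5:
        return "discard"
    # Durable facts go to long-term memory.
    if item.get("kind") in {"preference", "constraint", "goal"}:
        return "store"
    # Everything else stays short-term: working notes, tool output.
    return "session_only"


decision = memory_policy({"kind": "preference", "key": "diet", "value": "vegetarian"})
# → "store"
```

A policy like this gives you one place to audit, test, and tighten what the system remembers.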

6. Hands-on Exercise 1: Build Short-Term Memory with Conversation History

Objective

Create a simple chatbot that keeps recent conversation turns in memory and sends them to the OpenAI Responses API so the model can answer in context.

What learners will practice

  • Installing and using the OpenAI Python SDK
  • Structuring chat input for the Responses API
  • Maintaining a rolling conversation history
  • Limiting memory size

Step 1: Install dependencies

pip install openai python-dotenv

Step 2: Set your API key

Create a .env file:

OPENAI_API_KEY=your_api_key_here

Step 3: Python script for short-term memory

"""
short_term_memory_chat.py

A simple conversational chatbot that keeps short-term memory by storing
recent conversation turns and sending them to the OpenAI Responses API.

Requirements:
    pip install openai python-dotenv

Environment:
    OPENAI_API_KEY must be set, e.g. in a .env file.
"""

from openai import OpenAI
from dotenv import load_dotenv
import os


# Load environment variables from .env
load_dotenv()

# Create the OpenAI client
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])


def build_input_messages(system_prompt: str, history: list[dict]) -> list[dict]:
    """
    Convert internal message history into the Responses API input format.

    Parameters:
        system_prompt: High-level instructions for the assistant.
        history: List of messages like:
                 {"role": "user"|"assistant", "content": "..."}

    Returns:
        A list of input items compatible with the Responses API.
    """
    input_items = [
        {
            "role": "system",
            "content": [{"type": "input_text", "text": system_prompt}],
        }
    ]

    for msg in history:
        input_items.append(
            {
                "role": msg["role"],
                "content": [{"type": "input_text", "text": msg["content"]}],
            }
        )

    return input_items


def get_assistant_reply(system_prompt: str, history: list[dict]) -> str:
    """
    Send the conversation history to the model and return the assistant's reply.
    """
    response = client.responses.create(
        model="gpt-5.4-mini",
        input=build_input_messages(system_prompt, history),
    )

    return response.output_text


def main() -> None:
    """
    Run a terminal chat app with rolling short-term memory.
    """
    print("Short-Term Memory Chatbot")
    print("Type 'exit' to quit.\n")

    system_prompt = (
        "You are a helpful AI assistant. Keep responses clear and friendly. "
        "Use the conversation history to maintain continuity."
    )

    # Internal conversation history
    history: list[dict] = []

    # Keep only the last 6 messages (3 user-assistant turns)
    max_messages = 6

    while True:
        user_input = input("You: ").strip()

        if user_input.lower() in {"exit", "quit"}:
            print("Goodbye!")
            break

        # Add the user's new message to memory
        history.append({"role": "user", "content": user_input})

        # Trim history to a rolling window
        history = history[-max_messages:]

        # Get assistant response
        assistant_reply = get_assistant_reply(system_prompt, history)

        print(f"Assistant: {assistant_reply}\n")

        # Add assistant response to memory
        history.append({"role": "assistant", "content": assistant_reply})

        # Trim again after assistant message
        history = history[-max_messages:]


if __name__ == "__main__":
    main()

Example interaction

Short-Term Memory Chatbot
Type 'exit' to quit.

You: My name is Priya and I am learning Python.
Assistant: Nice to meet you, Priya! Python is a great language to learn. What are you focusing on right now?

You: I also like concise explanations.
Assistant: Got it — I’ll keep things concise. What Python topic would you like help with?

You: What do you know about me?
Assistant: You told me your name is Priya, you’re learning Python, and you prefer concise explanations.

Discussion

This chatbot has short-term memory because:

  • It stores recent messages in history
  • It sends that history with each model call
  • The model can answer using prior context

Limitation

If the program exits, memory is lost. That means this is session memory, not persistent memory.


Mini Exercise

Modify the code so that:

  1. It remembers the last 10 messages instead of 6
  2. The assistant responds in bullet points if the user requests “summarize”
  3. A welcome message explains that memory is temporary

7. Hands-on Exercise 2: Build Long-Term Memory with a Local JSON Store

Objective

Create a simple persistent memory system that stores user preferences in a JSON file and injects them into future prompts.

What learners will practice

  • Reading and writing JSON in Python
  • Persisting memory across runs
  • Loading relevant memory before model calls
  • Using stored memory for personalization

Design

We will store memory like this:

{
  "name": "Priya",
  "preferences": {
    "explanation_style": "concise",
    "favorite_language": "Python",
    "diet": "vegetarian"
  }
}

This is a lightweight long-term memory system.


Step 1: Python script for long-term memory

"""
long_term_memory_chat.py

A simple chatbot that stores long-term user memory in a JSON file
and uses that memory in future conversations.

Requirements:
    pip install openai python-dotenv

Environment:
    OPENAI_API_KEY must be set.
"""

from openai import OpenAI
from dotenv import load_dotenv
import json
import os
from pathlib import Path


# Load environment variables from .env
load_dotenv()

# Create OpenAI client
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# File used for persistent memory
MEMORY_FILE = Path("user_memory.json")


def load_memory() -> dict:
    """
    Load persistent memory from disk.

    Returns:
        A dictionary with stored user memory.
    """
    if MEMORY_FILE.exists():
        with MEMORY_FILE.open("r", encoding="utf-8") as f:
            return json.load(f)

    # Default empty memory structure
    return {
        "name": None,
        "preferences": {}
    }


def save_memory(memory: dict) -> None:
    """
    Save persistent memory to disk.
    """
    with MEMORY_FILE.open("w", encoding="utf-8") as f:
        json.dump(memory, f, indent=2, ensure_ascii=False)


def update_memory_from_user_input(memory: dict, user_input: str) -> dict:
    """
    Very simple rule-based memory extraction.

    This is intentionally lightweight for learning purposes.
    In real systems, extraction may be model-assisted or schema-driven.
    """
    text = user_input.lower().strip()

    # Learn the user's name from patterns like "my name is ..."
    marker = "my name is "
    if marker in text:
        start = text.index(marker) + len(marker)
        name = user_input.strip()[start:].strip(" .!")
        if name:
            memory["name"] = name

    # Learn preference for concise explanations
    if "concise" in text:
        memory["preferences"]["explanation_style"] = "concise"

    # Learn preference for detailed explanations
    if "detailed" in text:
        memory["preferences"]["explanation_style"] = "detailed"

    # Learn favorite language
    if "i like python" in text or "favorite language is python" in text:
        memory["preferences"]["favorite_language"] = "Python"

    # Learn vegetarian preference
    if "i am vegetarian" in text or "i'm vegetarian" in text:
        memory["preferences"]["diet"] = "vegetarian"

    return memory


def memory_to_context(memory: dict) -> str:
    """
    Convert stored memory into a textual context block for the model.
    """
    lines = ["Known long-term memory about the user:"]

    if memory.get("name"):
        lines.append(f"- Name: {memory['name']}")

    preferences = memory.get("preferences", {})
    if preferences:
        for key, value in preferences.items():
            lines.append(f"- {key}: {value}")

    if len(lines) == 1:
        lines.append("- No long-term memory stored yet.")

    return "\n".join(lines)


def build_input(system_prompt: str, memory_context: str, user_input: str) -> list[dict]:
    """
    Build the input payload for the Responses API.
    """
    return [
        {
            "role": "system",
            "content": [{"type": "input_text", "text": system_prompt}],
        },
        {
            "role": "system",
            "content": [{"type": "input_text", "text": memory_context}],
        },
        {
            "role": "user",
            "content": [{"type": "input_text", "text": user_input}],
        },
    ]


def get_reply(system_prompt: str, memory: dict, user_input: str) -> str:
    """
    Get an assistant reply using stored long-term memory.
    """
    response = client.responses.create(
        model="gpt-5.4-mini",
        input=build_input(system_prompt, memory_to_context(memory), user_input),
    )
    return response.output_text


def main() -> None:
    """
    Run the chatbot with persistent long-term memory.
    """
    print("Long-Term Memory Chatbot")
    print("Type 'exit' to quit.")
    print("Stored memory is saved in user_memory.json\n")

    system_prompt = (
        "You are a helpful AI assistant. Personalize responses when appropriate "
        "using the stored user memory, but do not invent facts."
    )

    memory = load_memory()

    while True:
        user_input = input("You: ").strip()

        if user_input.lower() in {"exit", "quit"}:
            print("Goodbye!")
            break

        # Update and persist memory before generating a response
        memory = update_memory_from_user_input(memory, user_input)
        save_memory(memory)

        assistant_reply = get_reply(system_prompt, memory, user_input)
        print(f"Assistant: {assistant_reply}\n")


if __name__ == "__main__":
    main()

Example interaction: first run

Long-Term Memory Chatbot
Type 'exit' to quit.
Stored memory is saved in user_memory.json

You: My name is Priya.
Assistant: Nice to meet you, Priya! How can I help today?

You: I like concise explanations.
Assistant: Understood — I’ll keep my answers concise.

You: I am vegetarian.
Assistant: Thanks for letting me know. I’ll keep that in mind for food-related suggestions.

Example user_memory.json

{
  "name": "Priya",
  "preferences": {
    "explanation_style": "concise",
    "diet": "vegetarian"
  }
}

Example interaction: second run

Long-Term Memory Chatbot
Type 'exit' to quit.
Stored memory is saved in user_memory.json

You: Can you suggest a quick dinner?
Assistant: Since you’re vegetarian, a quick option could be veggie stir-fry, lentil soup, or a chickpea wrap.

Discussion

This is long-term memory because:

  • Data is saved to disk
  • It survives application restarts
  • Future prompts include remembered facts

Limitation

This system uses simple keyword rules, and it keeps no per-session conversation history — each reply sees only the current message plus stored memory. Real systems often use:

  • Structured extraction with models
  • Review or approval steps
  • Retrieval layers
  • Memory confidence scores
  • Deletion and correction flows

8. Guided Reflection: Short-Term vs Long-Term Memory

| Feature | Short-Term Memory | Long-Term Memory |
| --- | --- | --- |
| Scope | Current session | Across sessions |
| Storage | In app state / prompt | File, DB, vector store, etc. |
| Lifetime | Temporary | Persistent |
| Typical content | Recent turns, active task state | Preferences, facts, decisions |
| Cost concern | Prompt/token growth | Retrieval/storage complexity |
| Risk | Losing context if truncated | Stale or incorrect memory |

Rule of thumb

  • Use short-term memory for what the model needs right now.
  • Use long-term memory for what the system should remember later.

9. Best Practices and Pitfalls

Best Practices

  • Keep memory minimal and relevant
  • Store facts explicitly when possible
  • Summarize older context
  • Let users correct memory
  • Separate temporary state from durable facts
  • Add timestamps or metadata in real systems
  • Use retrieval instead of dumping all memory into prompts

Pitfalls

  • Storing too much irrelevant data
  • Treating guesses as facts
  • Never forgetting stale information
  • Injecting all memory into every prompt
  • Storing sensitive information carelessly
  • Assuming model outputs are always reliable memory extracts

10. Suggested Extension Activities

If learners finish early, try one of these:

Extension 1: Add timestamps

Store when each memory item was added.

Extension 2: Add correction support

Let the user say:

  • “Forget my diet preference”
  • “My name is actually Anika”

Extension 3: Combine short-term and long-term memory

Use:

  • Rolling conversation history for immediate context
  • JSON memory for persistent preferences

Extension 4: Store memory entries as records

Instead of one flat JSON object, use:

[
  {
    "type": "preference",
    "key": "diet",
    "value": "vegetarian",
    "source": "user_message",
    "timestamp": "2026-03-22T10:00:00Z"
  }
]

This prepares learners for more realistic agent memory systems.


11. Knowledge Check

Quick Questions

  1. What is short-term memory in an AI system?
  2. Why is long-term memory usually stored outside the model?
  3. When should you summarize conversation history?
  4. Why is it risky to store every user statement as a permanent fact?
  5. What is the difference between retrieval and raw prompt accumulation?

Expected answers

  1. Recent contextual information used within the current interaction or session.
  2. Because persistence, retrieval, and control need to be managed by the application.
  3. When the context becomes too large or older details only need compact preservation.
  4. Some statements may be temporary, sensitive, incorrect, or irrelevant.
  5. Retrieval selects relevant stored memory, while raw accumulation keeps adding full context into the prompt.

12. Wrap-Up

In this session, learners explored:

  • Why memory is essential in AI systems
  • The distinction between short-term and long-term memory
  • Common memory patterns used in agentic applications
  • How to implement a short-term rolling history chatbot
  • How to persist simple user memory in a JSON store

These ideas are foundational for building useful AI assistants and agents that can maintain context, personalize behavior, and improve over time.


Useful Resources

  • OpenAI Responses API migration guide: https://developers.openai.com/api/docs/guides/migrate-to-responses
  • OpenAI API docs: https://developers.openai.com/api/
  • OpenAI Python SDK: https://github.com/openai/openai-python
  • Python json module docs: https://docs.python.org/3/library/json.html
  • Python pathlib docs: https://docs.python.org/3/library/pathlib.html
  • python-dotenv: https://pypi.org/project/python-dotenv/

Homework

Build a chatbot that combines both memory types:

  • Short-term: keep the last 4 conversation turns
  • Long-term: store user preferences in JSON
  • Before each model call:
    • include the recent conversation
    • include the relevant long-term memory
  • Add one command: forget diet

Stretch goal

Implement a memory summary that compresses older chat history into 2–3 bullet points before trimming it.

