
Session 2: Understanding Large Language Models

Synopsis

Provides a high-level explanation of how LLMs are trained and how they generate outputs. Learners become familiar with concepts such as tokens, context windows, inference, reasoning patterns, and model limitations without diving into mathematical depth.

Session Content


Session Overview

In this session, learners will build a practical understanding of what Large Language Models (LLMs) are, how they work at a conceptual level, what they are good at, and where they fail. The goal is not deep mathematical theory, but a Python-developer-friendly mental model that helps learners use LLMs effectively in applications.

By the end of this session, learners will be able to:

  • Explain what an LLM is in simple technical terms
  • Describe tokens, context windows, and next-token prediction
  • Understand why prompting matters
  • Recognize common strengths and limitations of LLMs
  • Use the OpenAI Responses API with Python to interact with an LLM
  • Experiment with prompt design and compare outputs systematically

Session Duration

~45 minutes

Suggested breakdown:

  • 5 min: Introduction and motivation
  • 12 min: Core concepts of LLMs
  • 8 min: How LLMs generate text
  • 8 min: Strengths, limitations, and failure modes
  • 10 min: Hands-on exercises with Python and the Responses API
  • 2 min: Recap

1. Introduction: What Is a Large Language Model?

A Large Language Model is a machine learning model trained on vast amounts of text to predict and generate language.

At a practical level, an LLM:

  • Reads input text
  • Identifies patterns from training
  • Predicts what text should come next
  • Produces useful outputs such as explanations, summaries, code, classifications, or structured data

For developers, the most important thing to understand is this:

An LLM does not “think” like a human. It predicts likely continuations of text based on patterns it has learned.

This simple idea explains much of both the model's power and its limitations.

Why this matters for developers

If you are building applications with LLMs, you are not writing traditional deterministic logic. Instead, you are:

  • Providing instructions in natural language
  • Supplying context
  • Constraining outputs
  • Evaluating probabilistic behavior

This means application design shifts from only writing code to also designing prompts, context, evaluation, and guardrails.


2. Core Concepts

2.1 Tokens

LLMs do not process text as whole sentences. They process tokens, which are chunks of text.

A token may be:

  • A word
  • Part of a word
  • Punctuation
  • Whitespace-related chunks
  • Code fragments

For example, the sentence:

Python developers love clean code.

may be split into several tokens rather than exactly five whole words.

Why tokens matter

Tokens affect:

  • Input size limits
  • Cost
  • Latency
  • How much context the model can consider at once

When building applications, you should think in terms of token budgets, not just characters or words.
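To make the token idea concrete, here is a deliberately naive chunker. The function name and the regex rule are illustrative only: real models use learned subword tokenizers (such as byte-pair encoding) that often split words into smaller pieces. Even this rough sketch shows that token counts and word counts diverge.

```python
import re


def rough_chunks(text: str) -> list[str]:
    # Split into word-like pieces and individual punctuation marks.
    # Real tokenizers split differently (often mid-word), but the point
    # stands: "tokens" rarely line up one-to-one with words.
    return re.findall(r"\w+|[^\w\s]", text)


chunks = rough_chunks("Python developers love clean code.")
print(chunks)
print(f"{len(chunks)} chunks from a 5-word sentence")
```

Here the five-word example sentence yields six chunks because the period counts separately; a real BPE tokenizer may produce a different count again.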


2.2 Context Window

The context window is the amount of text the model can consider in a single request.

This includes:

  • Your instructions
  • User input
  • Retrieved documents
  • Conversation history
  • The model’s generated output

Why context matters

If the important information is missing, the model cannot use it.

If the context is noisy, conflicting, or too large, quality may drop.

Good LLM applications carefully choose what goes into the prompt.
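Because the window is finite, applications often trim conversation history to fit before sending a request. The helper below is a minimal sketch: the function name is made up for illustration, and it budgets by characters purely to stay dependency-free, whereas real systems count tokens with the model's tokenizer.

```python
def fit_to_budget(messages: list[str], max_chars: int) -> list[str]:
    """Keep the most recent messages that fit within a rough character budget."""
    kept: list[str] = []
    total = 0
    # Walk history newest-first so the most recent turns survive trimming.
    for message in reversed(messages):
        if total + len(message) > max_chars:
            break
        kept.append(message)
        total += len(message)
    # Restore chronological order for the prompt.
    return list(reversed(kept))


history = ["old question", "old answer", "recent question", "recent answer"]
print(fit_to_budget(history, max_chars=30))
```

Dropping the oldest turns first is only one policy; production systems may instead summarize old turns or retrieve only the relevant ones.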


2.3 Next-Token Prediction

At the heart of an LLM is a surprisingly simple training objective:

Given previous tokens, predict the next token.

For example:

  • Input: The capital of France is
  • Likely next token: Paris

By repeating this process token by token, the model can generate:

  • Answers
  • Explanations
  • Stories
  • Code
  • JSON
  • Summaries

Important mental model

LLMs do not retrieve “facts” the way a database does. They generate outputs based on learned statistical patterns.

That is why they can sound fluent even when they are wrong.
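The generation loop itself is easy to sketch. The toy lookup table below stands in for the model; a real LLM conditions on the entire context and samples from a probability distribution over its whole vocabulary, but the token-by-token loop has the same shape.

```python
# A toy "model": a table mapping the previous token to its most likely
# successor. This is a stand-in for illustration, not how real models work.
NEXT_TOKEN = {
    "The": "capital",
    "capital": "of",
    "of": "France",
    "France": "is",
    "is": "Paris",
    "Paris": "<end>",
}


def generate(start: str, max_tokens: int = 10) -> str:
    tokens = [start]
    for _ in range(max_tokens):
        # Predict the next token from the last one; real models use
        # the full context, not just the previous token.
        nxt = NEXT_TOKEN.get(tokens[-1], "<end>")
        if nxt == "<end>":
            break
        tokens.append(nxt)
    return " ".join(tokens)


print(generate("The"))  # The capital of France is Paris
```

Note that nothing in the loop "knows" geography; the answer emerges from the learned mapping, which is exactly why fluent output can still be wrong.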


2.4 Parameters and “Large”

The word “large” in LLM refers to the scale of the model, especially the number of learned parameters.

You do not need to understand the full math to use LLMs effectively, but conceptually:

  • Parameters are the learned internal values of the model
  • More parameters generally allow richer pattern representation
  • Scale often improves capability, but not uniformly or predictably across tasks

Bigger models are often more capable, but they may also be:

  • More expensive
  • Slower
  • Still imperfect

3. How LLMs Generate Text

3.1 Prompt In, Tokens Out

When you send a prompt to an LLM, the model:

  1. Reads the input tokens
  2. Builds an internal representation of the context
  3. Predicts a likely next token
  4. Repeats until a stop condition is reached (an end-of-sequence token, a stop sequence, or a maximum token limit)

This process is called autoregressive generation.


3.2 Why Prompting Matters

Prompting matters because the model is highly sensitive to:

  • Wording
  • Specificity
  • Examples
  • Output format instructions
  • Role or task framing

Compare these prompts:

Weak prompt

Tell me about Python.

Better prompt

Explain Python to a beginner web developer in 5 bullet points.
Focus on syntax readability, ecosystem, and common use cases.

The second prompt gives the model stronger constraints, which usually leads to more useful output.


3.3 Deterministic vs Probabilistic Behavior

Traditional software often behaves deterministically:

  • Same input
  • Same logic
  • Same output

LLMs are probabilistic:

  • Same input may produce variation
  • Small wording changes may affect results
  • Output quality depends heavily on context and instruction design

This does not make them unreliable by default, but it means you must design for variability.


4. What LLMs Are Good At

LLMs are especially good at tasks involving language patterns, including:

  • Summarization
  • Rewriting
  • Classification
  • Information extraction
  • Code generation
  • Q&A over provided context
  • Brainstorming
  • Translation
  • Drafting structured text

Developer takeaway

Use LLMs when the task benefits from flexible language understanding or generation.

Examples:

  • Summarize a support ticket
  • Extract fields from user messages
  • Convert free text into structured JSON
  • Draft documentation from code comments
  • Explain an error message to a user

5. Limitations and Failure Modes

Understanding limitations is essential when building real systems.

5.1 Hallucinations

A hallucination occurs when the model generates plausible-sounding but false information.

Example:

  • Inventing a library function that does not exist
  • Making up a citation
  • Misstating a fact confidently

Why it happens

Because the model is optimizing for plausible text generation, not guaranteed truth.

Mitigation strategies

  • Provide trusted context
  • Ask for grounded answers
  • Use retrieval when facts matter
  • Validate outputs in code
  • Avoid blind trust
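"Validate outputs in code" can be very cheap. For example, if a model suggests calling a Python function, you can confirm the function actually exists before trusting the suggestion. The helper below is an illustrative sketch (its name is made up, and it is not part of any SDK).

```python
import importlib


def function_exists(module_name: str, function_name: str) -> bool:
    """Return True only if the named callable really exists in the module.

    A model can invent plausible-sounding functions, but the import
    system cannot, which makes this a cheap hallucination guard.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return False
    return callable(getattr(module, function_name, None))


print(function_exists("math", "sqrt"))        # True
print(function_exists("math", "quick_sqrt"))  # False: plausible but invented
```

The same pattern generalizes: whenever model output names a concrete artifact (a file, a table, an endpoint), check that the artifact exists before acting on it.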

5.2 Prompt Sensitivity

Small prompt changes can produce different results.

This means:

  • Prompt design matters
  • Testing matters
  • Evaluation matters

A prompt that works once is not enough. You want prompts that work consistently across many cases.


5.3 Context Dependence

The model can only use what is in the current context plus what it learned during training.

If your application needs:

  • Fresh news
  • Private company data
  • Product catalogs
  • User-specific state

you must provide that information explicitly through system design.


5.4 Overconfidence

LLMs often respond fluently and confidently even when uncertain.

This is one reason why polished language should never be confused with correctness.


5.5 Non-Deterministic Output

Outputs may vary in:

  • Wording
  • Structure
  • Level of detail
  • Completeness

If you need stable downstream processing, constrain outputs clearly and validate them.


6. A Practical Mental Model for Python Developers

Think of an LLM as a powerful text transformation engine.

You provide:

  • Instructions
  • Context
  • Examples
  • Output constraints

The model returns:

  • Generated text
  • Structured responses
  • Candidate reasoning artifacts
  • Reformatted or extracted information

Analogy

A traditional function might look like this:

result = transform(data)

An LLM-powered function feels more like this:

result = transform_with_probabilistic_language_model(
    instructions="Summarize this bug report in 3 bullet points",
    context=data,
    constraints="Use plain English"
)

The interface is simple, but quality depends on prompt and context design.


7. Hands-On Setup

7.1 Install the OpenAI Python SDK

pip install openai

7.2 Set Your API Key

macOS/Linux

export OPENAI_API_KEY="your_api_key_here"

Windows PowerShell

setx OPENAI_API_KEY "your_api_key_here"

After running setx, open a new terminal: setx stores the variable persistently but does not affect the current session. To set it for the current session only, run $env:OPENAI_API_KEY = "your_api_key_here" instead.
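If you want a friendlier failure than an authentication error from the API, you can check the variable up front before creating the client. This helper is a convenience sketch, not part of the OpenAI SDK (the SDK reads OPENAI_API_KEY automatically on its own).

```python
import os
import sys


def require_api_key() -> str:
    """Exit with a clear message if OPENAI_API_KEY is not set."""
    key = os.environ.get("OPENAI_API_KEY")
    if not key:
        sys.exit("OPENAI_API_KEY is not set. Export it before running the exercises.")
    return key
```

Calling require_api_key() at the top of each exercise script gives learners an immediate, readable error instead of a stack trace from a failed request.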


8. Hands-On Exercise 1: Your First LLM Call with the Responses API

Goal

Make a basic request to the gpt-5.4-mini model and print the response.

Code

"""
Exercise 1: Basic call to the OpenAI Responses API.

What this demonstrates:
- Creating an OpenAI client
- Sending a simple prompt to gpt-5.4-mini
- Reading the text output safely

Requirements:
- pip install openai
- Set OPENAI_API_KEY in your environment
"""

from openai import OpenAI


def main() -> None:
    # Create the API client. The SDK will automatically read OPENAI_API_KEY.
    client = OpenAI()

    # Send a simple prompt using the Responses API.
    response = client.responses.create(
        model="gpt-5.4-mini",
        input="In 3 short bullet points, explain what a large language model is."
    )

    # output_text is the easiest way to get the generated text from the response.
    print("Model response:\n")
    print(response.output_text)


if __name__ == "__main__":
    main()

Example output

Model response:

- A large language model is an AI system trained on huge amounts of text data.
- It predicts and generates text based on patterns learned during training.
- It can help with tasks like answering questions, summarizing, writing, and coding.

Discussion

Observe that:

  • The code is very small
  • The behavior is powerful but probabilistic
  • The quality depends on prompt wording

9. Hands-On Exercise 2: Compare Weak vs Strong Prompts

Goal

See how prompt quality affects output quality.

Code

"""
Exercise 2: Compare a weak prompt with a stronger, more constrained prompt.

This helps learners see that LLM output quality often improves when:
- The audience is specified
- The format is constrained
- The goal is clearly stated
"""

from openai import OpenAI


def get_response(client: OpenAI, prompt: str) -> str:
    """Send a prompt to the model and return plain text output."""
    response = client.responses.create(
        model="gpt-5.4-mini",
        input=prompt
    )
    return response.output_text


def main() -> None:
    client = OpenAI()

    weak_prompt = "Tell me about Python."

    strong_prompt = (
        "Explain Python to a beginner programmer in exactly 5 bullet points. "
        "Cover readability, common use cases, libraries, learning curve, "
        "and why it is popular."
    )

    print("=== Weak Prompt ===")
    print(weak_prompt)
    print()
    print(get_response(client, weak_prompt))
    print("\n" + "=" * 60 + "\n")

    print("=== Strong Prompt ===")
    print(strong_prompt)
    print()
    print(get_response(client, strong_prompt))


if __name__ == "__main__":
    main()

Example output

=== Weak Prompt ===
Tell me about Python.

Python is a high-level programming language known for its readability and simplicity. It is widely used in web development, data science, automation, artificial intelligence, and more. Python has a large community and a rich ecosystem of libraries.

============================================================

=== Strong Prompt ===
Explain Python to a beginner programmer in exactly 5 bullet points. Cover readability, common use cases, libraries, learning curve, and why it is popular.

- Python has a clean, readable syntax, which makes it easier for beginners to learn.
- It is used for many tasks, including web development, data analysis, automation, and AI.
- Python has a huge library ecosystem that helps developers build applications faster.
- Its learning curve is gentle compared to many other programming languages.
- Python is popular because it is versatile, productive, and supported by a large community.

Reflection questions

  • Which response is easier to use in an application?
  • Which prompt is more testable?
  • Which output would be easier to show directly in a UI?

10. Hands-On Exercise 3: Summarization and Controlled Output

Goal

Use an LLM for summarization and force a useful format.

Scenario

You have a long bug report and want a concise engineering summary.

Code

"""
Exercise 3: Summarize a bug report in a structured format.

This demonstrates:
- Practical use of LLMs for summarization
- How output constraints improve usefulness
- How to turn messy text into a cleaner artifact
"""

from openai import OpenAI


BUG_REPORT = """
Customer reports that the checkout page freezes after clicking 'Pay Now'.
The issue seems to happen mostly on mobile Safari, but one user also reported
it on Chrome for iPhone. The problem started after the release deployed on
Tuesday evening. Several users said the loading spinner keeps spinning and
the payment never completes, although in two cases the card was actually charged.
Support marked this as high priority because it affects purchases and creates
confusion about whether orders succeeded.
"""


def main() -> None:
    client = OpenAI()

    prompt = f"""
You are helping an engineering team triage a bug report.

Summarize the following bug report using this exact format:

Summary:
<one sentence>

Impact:
<one sentence>

Suspected clues:
- <bullet>
- <bullet>
- <bullet>

Bug report:
{BUG_REPORT}
"""

    response = client.responses.create(
        model="gpt-5.4-mini",
        input=prompt
    )

    print(response.output_text)


if __name__ == "__main__":
    main()

Example output

Summary:
The checkout page may freeze after users tap "Pay Now," especially on mobile Safari, preventing clear payment completion.

Impact:
This is a high-priority revenue-impacting issue because users may be charged without receiving confirmation that their order succeeded.

Suspected clues:
- The issue began after the Tuesday evening release.
- Mobile Safari appears to be the most frequently affected platform.
- In some cases, payment processing may succeed even though the UI remains stuck on a spinner.

Discussion

This is a strong use case for LLMs:

  • Input is unstructured
  • Desired output is concise
  • Format matters
  • Perfect factual precision is less critical than useful synthesis

11. Hands-On Exercise 4: Observe Variability and Prompt Sensitivity

Goal

Explore how small input changes can change results.

Code

"""
Exercise 4: Compare outputs across similar prompts.

This exercise helps learners understand:
- Prompt sensitivity
- Non-deterministic behavior
- Why evaluation matters
"""

from openai import OpenAI


PROMPTS = [
    "Explain what an API is.",
    "Explain what an API is to a 12-year-old.",
    "Explain what an API is in one sentence.",
    "Explain what an API is using a restaurant analogy.",
]


def main() -> None:
    client = OpenAI()

    for index, prompt in enumerate(PROMPTS, start=1):
        response = client.responses.create(
            model="gpt-5.4-mini",
            input=prompt
        )

        print(f"--- Prompt {index} ---")
        print(prompt)
        print()
        print(response.output_text)
        print("\n" + "-" * 60 + "\n")


if __name__ == "__main__":
    main()

Example output

--- Prompt 1 ---
Explain what an API is.

An API is a way for different software systems to communicate and share functionality with each other.

------------------------------------------------------------

--- Prompt 2 ---
Explain what an API is to a 12-year-old.

An API is like a messenger that helps two apps talk to each other without needing to know how everything works inside.

------------------------------------------------------------

--- Prompt 3 ---
Explain what an API is in one sentence.

An API is a defined way for software applications to communicate with each other.

------------------------------------------------------------

--- Prompt 4 ---
Explain what an API is using a restaurant analogy.

An API is like a waiter in a restaurant: you place an order, the waiter carries it to the kitchen, and then brings back your food without you needing to know how the kitchen works.

Reflection

This illustrates an important lesson:

Prompting is part of programming when building LLM applications.


12. Guided Discussion: When Should You Use an LLM?

Ask learners to classify these tasks as either:

  • Good LLM use case
  • Possible but needs validation
  • Better solved with traditional code

Task list

  1. Convert a support email into a short summary
  2. Compute tax from fixed business rules
  3. Extract product names from messy customer text
  4. Generate SQL directly against production without checks
  5. Rephrase documentation for beginners
  6. Validate whether a UUID matches a required format

Suggested answers

  • Good LLM use case:
      • Convert a support email into a short summary
      • Rephrase documentation for beginners

  • Possible but needs validation:
      • Extract product names from messy customer text
      • Generate SQL directly against production without checks

  • Better solved with traditional code:
      • Compute tax from fixed business rules
      • Validate whether a UUID matches a required format
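The UUID task shows why deterministic code wins in the last category: Python's standard library settles the question exactly, with no prompt design, no per-call cost, and no output variability.

```python
import uuid


def is_valid_uuid(value: str) -> bool:
    # Either the string parses as a UUID or it does not; the answer is
    # identical on every call, which no probabilistic model can promise.
    try:
        uuid.UUID(value)
        return True
    except ValueError:
        return False


print(is_valid_uuid("123e4567-e89b-12d3-a456-426614174000"))  # True
print(is_valid_uuid("not-a-uuid"))                            # False
```

Note that uuid.UUID is lenient about surface form (it also accepts the hex digits without hyphens); if the exact hyphenated format matters, add a regex check on top.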

13. Best Practices Introduced in This Session

As you start working with LLMs, keep these habits:

  • Be explicit in prompts
  • Specify audience, format, and constraints
  • Prefer structured outputs when possible
  • Keep important context close to the request
  • Do not trust fluent output blindly
  • Validate outputs when correctness matters
  • Test prompts across multiple inputs, not just one example

14. Common Misconceptions

Misconception 1: “The model understands exactly like a human.”

Not quite. It is better to think of it as pattern-based language generation with powerful emergent capabilities.

Misconception 2: “If the answer sounds confident, it is probably correct.”

Confidence in wording is not evidence of factual accuracy.

Misconception 3: “A single good demo means the prompt is production-ready.”

Production use requires repeatability, evaluation, and safeguards.

Misconception 4: “LLMs replace all traditional programming.”

They complement traditional programming. In many systems, the best design combines deterministic code with LLM-based language capabilities.


15. Mini Quiz

1. What is the core training objective of an LLM?

Answer: Predict the next token based on previous tokens.

2. Why do tokens matter?

Answer: They affect context limits, cost, and how much text the model can process.

3. What is a hallucination?

Answer: A plausible-sounding but false or invented output from the model.

4. Why does prompt design matter?

Answer: Because LLMs are sensitive to wording, structure, and constraints.

5. Name one task that is often better handled by traditional code.

Answer: Deterministic validation, such as checking whether a UUID matches a format.


16. Recap

In this session, learners explored:

  • What an LLM is
  • Tokens and context windows
  • Next-token prediction
  • Why prompting matters
  • Strengths and limitations of LLMs
  • How to call gpt-5.4-mini using the OpenAI Responses API
  • How better prompts lead to better outputs

The key practical takeaway is:

LLMs are powerful probabilistic tools for language tasks, but they must be guided with clear prompts, good context, and careful validation.


17. Useful Resources

  • OpenAI Responses API guide: https://developers.openai.com/api/docs/guides/migrate-to-responses
  • OpenAI API docs overview: https://developers.openai.com/api/
  • OpenAI Python SDK: https://github.com/openai/openai-python
  • Prompting guide: https://platform.openai.com/docs/guides/prompt-engineering

18. Suggested Homework

  1. Rewrite one weak prompt into three stronger versions and compare outputs.
  2. Build a small Python script that:
      • Reads a paragraph from a file
      • Sends it to gpt-5.4-mini
      • Returns a one-sentence summary and three bullet points
  3. Collect three examples where the model gives a strong answer and three where it gives a weak or misleading answer.

19. Instructor Notes

Key teaching emphasis

Focus on giving learners a practical mental model, not a research-level treatment.

Watch for confusion around

  • “The model knows everything”
  • “The model is always factual”
  • “Prompting is just wording, not engineering”

Good discussion prompt

Ask learners:

If a model gives beautiful output that is factually wrong, is the problem the model, the prompt, the system design, or all three?

This sets up future sessions on evaluation, grounding, and agentic workflows.

