Session 2: Understanding Large Language Models
Synopsis
Provides a high-level explanation of how LLMs are trained and how they generate outputs. Learners become familiar with concepts such as tokens, context windows, inference, reasoning patterns, and model limitations without diving into mathematical depth.
Session Content
Session Overview
In this session, learners will build a practical understanding of what Large Language Models (LLMs) are, how they work at a conceptual level, what they are good at, and where they fail. The goal is not deep mathematical theory, but a Python-developer-friendly mental model that helps learners use LLMs effectively in applications.
By the end of this session, learners will be able to:
- Explain what an LLM is in simple technical terms
- Describe tokens, context windows, and next-token prediction
- Understand why prompting matters
- Recognize common strengths and limitations of LLMs
- Use the OpenAI Responses API with Python to interact with an LLM
- Experiment with prompt design and compare outputs systematically
Session Duration
~45 minutes
Suggested breakdown:
- 5 min: Introduction and motivation
- 12 min: Core concepts of LLMs
- 8 min: How LLMs generate text
- 8 min: Strengths, limitations, and failure modes
- 10 min: Hands-on exercises with Python and the Responses API
- 2 min: Recap
1. Introduction: What Is a Large Language Model?
A Large Language Model is a machine learning model trained on vast amounts of text to predict and generate language.
At a practical level, an LLM:
- Reads input text
- Identifies patterns from training
- Predicts what text should come next
- Produces useful outputs such as explanations, summaries, code, classifications, or structured data
For developers, the most important thing to understand is this:
An LLM does not “think” like a human. It predicts likely continuations of text based on patterns it has learned.
This simple idea explains a lot of both its power and its limitations.
Why this matters for developers
If you are building applications with LLMs, you are not writing traditional deterministic logic. Instead, you are:
- Providing instructions in natural language
- Supplying context
- Constraining outputs
- Evaluating probabilistic behavior
This means application design shifts from only writing code to also designing prompts, context, evaluation, and guardrails.
2. Core Concepts
2.1 Tokens
LLMs do not process text as whole sentences. They process tokens, which are chunks of text.
A token may be:
- A word
- Part of a word
- Punctuation
- Whitespace-related chunks
- Code fragments
For example, the sentence:
Python developers love clean code.
might be split into tokens such as "Python", " developers", " love", " clean", " code", and "." rather than exactly one token per word.
Why tokens matter
Tokens affect:
- Input size limits
- Cost
- Latency
- How much context the model can consider at once
When building applications, you should think in terms of token budgets, not just characters or words.
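To make token budgets concrete, here is a minimal sketch using the common (and very rough) rule of thumb of about 4 characters per English token. This is only an estimate for planning purposes; a real tokenizer such as tiktoken gives exact counts:

```python
def estimate_tokens(text: str) -> int:
    """Very rough token estimate using the ~4 characters-per-token
    rule of thumb for English text. Real tokenizers vary."""
    return max(1, round(len(text) / 4))


def fits_in_budget(text: str, budget_tokens: int) -> bool:
    """Check whether a text roughly fits within a token budget."""
    return estimate_tokens(text) <= budget_tokens


sentence = "Python developers love clean code."
print(estimate_tokens(sentence))      # roughly 8-9 estimated tokens for 34 characters
print(fits_in_budget(sentence, 100))  # True
```

Thinking in estimates like this is enough for early prototyping; switch to exact token counting before you rely on hard limits.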
2.2 Context Window
The context window is the amount of text the model can consider in a single request.
This includes:
- Your instructions
- User input
- Retrieved documents
- Conversation history
- The model’s generated output
Why context matters
If the important information is missing, the model cannot use it.
If the context is noisy, conflicting, or too large, quality may drop.
Good LLM applications carefully choose what goes into the prompt.
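One common pattern is trimming conversation history so that the prompt stays inside the window. The sketch below keeps the system prompt plus as many of the most recent turns as fit; the 4-characters-per-token estimate is an assumption for illustration, not a real tokenizer:

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token for English text.
    return max(1, len(text) // 4)


def build_context(system_prompt: str, history: list[str], budget_tokens: int) -> str:
    """Keep the system prompt plus as many of the newest turns as fit the budget."""
    remaining = budget_tokens - estimate_tokens(system_prompt)
    kept: list[str] = []
    for turn in reversed(history):  # walk from newest to oldest
        cost = estimate_tokens(turn)
        if cost > remaining:
            break
        kept.append(turn)
        remaining -= cost
    return "\n".join([system_prompt] + list(reversed(kept)))


history = ["User: hi", "Bot: hello", "User: explain tokens please"]
context = build_context("You are a helpful assistant.", history, budget_tokens=50)
print(context)
```

Dropping the oldest turns first is the simplest policy; real applications often summarize old turns instead of discarding them.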
2.3 Next-Token Prediction
At the heart of an LLM is a surprisingly simple training objective:
Given previous tokens, predict the next token.
For example:
- Input: "The capital of France is"
- Likely next token: "Paris"
By repeating this process token by token, the model can generate:
- Answers
- Explanations
- Stories
- Code
- JSON
- Summaries
Important mental model
LLMs do not retrieve “facts” the way a database does. They generate outputs based on learned statistical patterns.
That is why they can sound fluent even when they are wrong.
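The training objective can be illustrated with a toy model that is nothing like a real transformer but captures the loop: count which token follows which in a tiny corpus, then greedily pick the most frequent continuation, token by token:

```python
from collections import Counter, defaultdict

corpus = "the capital of france is paris . the capital of italy is rome .".split()

# Count which token follows each token in the corpus.
follows: defaultdict[str, Counter] = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    follows[current][nxt] += 1


def predict_next(token: str) -> str:
    """Greedy prediction: return the most frequent observed continuation."""
    return follows[token].most_common(1)[0][0]


# Generate a few tokens autoregressively, starting from "the".
token = "the"
generated = [token]
for _ in range(5):
    token = predict_next(token)
    generated.append(token)
print(" ".join(generated))
```

Real LLMs use learned neural representations instead of raw counts, but the generation loop is the same shape: predict one token, append it, repeat.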
2.4 Parameters and “Large”
The word “large” in LLM refers to the scale of the model, especially the number of learned parameters.
You do not need to understand the full math to use LLMs effectively, but conceptually:
- Parameters are the learned internal values of the model
- More parameters generally allow richer pattern representation
- Scale often improves capability, but not perfectly
Bigger models are often more capable, but they may also be:
- More expensive
- Slower
- Still imperfect
3. How LLMs Generate Text
3.1 Prompt In, Tokens Out
When you send a prompt to an LLM, the model:
- Reads the input tokens
- Builds an internal representation of the context
- Predicts a likely next token
- Repeats until a stop condition is reached (an end-of-sequence token, a length limit, or a stop sequence)
This process is autoregressive generation.
3.2 Why Prompting Matters
Prompting matters because the model is highly sensitive to:
- Wording
- Specificity
- Examples
- Output format instructions
- Role or task framing
Compare these prompts:
Weak prompt
Tell me about Python.
Better prompt
Explain Python to a beginner web developer in 5 bullet points.
Focus on syntax readability, ecosystem, and common use cases.
The second prompt gives the model stronger constraints, which usually leads to more useful output.
3.3 Deterministic vs Probabilistic Behavior
Traditional software often behaves deterministically:
- Same input
- Same logic
- Same output
LLMs are probabilistic:
- Same input may produce variation
- Small wording changes may affect results
- Output quality depends heavily on context and instruction design
This does not make them unreliable by default, but it means you must design for variability.
4. What LLMs Are Good At
LLMs are especially good at tasks involving language patterns, including:
- Summarization
- Rewriting
- Classification
- Information extraction
- Code generation
- Q&A over provided context
- Brainstorming
- Translation
- Drafting structured text
Developer takeaway
Use LLMs when the task benefits from flexible language understanding or generation.
Examples:
- Summarize a support ticket
- Extract fields from user messages
- Convert free text into structured JSON
- Draft documentation from code comments
- Explain an error message to a user
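Tasks like "convert free text into structured JSON" pair a format-constrained prompt with strict parsing in code. This sketch shows the parsing side; the field names and the sample reply are illustrative, not a fixed schema:

```python
import json


def parse_extraction(raw: str) -> dict:
    """Parse a model's JSON reply and enforce the fields we asked for.

    Raises ValueError if the output does not match the expected shape,
    so downstream code never sees malformed data.
    """
    data = json.loads(raw)
    required = {"product", "sentiment"}
    missing = required - data.keys()
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return data


# Pretend this string came back from the model.
model_reply = '{"product": "SuperWidget 3000", "sentiment": "negative"}'
print(parse_extraction(model_reply))
```

The division of labor is the key idea: the LLM handles the messy language, and deterministic code enforces the contract.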
5. Limitations and Failure Modes
Understanding limitations is essential when building real systems.
5.1 Hallucinations
A hallucination occurs when the model generates plausible-sounding but false information.
Example:
- Inventing a library function that does not exist
- Making up a citation
- Misstating a fact confidently
Why it happens
Because the model is optimizing for plausible text generation, not guaranteed truth.
Mitigation strategies
- Provide trusted context
- Ask for grounded answers
- Use retrieval when facts matter
- Validate outputs in code
- Avoid blind trust
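"Validate outputs in code" can be as simple as checking that a model-suggested name actually exists before trusting it. A small sketch for the invented-library-function case (the module and attribute names here are just examples):

```python
import importlib


def function_exists(module_name: str, function_name: str) -> bool:
    """Return True only if the module imports and exposes a callable attribute."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return False
    return callable(getattr(module, function_name, None))


# A real function vs. a plausible-sounding hallucination.
print(function_exists("json", "loads"))        # True
print(function_exists("json", "load_string"))  # False: no such function
```

The same pattern generalizes: whenever a model names a file, API route, or config key, cheap deterministic checks catch hallucinations before they reach users.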
5.2 Prompt Sensitivity
Small prompt changes can produce different results.
This means:
- Prompt design matters
- Testing matters
- Evaluation matters
A prompt that works once is not enough. You want prompts that work consistently across many cases.
5.3 Context Dependence
The model can only use what is in the current context plus what it learned during training.
If your application needs:
- Fresh news
- Private company data
- Product catalogs
- User-specific state
you must provide that information explicitly through system design.
5.4 Overconfidence
LLMs often respond fluently and confidently even when uncertain.
This is one reason why polished language should never be confused with correctness.
5.5 Non-Deterministic Output
Outputs may vary in:
- Wording
- Structure
- Level of detail
- Completeness
If you need stable downstream processing, constrain outputs clearly and validate them.
6. A Practical Mental Model for Python Developers
Think of an LLM as a powerful text transformation engine.
You provide:
- Instructions
- Context
- Examples
- Output constraints
The model returns:
- Generated text
- Structured responses
- Candidate reasoning artifacts
- Reformatted or extracted information
Analogy
A traditional function might look like this:

```python
result = transform(data)
```

An LLM-powered function feels more like this:

```python
result = transform_with_probabilistic_language_model(
    instructions="Summarize this bug report in 3 bullet points",
    context=data,
    constraints="Use plain English",
)
```
The interface is simple, but quality depends on prompt and context design.
7. Hands-On Setup
7.1 Install the OpenAI Python SDK
pip install openai
7.2 Set Your API Key
macOS/Linux
export OPENAI_API_KEY="your_api_key_here"
Windows PowerShell
setx OPENAI_API_KEY "your_api_key_here"
After setting it, restart your terminal if needed.
8. Hands-On Exercise 1: Your First LLM Call with the Responses API
Goal
Make a basic request to the gpt-5.4-mini model and print the response.
Code
"""
Exercise 1: Basic call to the OpenAI Responses API.
What this demonstrates:
- Creating an OpenAI client
- Sending a simple prompt to gpt-5.4-mini
- Reading the text output safely
Requirements:
- pip install openai
- Set OPENAI_API_KEY in your environment
"""
from openai import OpenAI
def main() -> None:
# Create the API client. The SDK will automatically read OPENAI_API_KEY.
client = OpenAI()
# Send a simple prompt using the Responses API.
response = client.responses.create(
model="gpt-5.4-mini",
input="In 3 short bullet points, explain what a large language model is."
)
# output_text is the easiest way to get the generated text from the response.
print("Model response:\n")
print(response.output_text)
if __name__ == "__main__":
main()
Example output
Model response:
- A large language model is an AI system trained on huge amounts of text data.
- It predicts and generates text based on patterns learned during training.
- It can help with tasks like answering questions, summarizing, writing, and coding.
Discussion
Observe that:
- The code is very small
- The behavior is powerful but probabilistic
- The quality depends on prompt wording
9. Hands-On Exercise 2: Compare Weak vs Strong Prompts
Goal
See how prompt quality affects output quality.
Code
"""
Exercise 2: Compare a weak prompt with a stronger, more constrained prompt.
This helps learners see that LLM output quality often improves when:
- The audience is specified
- The format is constrained
- The goal is clearly stated
"""
from openai import OpenAI
def get_response(client: OpenAI, prompt: str) -> str:
"""Send a prompt to the model and return plain text output."""
response = client.responses.create(
model="gpt-5.4-mini",
input=prompt
)
return response.output_text
def main() -> None:
client = OpenAI()
weak_prompt = "Tell me about Python."
strong_prompt = (
"Explain Python to a beginner programmer in exactly 5 bullet points. "
"Cover readability, common use cases, libraries, learning curve, "
"and why it is popular."
)
print("=== Weak Prompt ===")
print(weak_prompt)
print()
print(get_response(client, weak_prompt))
print("\n" + "=" * 60 + "\n")
print("=== Strong Prompt ===")
print(strong_prompt)
print()
print(get_response(client, strong_prompt))
if __name__ == "__main__":
main()
Example output
=== Weak Prompt ===
Tell me about Python.
Python is a high-level programming language known for its readability and simplicity. It is widely used in web development, data science, automation, artificial intelligence, and more. Python has a large community and a rich ecosystem of libraries.
============================================================
=== Strong Prompt ===
Explain Python to a beginner programmer in exactly 5 bullet points. Cover readability, common use cases, libraries, learning curve, and why it is popular.
- Python has a clean, readable syntax, which makes it easier for beginners to learn.
- It is used for many tasks, including web development, data analysis, automation, and AI.
- Python has a huge library ecosystem that helps developers build applications faster.
- Its learning curve is gentle compared to many other programming languages.
- Python is popular because it is versatile, productive, and supported by a large community.
Reflection questions
- Which response is easier to use in an application?
- Which prompt is more testable?
- Which output would be easier to show directly in a UI?
10. Hands-On Exercise 3: Summarization and Controlled Output
Goal
Use an LLM for summarization and force a useful format.
Scenario
You have a long bug report and want a concise engineering summary.
Code
"""
Exercise 3: Summarize a bug report in a structured format.
This demonstrates:
- Practical use of LLMs for summarization
- How output constraints improve usefulness
- How to turn messy text into a cleaner artifact
"""
from openai import OpenAI
BUG_REPORT = """
Customer reports that the checkout page freezes after clicking 'Pay Now'.
The issue seems to happen mostly on mobile Safari, but one user also reported
it on Chrome for iPhone. The problem started after the release deployed on
Tuesday evening. Several users said the loading spinner keeps spinning and
the payment never completes, although in two cases the card was actually charged.
Support marked this as high priority because it affects purchases and creates
confusion about whether orders succeeded.
"""
def main() -> None:
client = OpenAI()
prompt = f"""
You are helping an engineering team triage a bug report.
Summarize the following bug report using this exact format:
Summary:
<one sentence>
Impact:
<one sentence>
Suspected clues:
- <bullet>
- <bullet>
- <bullet>
Bug report:
{BUG_REPORT}
"""
response = client.responses.create(
model="gpt-5.4-mini",
input=prompt
)
print(response.output_text)
if __name__ == "__main__":
main()
Example output
Summary:
The checkout page may freeze after users tap "Pay Now," especially on mobile Safari, preventing clear payment completion.
Impact:
This is a high-priority revenue-impacting issue because users may be charged without receiving confirmation that their order succeeded.
Suspected clues:
- The issue began after the Tuesday evening release.
- Mobile Safari appears to be the most frequently affected platform.
- In some cases, payment processing may succeed even though the UI remains stuck on a spinner.
Discussion
This is a strong use case for LLMs:
- Input is unstructured
- Desired output is concise
- Format matters
- Perfect factual precision is less critical than useful synthesis
11. Hands-On Exercise 4: Observe Variability and Prompt Sensitivity
Goal
Explore how small input changes can change results.
Code
"""
Exercise 4: Compare outputs across similar prompts.
This exercise helps learners understand:
- Prompt sensitivity
- Non-deterministic behavior
- Why evaluation matters
"""
from openai import OpenAI
PROMPTS = [
"Explain what an API is.",
"Explain what an API is to a 12-year-old.",
"Explain what an API is in one sentence.",
"Explain what an API is using a restaurant analogy.",
]
def main() -> None:
client = OpenAI()
for index, prompt in enumerate(PROMPTS, start=1):
response = client.responses.create(
model="gpt-5.4-mini",
input=prompt
)
print(f"--- Prompt {index} ---")
print(prompt)
print()
print(response.output_text)
print("\n" + "-" * 60 + "\n")
if __name__ == "__main__":
main()
Example output
--- Prompt 1 ---
Explain what an API is.
An API is a way for different software systems to communicate and share functionality with each other.
------------------------------------------------------------
--- Prompt 2 ---
Explain what an API is to a 12-year-old.
An API is like a messenger that helps two apps talk to each other without needing to know how everything works inside.
------------------------------------------------------------
--- Prompt 3 ---
Explain what an API is in one sentence.
An API is a defined way for software applications to communicate with each other.
------------------------------------------------------------
--- Prompt 4 ---
Explain what an API is using a restaurant analogy.
An API is like a waiter in a restaurant: you place an order, the waiter carries it to the kitchen, and then brings back your food without you needing to know how the kitchen works.
Reflection
This illustrates an important lesson:
Prompting is part of programming when building LLM applications.
12. Guided Discussion: When Should You Use an LLM?
Ask learners to classify these tasks as either:
- Good LLM use case
- Possible but needs validation
- Better solved with traditional code
Task list
- Convert a support email into a short summary
- Compute tax from fixed business rules
- Extract product names from messy customer text
- Generate SQL directly against production without checks
- Rephrase documentation for beginners
- Validate whether a UUID matches a required format
Suggested answers
- Good LLM use case:
  - Convert a support email into a short summary
  - Rephrase documentation for beginners
- Possible but needs validation:
  - Extract product names from messy customer text
  - Generate SQL directly against production without checks
- Better solved with traditional code:
  - Compute tax from fixed business rules
  - Validate whether a UUID matches a required format
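The UUID case is a good example of "better solved with traditional code": Python's standard library already validates the format deterministically, with no prompt, no cost, and no variability:

```python
import uuid


def is_valid_uuid(value: str) -> bool:
    """Deterministic format validation: no LLM required."""
    try:
        uuid.UUID(value)
        return True
    except ValueError:
        return False


print(is_valid_uuid("123e4567-e89b-12d3-a456-426614174000"))  # True
print(is_valid_uuid("not-a-uuid"))                            # False
```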
13. Best Practices Introduced in This Session
As you start working with LLMs, keep these habits:
- Be explicit in prompts
- Specify audience, format, and constraints
- Prefer structured outputs when possible
- Keep important context close to the request
- Do not trust fluent output blindly
- Validate outputs when correctness matters
- Test prompts across multiple inputs, not just one example
14. Common Misconceptions
Misconception 1: “The model understands exactly like a human.”
Not quite. It is better to think of it as pattern-based language generation with powerful emergent capabilities.
Misconception 2: “If the answer sounds confident, it is probably correct.”
Confidence in wording is not evidence of factual accuracy.
Misconception 3: “A single good demo means the prompt is production-ready.”
Production use requires repeatability, evaluation, and safeguards.
Misconception 4: “LLMs replace all traditional programming.”
They complement traditional programming. In many systems, the best design combines deterministic code with LLM-based language capabilities.
15. Mini Quiz
1. What is the core training objective of an LLM?
Answer: Predict the next token based on previous tokens.
2. Why do tokens matter?
Answer: They affect context limits, cost, and how much text the model can process.
3. What is a hallucination?
Answer: A plausible-sounding but false or invented output from the model.
4. Why does prompt design matter?
Answer: Because LLMs are sensitive to wording, structure, and constraints.
5. Name one task that is often better handled by traditional code.
Answer: Deterministic validation, such as checking whether a UUID matches a format.
16. Recap
In this session, learners explored:
- What an LLM is
- Tokens and context windows
- Next-token prediction
- Why prompting matters
- Strengths and limitations of LLMs
- How to call gpt-5.4-mini using the OpenAI Responses API
- How better prompts lead to better outputs
The key practical takeaway is:
LLMs are powerful probabilistic tools for language tasks, but they must be guided with clear prompts, good context, and careful validation.
17. Useful Resources
- OpenAI Responses API guide: https://developers.openai.com/api/docs/guides/migrate-to-responses
- OpenAI API docs overview: https://developers.openai.com/api/
- OpenAI Python SDK: https://github.com/openai/openai-python
- Prompting guide: https://platform.openai.com/docs/guides/prompt-engineering
18. Suggested Homework
- Rewrite one weak prompt into three stronger versions and compare outputs.
- Build a small Python script that:
  - Reads a paragraph from a file
  - Sends it to gpt-5.4-mini
  - Returns:
    - a one-sentence summary
    - three bullet points
- Collect three examples where the model gives a strong answer and three where it gives a weak or misleading answer.
19. Instructor Notes
Key teaching emphasis
Focus on giving learners a practical mental model, not a research-level treatment.
Watch for confusion around:
- “The model knows everything”
- “The model is always factual”
- “Prompting is just wording, not engineering”
Good discussion prompt
Ask learners:
If a model gives beautiful output that is factually wrong, is the problem the model, the prompt, the system design, or all three?
This sets up future sessions on evaluation, grounding, and agentic workflows.