
Session 3: Designing Structured Outputs

Synopsis

Shows how to request predictable responses such as JSON, tabular data, or schema-aligned outputs. Learners begin designing machine-readable LLM responses suitable for downstream automation and agent workflows.

Session Content


Session Overview

In this session, learners will move from “free-form text generation” to reliable, machine-usable outputs. This is a foundational skill for building production-grade GenAI systems and agentic workflows. Instead of asking a model to “just answer,” we will design prompts and response formats so the model returns data that can be parsed, validated, and used directly in Python applications.

By the end of this session, learners will be able to:

  • Explain why structured outputs matter in real applications
  • Design prompts that encourage predictable JSON responses
  • Use the OpenAI Responses API with gpt-5.4-mini to generate structured data
  • Validate and parse model outputs safely in Python
  • Build small workflows that depend on structured model responses
  • Recognize common failure modes and improve prompt reliability

Learning Objectives

After this session, you should be able to:

  1. Define structured outputs and explain their value in software systems
  2. Compare unstructured natural language responses vs. schema-like JSON responses
  3. Write prompts that specify output shape, constraints, and field expectations
  4. Parse model-generated JSON safely in Python
  5. Validate structured outputs with lightweight Python checks
  6. Design robust fallback strategies when model output is malformed or incomplete

Session Agenda (~45 minutes)

  • 0–5 min: Introduction to structured outputs
  • 5–15 min: Theory: why structure matters and common design patterns
  • 15–25 min: Prompt design for structured JSON outputs
  • 25–35 min: Hands-on Exercise 1: Generate structured product summaries
  • 35–42 min: Hands-on Exercise 2: Build a simple support ticket classifier
  • 42–45 min: Wrap-up and recap

1. Why Structured Outputs Matter

Large language models are very good at generating human-readable text. But software systems often need responses in a format that Python code can consume automatically.

Example: Unstructured output

If you ask:

“Summarize this customer review and tell me the sentiment.”

You might get:

“The customer liked the battery life but was disappointed by the slow charging speed. Overall sentiment is mixed.”

This is readable, but hard to process programmatically.

Example: Structured output

A better response for an application might be:

{
  "summary": "Customer liked battery life but disliked slow charging.",
  "sentiment": "mixed",
  "priority": "medium"
}

This can be:

  • parsed into a Python dictionary
  • stored in a database
  • used in a dashboard
  • fed into another system or agent
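As a quick illustration, the structured response above loads straight into a Python dictionary with the standard library:

```python
import json

# The model's structured reply, as a raw JSON string.
raw = (
    '{"summary": "Customer liked battery life but disliked slow charging.",'
    ' "sentiment": "mixed", "priority": "medium"}'
)

data = json.loads(raw)  # parse the JSON text into a dict
print(data["sentiment"])  # -> mixed
print(data["priority"])   # -> medium
```

From here the fields can be stored, displayed, or passed to the next step of a workflow without any text scraping.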

Benefits of structured outputs

Structured outputs enable:

  • Reliability: predictable fields and formats
  • Automation: direct use in downstream code
  • Validation: check presence and types of fields
  • Interoperability: integrate with APIs, UIs, and workflows
  • Agentic orchestration: one model step can produce inputs for the next

2. Common Structured Output Patterns

There are several useful output patterns when working with LLMs.

2.1 Flat JSON objects

Best for small tasks like classification, extraction, scoring, and summaries.

Example:

{
  "category": "billing",
  "urgency": "high",
  "requires_human": true
}

2.2 Lists of objects

Useful when extracting multiple items from text.

Example:

{
  "action_items": [
    {
      "task": "Email client",
      "owner": "Ava",
      "deadline": "2026-03-25"
    },
    {
      "task": "Prepare budget draft",
      "owner": "Ravi",
      "deadline": "2026-03-27"
    }
  ]
}
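Once parsed, a list of objects like this is just a Python list of dictionaries, so each extracted item can be processed in a loop:

```python
import json

# A model reply containing a list of extracted action items.
raw = """{
  "action_items": [
    {"task": "Email client", "owner": "Ava", "deadline": "2026-03-25"},
    {"task": "Prepare budget draft", "owner": "Ravi", "deadline": "2026-03-27"}
  ]
}"""

data = json.loads(raw)
for item in data["action_items"]:
    # Each item is a dict with the fields defined in the schema.
    print(f'{item["owner"]}: {item["task"]} (due {item["deadline"]})')
```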

2.3 Nested JSON

Useful for richer responses, but should be used carefully. The more complex the output, the more likely formatting issues become.

Example:

{
  "document": {
    "title": "Q1 Planning Notes",
    "summary": "Budget and hiring priorities were discussed."
  },
  "risks": [
    "Hiring delays",
    "Vendor cost increases"
  ]
}

2.4 Enumerated values

If a field should come from a limited set, explicitly state allowed values.

Example:

  • sentiment: positive, negative, mixed, neutral
  • urgency: low, medium, high

This improves consistency and simplifies validation.


3. Principles for Designing Good Structured Prompts

When asking for structured outputs, vague prompts produce vague data. Good prompts clearly define:

  • the expected format
  • the required fields
  • field meanings
  • allowed values
  • rules for missing information

3.1 Be explicit about the format

Bad:

“Return the result in a structured way.”

Better:

“Return valid JSON with exactly these keys: summary, sentiment, confidence.”

3.2 Define constraints

Example:

  • sentiment must be one of: positive, negative, mixed, neutral
  • confidence must be a float from 0 to 1
  • summary must be at most 30 words

3.3 Tell the model what to do when information is missing

Example:

“If the text does not contain enough information, use null for missing values.”

This is much better than allowing hallucinated values.

3.4 Avoid unnecessary complexity

Start with the smallest useful schema. Overly complex nested structures are harder to generate and validate.

3.5 Ask for JSON only

Example:

“Return only valid JSON. Do not include markdown, explanation, or surrounding text.”

This reduces parsing problems.
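Even with this instruction, a model will occasionally wrap its answer in a markdown code fence. A small defensive helper (a sketch of our own, not part of any SDK) can strip an optional fence before parsing:

```python
import json

def parse_json_reply(text: str) -> dict:
    """Strip an optional markdown code fence, then parse the JSON body."""
    cleaned = text.strip()
    if cleaned.startswith("```"):
        # Drop the opening fence line (which may carry a "json" tag)
        # and everything from the closing fence onward.
        cleaned = cleaned.split("\n", 1)[1]
        cleaned = cleaned.rsplit("```", 1)[0]
    return json.loads(cleaned)

print(parse_json_reply('```json\n{"sentiment": "mixed"}\n```'))
print(parse_json_reply('{"sentiment": "positive"}'))
```

Both fenced and plain replies now parse the same way, which makes the downstream code simpler.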


4. Example Prompt Template for Structured Outputs

A reusable template:

You are a data extraction assistant.

Analyze the input text and return only valid JSON.

Required schema:
{
  "field_1": "string",
  "field_2": "string",
  "field_3": "number | null"
}

Rules:
- Return only valid JSON
- Do not include markdown or commentary
- If a value is unknown, use null
- field_2 must be one of: "low", "medium", "high"
- field_1 should be concise and under 20 words

This kind of prompt works well for many extraction and classification tasks.


5. Using the OpenAI Responses API for Structured Output

We will use:

  • Python
  • OpenAI Python SDK
  • gpt-5.4-mini
  • Responses API

5.1 Install dependencies

pip install openai

5.2 Set your API key

export OPENAI_API_KEY="your_api_key_here"

On Windows PowerShell:

$env:OPENAI_API_KEY="your_api_key_here"

6. Hands-on Exercise 1: Product Review Summarizer with JSON Output

Goal

Convert free-form customer reviews into structured JSON containing:

  • short summary
  • sentiment
  • confidence
  • key issues list

What you will learn

  • How to write a structured-output prompt
  • How to call the Responses API
  • How to parse JSON safely
  • How to validate basic fields

Step 1: Create the Python script

import json
import os
from openai import OpenAI

# Initialize the OpenAI client using the API key from the environment.
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Example product review text to analyze.
review_text = """
I really like how long the battery lasts on this headset.
The sound quality is clear and comfortable for long meetings.
However, charging is much slower than I expected, and the case feels a bit flimsy.
"""

# Prompt instructing the model to return structured JSON only.
prompt = f"""
You are a product review analysis assistant.

Analyze the following customer review and return only valid JSON.

Required schema:
{{
  "summary": "string",
  "sentiment": "positive | negative | mixed | neutral",
  "confidence": "number between 0 and 1",
  "key_issues": ["string"]
}}

Rules:
- Return only valid JSON
- Do not include markdown or explanation
- summary must be at most 25 words
- sentiment must be one of: positive, negative, mixed, neutral
- confidence must be a number between 0 and 1
- key_issues should be an array of short strings
- If there are no issues, return an empty array

Customer review:
\"\"\"
{review_text}
\"\"\"
"""

# Send the request using the Responses API.
response = client.responses.create(
    model="gpt-5.4-mini",
    input=prompt
)

# The output_text property provides the model's text response.
raw_output = response.output_text

print("Raw model output:")
print(raw_output)

# Parse the JSON response safely.
try:
    parsed = json.loads(raw_output)
except json.JSONDecodeError as exc:
    print("\nFailed to parse JSON:", exc)
    raise

# Lightweight validation checks to ensure expected structure.
required_keys = {"summary", "sentiment", "confidence", "key_issues"}
missing_keys = required_keys - parsed.keys()

if missing_keys:
    raise ValueError(f"Missing required keys: {missing_keys}")

if parsed["sentiment"] not in {"positive", "negative", "mixed", "neutral"}:
    raise ValueError("Invalid sentiment value")

if not isinstance(parsed["confidence"], (int, float)):
    raise ValueError("Confidence must be numeric")

if not (0 <= parsed["confidence"] <= 1):
    raise ValueError("Confidence must be between 0 and 1")

if not isinstance(parsed["key_issues"], list):
    raise ValueError("key_issues must be a list")

print("\nValidated structured output:")
print(json.dumps(parsed, indent=2))

Example output

{
  "summary": "Customer praised battery and comfort but criticized slow charging and case quality.",
  "sentiment": "mixed",
  "confidence": 0.94,
  "key_issues": [
    "slow charging",
    "flimsy case"
  ]
}

Discussion

Notice how this output is much more useful than plain prose:

  • summary can be shown in a UI
  • sentiment can power analytics
  • confidence can support filtering or escalation
  • key_issues can be aggregated across many reviews

Mini Challenge

Modify the script to process three reviews in a loop and collect all parsed results into a Python list.

Suggested review samples:

reviews = [
    "The laptop is fast and lightweight, but the fan gets loud under load.",
    "Excellent camera quality and battery. Very happy with this phone.",
    "The app crashes often and syncing is unreliable. Frustrating experience."
]

7. Designing Safer Parsers and Validators

Even with a strong prompt, model output may occasionally be malformed or incomplete. In production, always validate.

  1. Parse with json.loads
  2. Check required keys
  3. Check data types
  4. Check allowed enum values
  5. Check length or range constraints
  6. Handle failure gracefully

Common fallback strategies

  • retry with a stricter prompt
  • log invalid output for review
  • return a safe default
  • ask the model to fix invalid JSON in a second pass
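The validation steps and the "safe default" fallback can be combined into one small helper. This is a sketch: the field names and the default values are illustrative, not a fixed API.

```python
import json

# A safe default to return when the model output cannot be trusted.
DEFAULT = {"category": "other", "urgency": "low", "requires_human": True}

def classify_safely(raw_output: str) -> dict:
    """Parse and validate model output; fall back to a safe default."""
    try:
        parsed = json.loads(raw_output)               # step 1: parse
    except json.JSONDecodeError:
        return dict(DEFAULT)                          # step 6: graceful failure
    if not {"category", "urgency", "requires_human"} <= parsed.keys():
        return dict(DEFAULT)                          # step 2: required keys
    if parsed["urgency"] not in {"low", "medium", "high"}:
        return dict(DEFAULT)                          # step 4: allowed enum values
    return parsed

print(classify_safely("not json at all"))  # falls back to the safe default
```

In a real system you might also log the invalid output before returning the default, so prompt problems can be reviewed later.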

8. Hands-on Exercise 2: Support Ticket Classifier

Goal

Build a small classifier that converts incoming support text into structured fields:

  • category
  • urgency
  • requires_human
  • short_reason

This is a common real-world GenAI pattern.


Step 1: Create the classifier script

import json
import os
from openai import OpenAI

# Initialize the client.
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Example support ticket submitted by a user.
ticket_text = """
I was charged twice for my subscription this month.
I already contacted support yesterday and have not received a refund.
Please resolve this as soon as possible.
"""

# Prompt specifying an exact schema and constraints.
prompt = f"""
You are a support ticket triage assistant.

Read the ticket and return only valid JSON.

Required schema:
{{
  "category": "billing | technical | account | shipping | other",
  "urgency": "low | medium | high",
  "requires_human": "boolean",
  "short_reason": "string"
}}

Rules:
- Return only valid JSON
- Do not include markdown or explanation
- category must be one of: billing, technical, account, shipping, other
- urgency must be one of: low, medium, high
- requires_human must be a boolean
- short_reason must be at most 20 words

Ticket:
\"\"\"
{ticket_text}
\"\"\"
"""

# Call the Responses API.
response = client.responses.create(
    model="gpt-5.4-mini",
    input=prompt
)

raw_output = response.output_text

print("Raw model output:")
print(raw_output)

# Parse the JSON output safely, as in Exercise 1.
try:
    parsed = json.loads(raw_output)
except json.JSONDecodeError as exc:
    print("\nFailed to parse JSON:", exc)
    raise

allowed_categories = {"billing", "technical", "account", "shipping", "other"}
allowed_urgency = {"low", "medium", "high"}

if parsed["category"] not in allowed_categories:
    raise ValueError("Invalid category")

if parsed["urgency"] not in allowed_urgency:
    raise ValueError("Invalid urgency")

if not isinstance(parsed["requires_human"], bool):
    raise ValueError("requires_human must be boolean")

if not isinstance(parsed["short_reason"], str):
    raise ValueError("short_reason must be a string")

print("\nValidated ticket classification:")
print(json.dumps(parsed, indent=2))

Example output

{
  "category": "billing",
  "urgency": "high",
  "requires_human": true,
  "short_reason": "Duplicate charge and refund request need human follow-up."
}

Step 2: Extend to multiple tickets

You can process a batch of tickets in a loop.

import json
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

tickets = [
    "I can't log into my account even after resetting my password twice.",
    "My order was supposed to arrive last week and still hasn't been delivered.",
    "The mobile app freezes whenever I try to upload a profile picture."
]

for i, ticket in enumerate(tickets, start=1):
    prompt = f"""
You are a support ticket triage assistant.

Read the ticket and return only valid JSON.

Required schema:
{{
  "category": "billing | technical | account | shipping | other",
  "urgency": "low | medium | high",
  "requires_human": "boolean",
  "short_reason": "string"
}}

Rules:
- Return only valid JSON
- Do not include markdown or explanation
- category must be one of: billing, technical, account, shipping, other
- urgency must be one of: low, medium, high
- requires_human must be a boolean
- short_reason must be at most 20 words

Ticket:
\"\"\"
{ticket}
\"\"\"
"""

    response = client.responses.create(
        model="gpt-5.4-mini",
        input=prompt
    )

    raw_output = response.output_text
    parsed = json.loads(raw_output)

    print(f"\nTicket #{i}")
    print(json.dumps(parsed, indent=2))

Example output

Ticket #1
{
  "category": "account",
  "urgency": "high",
  "requires_human": true,
  "short_reason": "User cannot access account after password resets."
}

Ticket #2
{
  "category": "shipping",
  "urgency": "medium",
  "requires_human": true,
  "short_reason": "Order delivery is overdue."
}

Ticket #3
{
  "category": "technical",
  "urgency": "medium",
  "requires_human": true,
  "short_reason": "App freezes during profile image upload."
}

9. Prompt Design Checklist for Structured Outputs

Use this checklist whenever you need machine-readable results.

Checklist

  • Specify the output format explicitly
  • Ask for JSON only
  • Define required keys
  • Define allowed values for enum-like fields
  • Define numeric ranges when needed
  • Specify behavior for missing data
  • Keep schemas as simple as possible
  • Validate in Python before using the result downstream

10. Common Mistakes

Mistake 1: Asking for “structured data” without defining structure

Too vague:

“Give me the result in a structured format.”

Better:

“Return valid JSON with keys category, urgency, and reason.”

Mistake 2: Not validating output

Even if the model usually behaves well, your application should not assume correctness.

Mistake 3: Overly large schemas

If you ask for 20 nested fields, reliability may decrease. Start small.

Mistake 4: No handling for missing information

Always define whether the model should use null, empty strings, or empty arrays.

Mistake 5: Mixing explanation with data

If you want parseable JSON, ask for JSON only.


11. From Structured Outputs to Agentic Workflows

Structured outputs are a key building block of agentic systems.

For example:

  1. Step 1: classify user input into a category
  2. Step 2: decide whether to use a tool
  3. Step 3: extract parameters for the tool
  4. Step 4: generate the final user-facing response

Each step depends on predictable outputs from the previous step. Without structure, chaining these steps becomes fragile.

Example workflow

A customer message:

“My package is late and I need it before Friday.”

Structured output from the model:

{
  "category": "shipping",
  "urgency": "high",
  "requires_tracking_lookup": true,
  "customer_deadline": "Friday"
}

Your Python code can now:

  • route to shipping support
  • trigger a tracking API lookup
  • prioritize the request
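The routing described above might look like this in application code. The queue names and priority rule are illustrative assumptions, not part of any framework:

```python
def route_ticket(parsed: dict) -> str:
    """Decide which queue a classified customer message belongs in."""
    if parsed.get("requires_tracking_lookup"):
        # A tracking lookup should run before a human sees the ticket.
        queue = "shipping-tracking"
    elif parsed.get("category") == "shipping":
        queue = "shipping-support"
    else:
        queue = "general-support"
    if parsed.get("urgency") == "high":
        queue = "priority-" + queue
    return queue

message = {
    "category": "shipping",
    "urgency": "high",
    "requires_tracking_lookup": True,
    "customer_deadline": "Friday",
}
print(route_ticket(message))  # -> priority-shipping-tracking
```

Because the model's output is structured, this routing logic is ordinary, testable Python with no text parsing involved.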

This is why structured outputs are central to practical AI systems.


12. Wrap-Up

In this session, you learned how to:

  • design prompts for structured outputs
  • request JSON-only responses
  • parse model output in Python
  • validate fields and constraints
  • use structured outputs in simple application workflows

This is one of the most important patterns in applied GenAI development. Once outputs become predictable, they can power reliable software systems rather than just chatbot-style interactions.


13. Practice Tasks

Try these after the session:

  1. Build a meeting note extractor that returns:
     • summary
     • decisions
     • action items
     • owners

  2. Build a resume screener that returns:
     • candidate_name
     • years_experience
     • key_skills
     • fit_score

  3. Build a content moderation helper that returns:
     • risk_level
     • categories
     • recommended_action

For each project:

  • define a schema
  • prompt for JSON only
  • parse and validate in Python
  • test with multiple examples


Useful Resources

  • OpenAI Responses API migration guide: https://developers.openai.com/api/docs/guides/migrate-to-responses
  • OpenAI API docs: https://platform.openai.com/docs
  • OpenAI Python SDK: https://github.com/openai/openai-python
  • Python json module documentation: https://docs.python.org/3/library/json.html

Quick Recap

  • Structured outputs make LLM responses usable by software
  • JSON is a practical default format
  • Good prompts specify schema, rules, and constraints
  • Python validation is essential
  • Structured outputs are a foundation for agentic applications

End of Session

