
Session 1: What Tool Use Adds to LLM Systems

Synopsis

Introduces the concept of tool-augmented models and explains why function calling improves reliability for calculation, search, APIs, and system actions. Learners see how tool use extends the usefulness of language models beyond text generation.

Session Content

Session Overview

Duration: ~45 minutes
Audience: Python developers with basic programming knowledge who are new to GenAI and agentic development.

Learning Goals

By the end of this session, learners will be able to:

  • Explain why plain LLM prompting is often not enough for real applications.
  • Describe what “tool use” means in LLM systems.
  • Identify the types of tasks that benefit from tool calling.
  • Build a simple Python example using the OpenAI Responses API and gpt-5.4-mini.
  • Implement a small tool-enabled workflow where the model decides when to call a Python function.
  • Compare outputs from:
      • direct LLM-only generation
      • LLM + tool use

1. Why Plain LLMs Are Powerful, But Limited

Large language models are very good at:

  • understanding natural language
  • generating text
  • summarizing information
  • transforming content
  • reasoning over provided context

However, an LLM by itself has important limitations in practical software systems.

Common limitations of LLM-only systems

1. No guaranteed access to live data

If you ask:

“What is the current stock price of a company?”
“What time is it in Tokyo right now?”
“What are today’s support tickets?”

A plain model cannot reliably know current or private information unless that data is supplied.

2. No direct execution of business logic

A model may describe how to calculate shipping or tax, but that is not the same as actually executing your business rules in code.

3. No built-in access to external systems

Real applications often need to interact with:

  • databases
  • APIs
  • search systems
  • calculators
  • internal services
  • file stores
  • scheduling systems

Without tools, the model can only “talk about” these systems.

4. Hallucination risk

If the model lacks information, it may still generate a plausible answer. This is dangerous when correctness matters.

5. Weakness in deterministic operations

For tasks like:

  • exact math
  • data lookups
  • filtering records
  • date calculations
  • structured transformations

traditional code is often more reliable than free-form generation.
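As a quick illustration, two of these operations — exact money arithmetic and date calculation — are one-liners in plain Python, which is exactly why they are better delegated to code than generated token by token:

```python
from datetime import date, timedelta
from decimal import Decimal

# Exact money arithmetic: Decimal avoids binary floating-point drift.
total = 3 * Decimal("4.25") + 2 * Decimal("1.50")
print(total)  # 15.75

# Exact date calculation: 45 days after a given ship date.
delivery = date(2026, 3, 21) + timedelta(days=45)
print(delivery.isoformat())  # 2026-05-05
```

Code like this is deterministic: it produces the same answer every time, with no possibility of a plausible-but-wrong result.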


2. What Tool Use Means

Tool use means the LLM can decide to call a function or external capability instead of answering from text generation alone.

A tool is usually:

  • a Python function
  • an API wrapper
  • a database query function
  • a search function
  • a calculator
  • a retrieval mechanism
  • an internal service call

The model does not directly execute arbitrary code. Instead, it:

  1. receives a list of available tools
  2. decides whether a tool is needed
  3. returns a structured request to call that tool
  4. your application executes the tool
  5. the tool result is passed back to the model
  6. the model uses the result to produce a final answer

This pattern is central to agentic systems.

Simple mental model

Without tool use:

User asks a question → model answers from its internal reasoning and given context

With tool use:

User asks a question → model decides a tool is needed → app runs the tool → model uses real output to answer


3. Why Tool Use Matters in Real Applications

Tool use adds several important capabilities.

A. Access to fresh or private data

Examples:

  • customer account lookup
  • order status
  • current weather
  • internal product catalog
  • calendar availability

B. Reliable execution of logic

Examples:

  • tax calculation
  • discount rules
  • eligibility checks
  • inventory status
  • route optimization

C. Grounding model outputs in reality

Instead of guessing, the model can say:

  • “I checked your order.”
  • “I looked up the exchange rate.”
  • “I queried your account balance.”

D. Better user experience

Users often want outcomes, not just explanations.

For example:

  • not “Here’s how to send an email”
  • but “I drafted and sent the email”

E. Foundation for agents

Many agentic systems are simply LLMs that can:

  • plan
  • choose tools
  • inspect outputs
  • continue toward a goal

Tool use is one of the first major steps from “chatbot” to “agent.”


4. When to Use a Tool vs Let the Model Answer Directly

A useful rule:

Let the model answer directly when:

  • the task is mostly language-based
  • creativity or summarization is needed
  • approximate reasoning is acceptable
  • all necessary context is already provided

Examples:

  • rewrite an email politely
  • summarize meeting notes
  • generate onboarding copy

Use tools when:

  • correctness matters
  • current data matters
  • private/internal data is needed
  • deterministic computation is needed
  • actions must be taken

Examples:

  • calculate a payment total
  • retrieve customer records
  • check support ticket status
  • query inventory
  • send a notification

5. Core Architecture Pattern for Tool-Enabled Systems

A basic tool-calling loop usually looks like this:

  1. Define tools with clear names, descriptions, and parameters.
  2. Send user request + tool definitions to the model.
  3. Inspect model output:
      • if it answers directly, return the response
      • if it requests a tool call, execute the tool
  4. Send tool result back to the model
  5. Get final user-facing response

Important engineering idea

The model is responsible for:

  • deciding which tool to use
  • deciding when to use it
  • deciding how to explain the result

Your code is responsible for:

  • validating arguments
  • safely executing tools
  • handling errors
  • logging and observability
  • enforcing permissions and constraints
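This division of responsibility can be sketched as a small dispatcher that sits between the model and your functions. Everything below is illustrative — the TOOLS registry, the lookup_order stand-in, and the dispatch helper are hypothetical names, not part of any SDK:

```python
import json


def lookup_order(order_id: str) -> dict:
    # Stand-in for a real database query.
    return {"order_id": order_id, "status": "shipped"}


# Allow-list of callable tools: your code, not the model, decides what exists.
TOOLS = {"lookup_order": lookup_order}


def dispatch(tool_name: str, raw_arguments: str) -> dict:
    """Validate and execute a model-requested tool call."""
    # Reject unknown tools instead of executing them.
    if tool_name not in TOOLS:
        return {"error": f"Unknown tool: {tool_name}"}

    # Model output is untrusted text until parsed successfully.
    try:
        arguments = json.loads(raw_arguments)
    except json.JSONDecodeError:
        return {"error": "Tool arguments were not valid JSON."}

    try:
        return TOOLS[tool_name](**arguments)
    except TypeError as exc:
        # The model chose wrong or missing parameters.
        return {"error": f"Bad arguments: {exc}"}


print(dispatch("lookup_order", '{"order_id": "A100"}'))
print(dispatch("delete_everything", "{}"))
```

Note that the model only ever supplies strings; the decision to execute, and the execution itself, stay entirely inside your application.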


6. Hands-On Exercise 1: LLM-Only vs Tool-Enabled Answering

Goal

See the difference between:

  • asking a question directly
  • giving the model a calculator-like tool for exact computation

We will build a tiny deterministic tool in Python.


7. Setup

Install the OpenAI SDK if needed:

pip install openai

Set your API key:

export OPENAI_API_KEY="your_api_key_here"

On Windows PowerShell (note that setx only affects new shells, so open a fresh terminal afterward):

setx OPENAI_API_KEY "your_api_key_here"

8. Exercise 1A: Direct Model Response

Create a file named direct_answer.py.

"""
direct_answer.py

A simple example showing a direct prompt to the model without tools.
This is useful for understanding baseline model behavior.

Run:
    python direct_answer.py
"""

from openai import OpenAI


def main() -> None:
    # Initialize the OpenAI client.
    client = OpenAI()

    # Ask a question that involves arithmetic.
    user_question = "If I buy 3 notebooks at $4.25 each and 2 pens at $1.50 each, what is the total cost?"

    # Call the Responses API with a plain text prompt.
    response = client.responses.create(
        model="gpt-5.4-mini",
        input=user_question,
    )

    # Print the final text output from the model.
    print("User question:")
    print(user_question)
    print("\nModel answer:")
    print(response.output_text)


if __name__ == "__main__":
    main()

Example output

User question:
If I buy 3 notebooks at $4.25 each and 2 pens at $1.50 each, what is the total cost?

Model answer:
The total cost is $15.75.

Discussion

This may work well for simple arithmetic. But in production systems, exact calculations should often be delegated to code instead of relying on generated reasoning.


9. Exercise 1B: Tool-Enabled Exact Calculation

Now we give the model access to a tool.

Create a file named tool_answer.py.

"""
tool_answer.py

A simple tool-calling example using the OpenAI Responses API.
The model can choose to call a pricing calculator tool.

Run:
    python tool_answer.py
"""

import json
from openai import OpenAI


def calculate_total(notebook_qty: int, notebook_price: float, pen_qty: int, pen_price: float) -> dict:
    """
    Deterministically calculate a total purchase cost.

    Returns a dictionary so the result is easy to serialize and pass back
    to the model.
    """
    total = (notebook_qty * notebook_price) + (pen_qty * pen_price)
    return {
        "notebook_qty": notebook_qty,
        "notebook_price": notebook_price,
        "pen_qty": pen_qty,
        "pen_price": pen_price,
        "total": round(total, 2),
        "currency": "USD",
    }


def main() -> None:
    client = OpenAI()

    user_question = "If I buy 3 notebooks at $4.25 each and 2 pens at $1.50 each, what is the total cost?"

    # Define the tool schema the model can use.
    tools = [
        {
            "type": "function",
            "name": "calculate_total",
            "description": "Calculate the total cost of notebooks and pens.",
            "parameters": {
                "type": "object",
                "properties": {
                    "notebook_qty": {
                        "type": "integer",
                        "description": "Number of notebooks purchased."
                    },
                    "notebook_price": {
                        "type": "number",
                        "description": "Price per notebook in USD."
                    },
                    "pen_qty": {
                        "type": "integer",
                        "description": "Number of pens purchased."
                    },
                    "pen_price": {
                        "type": "number",
                        "description": "Price per pen in USD."
                    },
                },
                "required": ["notebook_qty", "notebook_price", "pen_qty", "pen_price"],
                "additionalProperties": False,
            },
        }
    ]

    # First call: let the model decide whether to use the tool.
    response = client.responses.create(
        model="gpt-5.4-mini",
        input=user_question,
        tools=tools,
    )

    # Look for function tool calls in the output.
    function_calls = [item for item in response.output if item.type == "function_call"]

    if not function_calls:
        # If the model answers directly, print that answer.
        print("Model answered without calling a tool:")
        print(response.output_text)
        return

    # For this exercise, we expect one tool call.
    function_call = function_calls[0]

    # Parse the arguments chosen by the model.
    arguments = json.loads(function_call.arguments)

    # Execute the Python tool.
    tool_result = calculate_total(**arguments)

    # Send the tool result back to the model so it can produce a final answer.
    final_response = client.responses.create(
        model="gpt-5.4-mini",
        previous_response_id=response.id,
        input=[
            {
                "type": "function_call_output",
                "call_id": function_call.call_id,
                "output": json.dumps(tool_result),
            }
        ],
    )

    print("User question:")
    print(user_question)
    print("\nTool arguments selected by model:")
    print(json.dumps(arguments, indent=2))
    print("\nTool result:")
    print(json.dumps(tool_result, indent=2))
    print("\nFinal model answer:")
    print(final_response.output_text)


if __name__ == "__main__":
    main()

Example output

User question:
If I buy 3 notebooks at $4.25 each and 2 pens at $1.50 each, what is the total cost?

Tool arguments selected by model:
{
  "notebook_qty": 3,
  "notebook_price": 4.25,
  "pen_qty": 2,
  "pen_price": 1.5
}

Tool result:
{
  "notebook_qty": 3,
  "notebook_price": 4.25,
  "pen_qty": 2,
  "pen_price": 1.5,
  "total": 15.75,
  "currency": "USD"
}

Final model answer:
The total cost is $15.75 USD.

10. What Just Happened?

In the tool-enabled version:

  • the model recognized that calculation was needed
  • it created structured arguments
  • your application executed trusted Python code
  • the tool result was returned
  • the model turned the result into a user-friendly answer

This is the essential tool-calling loop.

Key takeaway

The LLM is not replacing your software logic.
It is orchestrating when that logic should be used.


11. Exercise 2: Build a Tiny “Order Lookup” Assistant

Goal

Create a tool that looks up order status from Python data. This simulates a real business system.

This exercise is more realistic than a calculator because it uses structured business data.


12. Exercise 2 Code

Create a file named order_lookup.py.

"""
order_lookup.py

A realistic beginner-friendly example of tool use:
the model can look up order information from a Python data source.

Run:
    python order_lookup.py
"""

import json
from openai import OpenAI


# Simulated internal order database.
ORDERS = {
    "A100": {"status": "shipped", "estimated_delivery": "2026-03-25", "item": "Wireless Mouse"},
    "A101": {"status": "processing", "estimated_delivery": "2026-03-28", "item": "Mechanical Keyboard"},
    "A102": {"status": "delivered", "estimated_delivery": "2026-03-20", "item": "USB-C Hub"},
}


def lookup_order(order_id: str) -> dict:
    """
    Look up an order by ID.

    In a real application, this might query a database or internal API.
    """
    order = ORDERS.get(order_id.upper())

    if order is None:
        return {
            "found": False,
            "order_id": order_id,
            "message": "No order was found for that ID.",
        }

    return {
        "found": True,
        "order_id": order_id.upper(),
        "status": order["status"],
        "estimated_delivery": order["estimated_delivery"],
        "item": order["item"],
    }


def main() -> None:
    client = OpenAI()

    user_question = "Can you check the status of order A101 for me?"

    tools = [
        {
            "type": "function",
            "name": "lookup_order",
            "description": "Look up an order by its order ID.",
            "parameters": {
                "type": "object",
                "properties": {
                    "order_id": {
                        "type": "string",
                        "description": "The customer's order ID, such as A100."
                    }
                },
                "required": ["order_id"],
                "additionalProperties": False,
            },
        }
    ]

    response = client.responses.create(
        model="gpt-5.4-mini",
        input=user_question,
        tools=tools,
    )

    function_calls = [item for item in response.output if item.type == "function_call"]

    if not function_calls:
        print("Model answered without a tool:")
        print(response.output_text)
        return

    function_call = function_calls[0]
    arguments = json.loads(function_call.arguments)

    tool_result = lookup_order(**arguments)

    final_response = client.responses.create(
        model="gpt-5.4-mini",
        previous_response_id=response.id,
        input=[
            {
                "type": "function_call_output",
                "call_id": function_call.call_id,
                "output": json.dumps(tool_result),
            }
        ],
    )

    print("User question:")
    print(user_question)
    print("\nTool arguments:")
    print(json.dumps(arguments, indent=2))
    print("\nTool result:")
    print(json.dumps(tool_result, indent=2))
    print("\nFinal model answer:")
    print(final_response.output_text)


if __name__ == "__main__":
    main()

Example output

User question:
Can you check the status of order A101 for me?

Tool arguments:
{
  "order_id": "A101"
}

Tool result:
{
  "found": true,
  "order_id": "A101",
  "status": "processing",
  "estimated_delivery": "2026-03-28",
  "item": "Mechanical Keyboard"
}

Final model answer:
Your order A101 for the Mechanical Keyboard is currently processing, and the estimated delivery date is 2026-03-28.

13. Why This Example Matters

This is much closer to real-world agentic application design.

The model is useful because it can:

  • understand natural language
  • identify the order ID from the sentence
  • decide that lookup is required
  • explain results naturally

The tool is useful because it provides:

  • authoritative internal data
  • deterministic retrieval
  • a boundary between language understanding and business data access

Together, they are more useful than either alone.


14. Design Principles for Good Tools

When building tools for LLM systems, follow these principles.

1. Keep tools narrow and clear

Bad:

  • do_everything()

Better:

  • lookup_order(order_id)
  • calculate_shipping(weight, zip_code)
  • search_docs(query)

2. Use explicit parameter schemas

Good schemas help the model call tools correctly.

3. Return structured data

Prefer JSON-like dictionaries over free-form strings.

Bad:

return "Looks like it shipped yesterday and may arrive soon."

Better:

return {
    "status": "shipped",
    "shipped_date": "2026-03-21",
    "estimated_delivery": "2026-03-25"
}

4. Validate inputs

Never assume model-generated arguments are always correct.
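For example, here is a minimal validation sketch using a hypothetical validate_order_id helper and an assumed ID format of one letter followed by digits (e.g. "A100") — adapt the rules to your own data:

```python
def validate_order_id(order_id: object) -> str:
    """Check a model-supplied order ID before it touches business systems.

    Assumed format: one letter followed by digits, e.g. "A100".
    Raises ValueError on anything else.
    """
    if not isinstance(order_id, str):
        raise ValueError("order_id must be a string")

    candidate = order_id.strip().upper()

    # Reject anything that does not match the expected shape.
    if len(candidate) < 2 or not candidate[0].isalpha() or not candidate[1:].isdigit():
        raise ValueError(f"Malformed order_id: {order_id!r}")

    return candidate


print(validate_order_id(" a101 "))  # A101
```

Validating and normalizing at the boundary means the rest of your code can trust the value, even though it originated from generated text.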

5. Handle missing data safely

Tools should fail gracefully.

6. Separate decision-making from execution

The model decides what to do.
Your code decides what is allowed and how it runs.


15. Common Pitfalls

Pitfall 1: Over-trusting the model

The model can produce malformed or incomplete arguments.

Pitfall 2: Giving tools vague descriptions

If descriptions are unclear, tool choice may be poor.

Pitfall 3: Creating giant multi-purpose tools

Smaller tools are easier for the model to understand and use.

Pitfall 4: Letting the model perform exact logic in text

If correctness is important, use code.

Pitfall 5: Ignoring observability

You should log:

  • user input
  • tool calls
  • tool arguments
  • tool results
  • errors

This is essential for debugging.
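One way to capture these signals is a thin wrapper around every tool execution. The run_tool helper below is a hypothetical sketch using only the standard library:

```python
import json
import logging
import time

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
logger = logging.getLogger("tool_calls")


def run_tool(tool_name: str, arguments: dict, tool_fn) -> dict:
    """Execute a tool and log enough detail to reconstruct the call later."""
    logger.info("tool_call name=%s args=%s", tool_name, json.dumps(arguments))
    start = time.perf_counter()
    try:
        result = tool_fn(**arguments)
        logger.info("tool_result name=%s result=%s", tool_name, json.dumps(result))
        return result
    except Exception:
        # logger.exception records the full traceback for debugging.
        logger.exception("tool_error name=%s", tool_name)
        raise
    finally:
        elapsed_ms = (time.perf_counter() - start) * 1000
        logger.info("tool_done name=%s duration_ms=%.1f", tool_name, elapsed_ms)


# Example with a stand-in tool function.
result = run_tool("lookup_order", {"order_id": "A100"}, lambda order_id: {"status": "shipped"})
```

With this in place, every tool call leaves a timestamped trail of name, arguments, result, duration, and any traceback — which is usually the first thing you need when a tool-enabled conversation goes wrong.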


16. Hands-On Challenge

Try extending order_lookup.py in the following ways:

Challenge A

Add more orders and test different phrasings:

  • “Where is order A100?”
  • “Has A102 been delivered?”
  • “Please check order A999.”

Challenge B

Add a second tool:

cancel_order(order_id: str) -> dict

Then ask:

  • “Cancel order A101.”
  • “Can you check A100 first, then cancel it if it hasn’t shipped?”

Challenge C

Modify the tool so it returns:

  • order status
  • item
  • customer name
  • shipping carrier

Observe how the model incorporates richer tool data into final answers.


17. Mini Debrief: What Tool Use Adds

Tool use adds the ability to connect language reasoning with external capabilities.

Without tools, LLM systems are mainly:

  • conversational
  • generative
  • approximate
  • context-bound

With tools, LLM systems become:

  • grounded
  • action-oriented
  • data-aware
  • operationally useful

That is why tool use is a major building block of agentic development.


18. Session Summary

In this session, you learned:

  • plain LLMs are powerful, but limited for real applications
  • tool use allows models to call external functions
  • tools improve accuracy, freshness, reliability, and usefulness
  • the Responses API supports a practical tool-calling workflow
  • Python functions can serve as deterministic tools in an LLM system

You also built:

  • a direct-answer example
  • a calculator tool example
  • an order lookup tool example

19. Useful Resources

  • OpenAI Responses API migration guide:
    https://developers.openai.com/api/docs/guides/migrate-to-responses

  • OpenAI API docs overview:
    https://developers.openai.com/api/

  • OpenAI Python SDK:
    https://github.com/openai/openai-python

  • JSON Schema reference:
    https://json-schema.org/understanding-json-schema/


20. Suggested Homework

Build a small assistant with two tools:

  • lookup_order(order_id)
  • calculate_refund(order_total, restocking_fee_percent)

Test prompts like:

  • “What’s the status of order A100?”
  • “If I return a $120 item with a 10% restocking fee, how much do I get back?”
  • “Check order A101 and summarize the result politely.”

Homework goal

Practice separating:

  • language understanding
  • deterministic logic
  • final response generation

This separation is one of the most important habits in building robust GenAI applications.

