Session 1: What Tool Use Adds to LLM Systems
Synopsis
Introduces the concept of tool-augmented models and explains why function calling improves reliability for calculation, search, APIs, and system actions. Learners see how tool use extends the usefulness of language models beyond text generation.
Session Content
Session 1: What Tool Use Adds to LLM Systems
Session Overview
Duration: ~45 minutes
Audience: Python developers with basic programming knowledge who are beginning with GenAI and agentic development.
Learning Goals
By the end of this session, learners will be able to:
- Explain why plain LLM prompting is often not enough for real applications.
- Describe what “tool use” means in LLM systems.
- Identify the types of tasks that benefit from tool calling.
- Build a simple Python example using the OpenAI Responses API and gpt-5.4-mini.
- Implement a small tool-enabled workflow where the model decides when to call a Python function.
- Compare outputs from:
- direct LLM-only generation
- LLM + tool use
1. Why Plain LLMs Are Powerful, But Limited
Large language models are very good at:
- understanding natural language
- generating text
- summarizing information
- transforming content
- reasoning over provided context
However, an LLM by itself has important limitations in practical software systems.
Common limitations of LLM-only systems
1. No guaranteed access to live data
If you ask:
“What is the current stock price of a company?”
“What time is it in Tokyo right now?”
“What are today’s support tickets?”
A plain model cannot reliably know current or private information unless that data is supplied.
2. No direct execution of business logic
A model may describe how to calculate shipping or tax, but that is not the same as actually executing your business rules in code.
3. No built-in access to external systems
Real applications often need to interact with:
- databases
- APIs
- search systems
- calculators
- internal services
- file stores
- scheduling systems
Without tools, the model can only “talk about” these systems.
4. Hallucination risk
If the model lacks information, it may still generate a plausible answer. This is dangerous when correctness matters.
5. Weakness in deterministic operations
For tasks like:
- exact math
- data lookups
- filtering records
- date calculations
- structured transformations
traditional code is often more reliable than free-form generation.
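To make the exact-math case concrete: Python's standard-library `decimal` module computes money totals exactly, where free-form text generation (and even binary floats) can drift. This is a minimal sketch, not part of the exercises that follow; the `line_total` helper name is invented for illustration:

```python
from decimal import Decimal


def line_total(qty: int, unit_price: str) -> Decimal:
    # Decimal arithmetic on string prices avoids binary floating-point
    # rounding surprises such as 0.1 + 0.2 != 0.3.
    return Decimal(qty) * Decimal(unit_price)


# 3 notebooks at $4.25 plus 2 pens at $1.50
order_total = line_total(3, "4.25") + line_total(2, "1.50")
```

A deterministic function like this always returns the same answer for the same inputs, which is exactly what a language model cannot guarantee.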
2. What Tool Use Means
Tool use means the LLM can decide to call a function or external capability instead of answering from text generation alone.
A tool is usually:
- a Python function
- an API wrapper
- a database query function
- a search function
- a calculator
- a retrieval mechanism
- an internal service call
The model does not directly execute arbitrary code. Instead, it:
- receives a list of available tools
- decides whether a tool is needed
- returns a structured request to call that tool
- your application executes the tool
- the tool result is passed back to the model
- the model uses the result to produce a final answer
This pattern is central to agentic systems.
Simple mental model
Without tool use:
User asks a question → model answers from its internal reasoning and given context
With tool use:
User asks a question → model decides a tool is needed → app runs the tool → model uses real output to answer
3. Why Tool Use Matters in Real Applications
Tool use adds several important capabilities.
A. Access to fresh or private data
Examples:
- customer account lookup
- order status
- current weather
- internal product catalog
- calendar availability
B. Reliable execution of logic
Examples:
- tax calculation
- discount rules
- eligibility checks
- inventory status
- route optimization
C. Grounding model outputs in reality
Instead of guessing, the model can say:
- “I checked your order.”
- “I looked up the exchange rate.”
- “I queried your account balance.”
D. Better user experience
Users often want outcomes, not just explanations.
For example:
- not “Here’s how to send an email”
- but “I drafted and sent the email”
E. Foundation for agents
Many agentic systems are simply LLMs that can:
- plan
- choose tools
- inspect outputs
- continue toward a goal
Tool use is one of the first major steps from “chatbot” to “agent.”
4. When to Use a Tool vs Let the Model Answer Directly
A useful rule:
Let the model answer directly when:
- the task is mostly language-based
- creativity or summarization is needed
- approximate reasoning is acceptable
- all necessary context is already provided
Examples:
- rewrite an email politely
- summarize meeting notes
- generate onboarding copy
Use tools when:
- correctness matters
- current data matters
- private/internal data is needed
- deterministic computation is needed
- actions must be taken
Examples:
- calculate a payment total
- retrieve customer records
- check support ticket status
- query inventory
- send a notification
5. Core Architecture Pattern for Tool-Enabled Systems
A basic tool-calling loop usually looks like this:
- Define tools with clear names, descriptions, and parameters.
- Send user request + tool definitions to the model.
- Inspect model output:
- if it answers directly, return the response
- if it requests a tool call, execute the tool
- Send tool result back to the model
- Get final user-facing response
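The execution half of this loop can be sketched as a small dispatch function. The names here (`TOOL_REGISTRY`, `execute_tool_call`) are illustrative, not part of the OpenAI SDK:

```python
import json

# Illustrative registry mapping tool names to trusted Python callables.
TOOL_REGISTRY = {
    "calculate_total": lambda qty, price: {"total": round(qty * price, 2)},
}


def execute_tool_call(name: str, arguments_json: str) -> str:
    """Validate and run a tool the model requested; always return JSON text."""
    tool = TOOL_REGISTRY.get(name)
    if tool is None:
        return json.dumps({"error": f"Unknown tool: {name}"})
    try:
        arguments = json.loads(arguments_json)
        result = tool(**arguments)
    except (json.JSONDecodeError, TypeError) as exc:
        # Model-generated arguments may be malformed; fail safely
        # instead of crashing the application.
        return json.dumps({"error": str(exc)})
    return json.dumps(result)
```

Returning errors as JSON (rather than raising) lets you pass failures back to the model, which can then apologize or retry rather than leaving the user with a stack trace.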
Important engineering idea
The model is responsible for:
- deciding which tool to use
- deciding when to use it
- deciding how to explain the result
Your code is responsible for:
- validating arguments
- safely executing tools
- handling errors
- logging and observability
- enforcing permissions and constraints
6. Hands-On Exercise 1: LLM-Only vs Tool-Enabled Answering
Goal
See the difference between:
- asking a question directly
- giving the model a calculator-like tool for exact computation
We will build a tiny deterministic tool in Python.
7. Setup
Install the OpenAI SDK if needed:

```shell
pip install openai
```

Set your API key (macOS/Linux):

```shell
export OPENAI_API_KEY="your_api_key_here"
```

On Windows PowerShell (note that setx persists the variable but only takes effect in newly opened terminal sessions):

```shell
setx OPENAI_API_KEY "your_api_key_here"
```
8. Exercise 1A: Direct Model Response
Create a file named direct_answer.py.
"""
direct_answer.py
A simple example showing a direct prompt to the model without tools.
This is useful for understanding baseline model behavior.
Run:
python direct_answer.py
"""
from openai import OpenAI
def main() -> None:
# Initialize the OpenAI client.
client = OpenAI()
# Ask a question that involves arithmetic.
user_question = "If I buy 3 notebooks at $4.25 each and 2 pens at $1.50 each, what is the total cost?"
# Call the Responses API with a plain text prompt.
response = client.responses.create(
model="gpt-5.4-mini",
input=user_question,
)
# Print the final text output from the model.
print("User question:")
print(user_question)
print("\nModel answer:")
print(response.output_text)
if __name__ == "__main__":
main()
Example output
```
User question:
If I buy 3 notebooks at $4.25 each and 2 pens at $1.50 each, what is the total cost?

Model answer:
The total cost is $15.75.
```
Discussion
This may work well for simple arithmetic. But in production systems, exact calculations should often be delegated to code instead of relying on generated reasoning.
9. Exercise 1B: Tool-Enabled Exact Calculation
Now we give the model access to a tool.
Create a file named tool_answer.py.
"""
tool_answer.py
A simple tool-calling example using the OpenAI Responses API.
The model can choose to call a pricing calculator tool.
Run:
python tool_answer.py
"""
import json
from openai import OpenAI
def calculate_total(notebook_qty: int, notebook_price: float, pen_qty: int, pen_price: float) -> dict:
"""
Deterministically calculate a total purchase cost.
Returns a dictionary so the result is easy to serialize and pass back
to the model.
"""
total = (notebook_qty * notebook_price) + (pen_qty * pen_price)
return {
"notebook_qty": notebook_qty,
"notebook_price": notebook_price,
"pen_qty": pen_qty,
"pen_price": pen_price,
"total": round(total, 2),
"currency": "USD",
}
def main() -> None:
client = OpenAI()
user_question = "If I buy 3 notebooks at $4.25 each and 2 pens at $1.50 each, what is the total cost?"
# Define the tool schema the model can use.
tools = [
{
"type": "function",
"name": "calculate_total",
"description": "Calculate the total cost of notebooks and pens.",
"parameters": {
"type": "object",
"properties": {
"notebook_qty": {
"type": "integer",
"description": "Number of notebooks purchased."
},
"notebook_price": {
"type": "number",
"description": "Price per notebook in USD."
},
"pen_qty": {
"type": "integer",
"description": "Number of pens purchased."
},
"pen_price": {
"type": "number",
"description": "Price per pen in USD."
},
},
"required": ["notebook_qty", "notebook_price", "pen_qty", "pen_price"],
"additionalProperties": False,
},
}
]
# First call: let the model decide whether to use the tool.
response = client.responses.create(
model="gpt-5.4-mini",
input=user_question,
tools=tools,
)
# Look for function tool calls in the output.
function_calls = [item for item in response.output if item.type == "function_call"]
if not function_calls:
# If the model answers directly, print that answer.
print("Model answered without calling a tool:")
print(response.output_text)
return
# For this exercise, we expect one tool call.
function_call = function_calls[0]
# Parse the arguments chosen by the model.
arguments = json.loads(function_call.arguments)
# Execute the Python tool.
tool_result = calculate_total(**arguments)
# Send the tool result back to the model so it can produce a final answer.
final_response = client.responses.create(
model="gpt-5.4-mini",
previous_response_id=response.id,
input=[
{
"type": "function_call_output",
"call_id": function_call.call_id,
"output": json.dumps(tool_result),
}
],
)
print("User question:")
print(user_question)
print("\nTool arguments selected by model:")
print(json.dumps(arguments, indent=2))
print("\nTool result:")
print(json.dumps(tool_result, indent=2))
print("\nFinal model answer:")
print(final_response.output_text)
if __name__ == "__main__":
main()
Example output
```
User question:
If I buy 3 notebooks at $4.25 each and 2 pens at $1.50 each, what is the total cost?

Tool arguments selected by model:
{
  "notebook_qty": 3,
  "notebook_price": 4.25,
  "pen_qty": 2,
  "pen_price": 1.5
}

Tool result:
{
  "notebook_qty": 3,
  "notebook_price": 4.25,
  "pen_qty": 2,
  "pen_price": 1.5,
  "total": 15.75,
  "currency": "USD"
}

Final model answer:
The total cost is $15.75 USD.
```
10. What Just Happened?
In the tool-enabled version:
- the model recognized that calculation was needed
- it created structured arguments
- your application executed trusted Python code
- the tool result was returned
- the model turned the result into a user-friendly answer
This is the essential tool-calling loop.
Key takeaway
The LLM is not replacing your software logic.
It is orchestrating when that logic should be used.
11. Exercise 2: Build a Tiny “Order Lookup” Assistant
Goal
Create a tool that looks up order status from Python data. This simulates a real business system.
This exercise is more realistic than a calculator because it uses structured business data.
12. Exercise 2 Code
Create a file named order_lookup.py.
"""
order_lookup.py
A realistic beginner-friendly example of tool use:
the model can look up order information from a Python data source.
Run:
python order_lookup.py
"""
import json
from openai import OpenAI
# Simulated internal order database.
ORDERS = {
"A100": {"status": "shipped", "estimated_delivery": "2026-03-25", "item": "Wireless Mouse"},
"A101": {"status": "processing", "estimated_delivery": "2026-03-28", "item": "Mechanical Keyboard"},
"A102": {"status": "delivered", "estimated_delivery": "2026-03-20", "item": "USB-C Hub"},
}
def lookup_order(order_id: str) -> dict:
"""
Look up an order by ID.
In a real application, this might query a database or internal API.
"""
order = ORDERS.get(order_id.upper())
if order is None:
return {
"found": False,
"order_id": order_id,
"message": "No order was found for that ID.",
}
return {
"found": True,
"order_id": order_id.upper(),
"status": order["status"],
"estimated_delivery": order["estimated_delivery"],
"item": order["item"],
}
def main() -> None:
client = OpenAI()
user_question = "Can you check the status of order A101 for me?"
tools = [
{
"type": "function",
"name": "lookup_order",
"description": "Look up an order by its order ID.",
"parameters": {
"type": "object",
"properties": {
"order_id": {
"type": "string",
"description": "The customer's order ID, such as A100."
}
},
"required": ["order_id"],
"additionalProperties": False,
},
}
]
response = client.responses.create(
model="gpt-5.4-mini",
input=user_question,
tools=tools,
)
function_calls = [item for item in response.output if item.type == "function_call"]
if not function_calls:
print("Model answered without a tool:")
print(response.output_text)
return
function_call = function_calls[0]
arguments = json.loads(function_call.arguments)
tool_result = lookup_order(**arguments)
final_response = client.responses.create(
model="gpt-5.4-mini",
previous_response_id=response.id,
input=[
{
"type": "function_call_output",
"call_id": function_call.call_id,
"output": json.dumps(tool_result),
}
],
)
print("User question:")
print(user_question)
print("\nTool arguments:")
print(json.dumps(arguments, indent=2))
print("\nTool result:")
print(json.dumps(tool_result, indent=2))
print("\nFinal model answer:")
print(final_response.output_text)
if __name__ == "__main__":
main()
Example output
```
User question:
Can you check the status of order A101 for me?

Tool arguments:
{
  "order_id": "A101"
}

Tool result:
{
  "found": true,
  "order_id": "A101",
  "status": "processing",
  "estimated_delivery": "2026-03-28",
  "item": "Mechanical Keyboard"
}

Final model answer:
Your order A101 for the Mechanical Keyboard is currently processing, and the estimated delivery date is 2026-03-28.
```
13. Why This Example Matters
This is much closer to real-world agentic application design.
The model is useful because it can:
- understand natural language
- identify the order ID from the sentence
- decide that lookup is required
- explain results naturally
The tool is useful because it provides:
- authoritative internal data
- deterministic retrieval
- a boundary between language understanding and business data access
Together, they are more useful than either alone.
14. Design Principles for Good Tools
When building tools for LLM systems, follow these principles.
1. Keep tools narrow and clear
Bad:
- do_everything()
Better:
- lookup_order(order_id)
- calculate_shipping(weight, zip_code)
- search_docs(query)
2. Use explicit parameter schemas
Good schemas help the model call tools correctly.
3. Return structured data
Prefer JSON-like dictionaries over free-form strings.
Bad:

```python
return "Looks like it shipped yesterday and may arrive soon."
```

Better:

```python
return {
    "status": "shipped",
    "shipped_date": "2026-03-21",
    "estimated_delivery": "2026-03-25",
}
```
4. Validate inputs
Never assume model-generated arguments are always correct.
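As one possible shape for such validation, here is a hypothetical helper for the order-ID format used in the earlier examples (one letter followed by digits). The name `validate_order_id` and the format rule are assumptions for illustration:

```python
def validate_order_id(raw: object) -> str:
    """Normalize and check a model-supplied order ID such as 'a101'.

    Hypothetical helper: it assumes the one-letter-plus-digits format
    used by the example data, and raises ValueError on anything else.
    """
    if not isinstance(raw, str):
        raise ValueError("order_id must be a string")
    order_id = raw.strip().upper()
    if len(order_id) < 2 or not order_id[0].isalpha() or not order_id[1:].isdigit():
        raise ValueError(f"Malformed order ID: {raw!r}")
    return order_id
```

Validating before execution means a malformed argument becomes a clean error you can report back to the model, instead of an exception deep inside your business logic.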
5. Handle missing data safely
Tools should fail gracefully.
6. Separate decision-making from execution
The model decides what to do.
Your code decides what is allowed and how it runs.
15. Common Pitfalls
Pitfall 1: Over-trusting the model
The model can produce malformed or incomplete arguments.
Pitfall 2: Giving tools vague descriptions
If descriptions are unclear, tool choice may be poor.
Pitfall 3: Creating giant multi-purpose tools
Smaller tools are easier for the model to understand and use.
Pitfall 4: Letting the model perform exact logic in text
If correctness is important, use code.
Pitfall 5: Ignoring observability
You should log:
- user input
- tool calls
- tool arguments
- tool results
- errors
This is essential for debugging.
16. Hands-On Challenge
Try extending order_lookup.py in the following ways:
Challenge A
Add more orders and test different phrasings:
- “Where is order A100?”
- “Has A102 been delivered?”
- “Please check order A999.”
Challenge B
Add a second tool:
cancel_order(order_id: str) -> dict
Then ask:
- “Cancel order A101.”
- “Can you check A100 first, then cancel it if it hasn’t shipped?”
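If you want a starting point for the tool itself, here is one possible sketch. The business rule (only unshipped orders can be cancelled) is an assumption for the exercise, and `ORDERS` is copied from order_lookup.py; in your solution, share one data source between both tools:

```python
# Copied subset of the simulated database from order_lookup.py.
ORDERS = {
    "A100": {"status": "shipped", "estimated_delivery": "2026-03-25", "item": "Wireless Mouse"},
    "A101": {"status": "processing", "estimated_delivery": "2026-03-28", "item": "Mechanical Keyboard"},
}


def cancel_order(order_id: str) -> dict:
    """Cancel an order unless it has already shipped or been delivered."""
    key = order_id.upper()
    order = ORDERS.get(key)
    if order is None:
        return {"cancelled": False, "order_id": key, "reason": "Order not found."}
    if order["status"] in ("shipped", "delivered"):
        return {"cancelled": False, "order_id": key,
                "reason": f"Order is already {order['status']}."}
    order["status"] = "cancelled"
    return {"cancelled": True, "order_id": key}
```

Remember to describe this rule in the tool's schema description so the model can explain to the user why a cancellation was refused.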
Challenge C
Modify the tool so it returns:
- order status
- item
- customer name
- shipping carrier
Observe how the model incorporates richer tool data into final answers.
17. Mini Debrief: What Tool Use Adds
Tool use adds the ability to connect language reasoning with external capabilities.
Without tools, LLM systems are mainly:
- conversational
- generative
- approximate
- context-bound
With tools, LLM systems become:
- grounded
- action-oriented
- data-aware
- operationally useful
That is why tool use is a major building block of agentic development.
18. Session Summary
In this session, you learned:
- plain LLMs are powerful, but limited for real applications
- tool use allows models to call external functions
- tools improve accuracy, freshness, reliability, and usefulness
- the Responses API supports a practical tool-calling workflow
- Python functions can serve as deterministic tools in an LLM system
You also built:
- a direct-answer example
- a calculator tool example
- an order lookup tool example
19. Useful Resources
- OpenAI Responses API migration guide: https://developers.openai.com/api/docs/guides/migrate-to-responses
- OpenAI API docs overview: https://developers.openai.com/api/
- OpenAI Python SDK: https://github.com/openai/openai-python
- JSON Schema reference: https://json-schema.org/understanding-json-schema/
20. Suggested Homework
Build a small assistant with two tools:
- lookup_order(order_id)
- calculate_refund(order_total, restocking_fee_percent)
Test prompts like:
- “What’s the status of order A100?”
- “If I return a $120 item with a 10% restocking fee, how much do I get back?”
- “Check order A101 and summarize the result politely.”
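If you get stuck on the refund math, here is one possible sketch of the deterministic half. The rounding and USD assumption are illustrative; adapt them to your own rules:

```python
def calculate_refund(order_total: float, restocking_fee_percent: float) -> dict:
    """Deterministic refund math matching the suggested homework signature."""
    fee = round(order_total * restocking_fee_percent / 100, 2)
    return {
        "order_total": order_total,
        "restocking_fee": fee,
        "refund": round(order_total - fee, 2),
        "currency": "USD",
    }
```

For the $120 item with a 10% restocking fee, this returns a fee of 12.0 and a refund of 108.0, which the model can then phrase as a friendly answer.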
Homework goal
Practice separating:
- language understanding
- deterministic logic
- final response generation
This separation is one of the most important habits in building robust GenAI applications.