Session 4: Controlling Agent Actions Safely
Synopsis
Introduces approval steps, permission scopes, execution limits, and auditability for tool-enabled systems. This session emphasizes that action-oriented systems require stronger safeguards than read-only assistants.
Session Content
Session 4: Controlling Agent Actions Safely
Session Overview
In this session, learners will explore how to safely control the actions of LLM-powered agents. The focus is on reducing unintended behavior, validating model outputs before execution, restricting tool access, and introducing human-in-the-loop approval patterns. By the end of the session, learners will be able to design safer agent workflows in Python using the OpenAI Responses API with gpt-5.4-mini.
Duration
~45 minutes
Learning Objectives
By the end of this session, learners should be able to:
- Explain why agent safety matters in practical applications
- Identify common failure modes in agent action execution
- Validate and constrain model outputs before taking actions
- Implement allowlists and guardrails for tool usage
- Add approval checkpoints before sensitive operations
- Build a small safe-action agent loop in Python
1. Why Safe Action Control Matters
Agents are powerful because they do more than generate text. They can:
- call tools
- write files
- send emails
- query databases
- trigger workflows
- modify system state
This introduces risk. A normal chatbot can say something incorrect. An agent can do something incorrect.
Common Risks
- Hallucinated actions: the model invents a command or parameter
- Unsafe tool usage: the model tries to call a tool it should not access
- Prompt injection: external content tries to manipulate the agent
- Over-broad execution: the model performs more actions than required
- Sensitive operations without approval: deleting data, sending messages, making purchases
- Poor parameter quality: malformed JSON, wrong IDs, invalid file paths
Core Safety Principle
A useful rule for agent systems:
The model may propose actions, but the application must decide whether to execute them.
This means:
- the model is not the final authority
- every action should be checked
- tools should be tightly scoped
- sensitive tasks should require explicit approval
2. Safety Patterns for Agentic Systems
2.1 Constrain the Action Space
Do not let the model execute arbitrary code or commands.
Prefer:
- a small set of explicit tools
- clear tool schemas
- strict argument validation
- business rules in application code
Avoid:
- shell execution from raw model text
- dynamic `eval`
- unrestricted file access
- database access without query checks
2.2 Validate Before Execute
Before executing a model-requested action, verify:
- the tool name is allowed
- required fields are present
- field types are correct
- parameter values are within acceptable ranges
- action is appropriate for the current user/session
- action is not sensitive without approval
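The checks above can be collapsed into a single gate function that runs before any execution. The sketch below is illustrative only: the `ALLOWED` registry, its per-tool required-field specs, and the `approved` flag are placeholder assumptions, not part of the session's main example.

```python
# Toy validate-before-execute gate. ALLOWED maps each permitted tool
# to the fields it requires; anything outside it is rejected outright.
ALLOWED = {
    "list_tasks": {"required": {}},
    "create_task": {"required": {"title": str}},
}
SENSITIVE = {"create_task"}  # illustrative: tools gated on approval


def check_action(tool, arguments, approved=False):
    """Return (ok, reason) without executing anything."""
    spec = ALLOWED.get(tool)
    if spec is None:
        return False, f"tool '{tool}' not allowed"
    for field, expected_type in spec["required"].items():
        if field not in arguments:
            return False, f"missing field '{field}'"
        if not isinstance(arguments[field], expected_type):
            return False, f"field '{field}' has wrong type"
    if tool in SENSITIVE and not approved:
        return False, "approval required"
    return True, "ok"
```

Because the gate only returns a verdict, the caller decides what a rejection means: log it, ask for approval, or surface an error to the user.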
2.3 Human-in-the-Loop for Sensitive Actions
Some actions should never happen automatically, such as:
- deleting records
- sending email to real users
- issuing refunds
- modifying permissions
- writing to production systems
Pattern:
- model proposes action
- application validates it
- application asks human for approval
- only then execute
2.4 Separate Planning from Execution
A safer architecture often uses two stages:
- Planning: the model suggests what should happen
- Execution: your application checks and performs allowed steps
This reduces the chance of the model directly controlling side effects.
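A minimal sketch of this two-stage split is shown below. The `plan` function stands in for a model call and its hard-coded steps (including the disallowed `delete_all_tasks`) are hypothetical; execution is simulated rather than performed.

```python
ALLOWED_TOOLS = {"list_tasks", "create_task", "mark_done"}


def plan(user_request):
    """Stage 1: produce proposed steps.

    In a real system this would come from the model; here the
    proposal is hard-coded to show one allowed and one disallowed step.
    """
    return [
        {"tool": "list_tasks", "arguments": {}},
        {"tool": "delete_all_tasks", "arguments": {}},
    ]


def execute_plan(steps):
    """Stage 2: the application filters and performs only allowed steps."""
    results = []
    for step in steps:
        if step["tool"] not in ALLOWED_TOOLS:
            results.append({"step": step["tool"], "status": "rejected"})
            continue
        # Execution is simulated; a real executor would dispatch here.
        results.append({"step": step["tool"], "status": "executed"})
    return results
```

The key property is that stage 1 has no side effects at all: even a completely wrong plan is harmless until stage 2 agrees to act on it.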
2.5 Log Every Action Attempt
Track:
- user request
- model response
- requested tool/action
- validation decision
- execution outcome
- approval status
This is essential for debugging and auditing.
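These fields can be captured as one structured record per attempt. A sketch using the standard `logging` module is below; the field names and the `approval_status` labels are illustrative choices, not a fixed schema.

```python
import json
import logging

logging.basicConfig(level=logging.INFO, format="%(message)s")
logger = logging.getLogger("agent.audit")


def log_attempt(user_request, raw_response, tool, validation_ok,
                approval_status, outcome):
    """Emit one JSON line covering the whole lifecycle of an action attempt."""
    record = {
        "user_request": user_request,
        "model_response": raw_response,
        "tool": tool,
        "validation_ok": validation_ok,
        "approval_status": approval_status,  # e.g. "auto", "approved", "denied", "n/a"
        "outcome": outcome,
    }
    logger.info(json.dumps(record))
    return record
```

One JSON object per attempt keeps the log machine-parseable, so rejected and approved actions can later be filtered and reviewed with ordinary tooling.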
3. Safe Agent Design in Python
We will simulate a small task-management agent with these allowed tools:
- `list_tasks`
- `create_task`
- `mark_done`
We will intentionally not allow:
- `delete_all_tasks`
- arbitrary Python execution
- filesystem writes
- shell commands
We will implement:
- a tool allowlist
- parameter validation
- approval checks for sensitive actions
- a safe execution loop
4. Environment Setup
Install the OpenAI SDK:

```shell
pip install openai
```

Set your API key:

```shell
export OPENAI_API_KEY="your_api_key_here"
```

On Windows PowerShell:

```shell
setx OPENAI_API_KEY "your_api_key_here"
```
5. Core Example: A Safe Action Controller
This first example shows a complete safe-action pattern:
- the model is asked to produce a structured tool request
- the application parses and validates it
- only approved, valid actions are executed
Example: Safe Agent for Task Operations
"""
Session 4 - Controlling Agent Actions Safely
This example demonstrates a safe agent loop:
1. Ask the model to propose an action in JSON format
2. Parse the JSON safely
3. Validate tool name and arguments
4. Require approval for sensitive operations
5. Execute only approved, allowed actions
"""
import json
import os
from typing import Any, Dict, Optional
from openai import OpenAI
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
# -----------------------------------------------------------------------------
# In-memory task store for demonstration
# -----------------------------------------------------------------------------
TASKS = [
{"id": 1, "title": "Review PR", "done": False},
{"id": 2, "title": "Write documentation", "done": False},
]
# -----------------------------------------------------------------------------
# Allowed tools and safety policy
# -----------------------------------------------------------------------------
ALLOWED_TOOLS = {"list_tasks", "create_task", "mark_done"}
# Operations that should require explicit human approval
SENSITIVE_TOOLS = {"mark_done"} # mark_done could be considered state-changing
def list_tasks() -> Dict[str, Any]:
"""Return all tasks."""
return {"status": "success", "tasks": TASKS}
def create_task(title: str) -> Dict[str, Any]:
"""Create a new task after validation."""
new_id = max(task["id"] for task in TASKS) + 1 if TASKS else 1
task = {"id": new_id, "title": title, "done": False}
TASKS.append(task)
return {"status": "success", "task": task}
def mark_done(task_id: int) -> Dict[str, Any]:
"""Mark a task as completed."""
for task in TASKS:
if task["id"] == task_id:
task["done"] = True
return {"status": "success", "task": task}
return {"status": "error", "message": f"Task {task_id} not found"}
def ask_model_for_action(user_request: str) -> str:
"""
Ask the model to suggest one tool call as JSON only.
We instruct the model to return a strict JSON object with:
- tool: string
- arguments: object
"""
prompt = f"""
You are a task assistant.
A user will make a request. Choose exactly one tool from this set:
- list_tasks
- create_task
- mark_done
Return JSON only with this schema:
{{
"tool": "<tool_name>",
"arguments": {{}}
}}
Rules:
- Do not include markdown
- Do not include explanations
- Use only the allowed tool names
- If the request is unclear, choose list_tasks with empty arguments
User request: {user_request}
""".strip()
response = client.responses.create(
model="gpt-5.4-mini",
input=prompt
)
return response.output_text.strip()
def parse_action(raw_text: str) -> Optional[Dict[str, Any]]:
"""Parse the model's JSON action safely."""
try:
action = json.loads(raw_text)
if not isinstance(action, dict):
return None
return action
except json.JSONDecodeError:
return None
def validate_action(action: Dict[str, Any]) -> Dict[str, Any]:
"""
Validate the proposed action against application safety rules.
Returns a structured validation result.
"""
tool = action.get("tool")
arguments = action.get("arguments", {})
if tool not in ALLOWED_TOOLS:
return {
"ok": False,
"reason": f"Tool '{tool}' is not allowed."
}
if not isinstance(arguments, dict):
return {
"ok": False,
"reason": "Arguments must be a JSON object."
}
if tool == "list_tasks":
return {"ok": True, "tool": tool, "arguments": arguments}
if tool == "create_task":
title = arguments.get("title")
if not isinstance(title, str) or not title.strip():
return {
"ok": False,
"reason": "create_task requires a non-empty string 'title'."
}
if len(title) > 100:
return {
"ok": False,
"reason": "Task title must be 100 characters or fewer."
}
return {"ok": True, "tool": tool, "arguments": {"title": title.strip()}}
if tool == "mark_done":
task_id = arguments.get("task_id")
if not isinstance(task_id, int):
return {
"ok": False,
"reason": "mark_done requires integer 'task_id'."
}
return {"ok": True, "tool": tool, "arguments": {"task_id": task_id}}
return {"ok": False, "reason": "Unexpected validation path."}
def requires_approval(tool: str) -> bool:
"""Return True if this tool requires human approval."""
return tool in SENSITIVE_TOOLS
def get_human_approval(tool: str, arguments: Dict[str, Any]) -> bool:
"""
Simulate human approval.
In real systems, this could be a UI button, admin check, or workflow approval.
"""
print(f"[APPROVAL REQUIRED] Tool: {tool}, Arguments: {arguments}")
user_input = input("Approve this action? (yes/no): ").strip().lower()
return user_input == "yes"
def execute_action(tool: str, arguments: Dict[str, Any]) -> Dict[str, Any]:
"""Execute an already-validated action."""
if tool == "list_tasks":
return list_tasks()
if tool == "create_task":
return create_task(arguments["title"])
if tool == "mark_done":
return mark_done(arguments["task_id"])
return {"status": "error", "message": f"Unknown tool: {tool}"}
def run_safe_agent(user_request: str) -> None:
"""Run the safe proposal-validation-execution flow."""
print(f"\nUser request: {user_request}")
raw_action = ask_model_for_action(user_request)
print(f"\nRaw model action:\n{raw_action}")
action = parse_action(raw_action)
if action is None:
print("\nRejected: Model did not return valid JSON.")
return
validation = validate_action(action)
if not validation["ok"]:
print(f"\nRejected: {validation['reason']}")
return
tool = validation["tool"]
arguments = validation["arguments"]
if requires_approval(tool):
approved = get_human_approval(tool, arguments)
if not approved:
print("\nAction was not approved.")
return
result = execute_action(tool, arguments)
print("\nExecution result:")
print(json.dumps(result, indent=2))
if __name__ == "__main__":
# Try a few examples interactively
run_safe_agent("Show me my current tasks")
run_safe_agent("Create a task called Prepare sprint demo")
run_safe_agent("Mark task 1 as done")
Example Usage
```text
User request: Show me my current tasks

Raw model action:
{"tool":"list_tasks","arguments":{}}

Execution result:
{
  "status": "success",
  "tasks": [
    {
      "id": 1,
      "title": "Review PR",
      "done": false
    },
    {
      "id": 2,
      "title": "Write documentation",
      "done": false
    }
  ]
}

User request: Mark task 1 as done

Raw model action:
{"tool":"mark_done","arguments":{"task_id":1}}

[APPROVAL REQUIRED] Tool: mark_done, Arguments: {'task_id': 1}
Approve this action? (yes/no): yes

Execution result:
{
  "status": "success",
  "task": {
    "id": 1,
    "title": "Review PR",
    "done": true
  }
}
```
6. Key Safety Techniques Explained
6.1 JSON as an Action Contract
Instead of letting the model produce free-form instructions like:
- “I think you should run a delete command”
- “Maybe send an email”
we require a structured action:
```json
{
  "tool": "create_task",
  "arguments": {
    "title": "Prepare sprint demo"
  }
}
```
This makes it easier to:
- parse
- validate
- reject bad values
- log decisions
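One practical wrinkle with this contract: despite explicit instructions, models sometimes wrap the JSON in a markdown code fence. A defensive parser can strip an accidental fence before rejecting the output. The sketch below is an illustration, separate from the session's main example:

```python
import json


def parse_action_contract(raw_text):
    """Parse a {tool, arguments} object, tolerating an accidental ```json fence.

    Returns the action dict, or None if the text is not a usable contract.
    """
    text = raw_text.strip()
    if text.startswith("```"):
        # Drop any fence lines (```json at the top, ``` at the bottom).
        lines = [ln for ln in text.splitlines()
                 if not ln.strip().startswith("```")]
        text = "\n".join(lines)
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return None
    # The contract requires a JSON object with a "tool" key.
    if not isinstance(data, dict) or "tool" not in data:
        return None
    return data
```

Tolerating formatting noise is fine; tolerating *content* deviations (unknown tools, missing fields) is not, so those still return `None` and fail closed.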
6.2 Tool Allowlists
The application defines the only tools it supports:
```python
ALLOWED_TOOLS = {"list_tasks", "create_task", "mark_done"}
```
If the model invents:
{"tool":"delete_all_tasks","arguments":{}}
the application rejects it.
6.3 Argument Validation
Even when the tool name is valid, arguments may be unsafe or invalid.
Examples:
- title is empty
- task ID is not an integer
- string is too long
- missing required field
Validation belongs in application code, not only in the prompt.
6.4 Approval Gates
Any state-changing or risky operation may require approval:
- create/update/delete
- external messages
- transactions
- permission changes
This is a strong practical pattern for production systems.
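One way to express such a policy is a per-tool map instead of a bare "sensitive" set, with unknown tools denied by default. The tool names and the three policy labels below are illustrative choices, not a prescribed scheme:

```python
# Policy per tool; anything not listed is denied (fail closed).
TOOL_POLICY = {
    "list_tasks": "auto",     # read-only: execute without review
    "create_task": "auto",    # low-risk state change
    "mark_done": "approval",  # state change gated on a human
    "delete_task": "deny",    # never executed by the agent
}


def policy_for(tool):
    """Return the execution policy for a tool, defaulting to 'deny'."""
    return TOOL_POLICY.get(tool, "deny")
```

Keeping the policy in one data structure makes it easy to review, audit, and change without touching the execution code.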
7. Hands-On Exercise 1: Reject Unsafe or Invented Actions
Goal
Build a version of the validator that rejects any tool outside an allowlist and logs the rejection reason.
Instructions
- Start from the code above.
- Add a fake user request such as:
"Delete all tasks immediately"- Observe what the model proposes.
- Ensure your validator rejects any unsupported tool.
- Print a structured audit log entry.
Starter Solution
"""
Exercise 1: Reject unsafe or invented actions and log why.
"""
import json
from datetime import datetime
ALLOWED_TOOLS = {"list_tasks", "create_task", "mark_done"}
def audit_log(event_type: str, payload: dict) -> None:
"""Print a simple structured audit log."""
entry = {
"timestamp": datetime.utcnow().isoformat() + "Z",
"event_type": event_type,
"payload": payload,
}
print(json.dumps(entry, indent=2))
def validate_action(action: dict) -> dict:
"""Validate a proposed action."""
tool = action.get("tool")
arguments = action.get("arguments", {})
if tool not in ALLOWED_TOOLS:
audit_log("action_rejected", {
"reason": "tool_not_allowed",
"tool": tool,
"arguments": arguments,
})
return {
"ok": False,
"reason": f"Tool '{tool}' is not allowed."
}
return {
"ok": True,
"tool": tool,
"arguments": arguments,
}
# Example malicious or unsupported action proposal
proposed_action = {
"tool": "delete_all_tasks",
"arguments": {}
}
result = validate_action(proposed_action)
print(result)
Example Output
```text
{
  "timestamp": "2026-03-22T12:00:00.000000Z",
  "event_type": "action_rejected",
  "payload": {
    "reason": "tool_not_allowed",
    "tool": "delete_all_tasks",
    "arguments": {}
  }
}
{'ok': False, 'reason': "Tool 'delete_all_tasks' is not allowed."}
```
What to Learn
- the model should not define your tool surface
- unsupported actions must fail closed
- structured logging helps review and debugging
8. Hands-On Exercise 2: Add Approval for Sensitive Actions
Goal
Extend the agent so that some actions are executed automatically while others need approval.
Instructions
- Treat `create_task` as auto-approved.
- Treat `mark_done` as approval-required.
- Prompt the user for confirmation before execution.
- Reject the action if approval is denied.
Solution Example
"""
Exercise 2: Add approval checks for sensitive actions.
"""
SENSITIVE_TOOLS = {"mark_done"}
def requires_approval(tool: str) -> bool:
"""Return True if the action should require human review."""
return tool in SENSITIVE_TOOLS
def get_human_approval(tool: str, arguments: dict) -> bool:
"""Prompt the user for approval."""
print(f"Approval required for tool='{tool}' with arguments={arguments}")
answer = input("Approve? (yes/no): ").strip().lower()
return answer == "yes"
def maybe_execute(tool: str, arguments: dict) -> None:
"""Execute only if policy allows."""
if requires_approval(tool):
if not get_human_approval(tool, arguments):
print("Action rejected by human reviewer.")
return
print(f"Executing tool='{tool}' with arguments={arguments}")
# Demo
maybe_execute("create_task", {"title": "Draft release notes"})
maybe_execute("mark_done", {"task_id": 2})
Example Output
```text
Executing tool='create_task' with arguments={'title': 'Draft release notes'}
Approval required for tool='mark_done' with arguments={'task_id': 2}
Approve? (yes/no): no
Action rejected by human reviewer.
```
9. Hands-On Exercise 3: Full Safe Agent with the Responses API
Goal
Create a reusable function that:
- takes a natural language user request
- asks `gpt-5.4-mini` for one action proposal
- validates it
- checks approval rules
- executes it safely
Complete Exercise Solution
"""
Exercise 3: Full safe agent with the OpenAI Responses API.
"""
import json
import os
from typing import Any, Dict
from openai import OpenAI
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
TASKS = [
{"id": 1, "title": "Review invoice", "done": False},
{"id": 2, "title": "Update roadmap", "done": False},
]
ALLOWED_TOOLS = {"list_tasks", "create_task", "mark_done"}
SENSITIVE_TOOLS = {"mark_done"}
def ask_model(user_request: str) -> str:
"""Ask the model for one JSON action."""
prompt = f"""
You are a task assistant that can only choose one action.
Allowed tools:
- list_tasks
- create_task
- mark_done
Return JSON only:
{{
"tool": "<tool_name>",
"arguments": {{}}
}}
If the user asks for an unsupported or dangerous action,
choose list_tasks with empty arguments instead.
User request: {user_request}
""".strip()
response = client.responses.create(
model="gpt-5.4-mini",
input=prompt,
)
return response.output_text.strip()
def parse_json_action(raw_text: str) -> Dict[str, Any] | None:
"""Parse a JSON object from model text."""
try:
data = json.loads(raw_text)
return data if isinstance(data, dict) else None
except json.JSONDecodeError:
return None
def validate(action: Dict[str, Any]) -> Dict[str, Any]:
"""Validate tool and parameters."""
tool = action.get("tool")
arguments = action.get("arguments", {})
if tool not in ALLOWED_TOOLS:
return {"ok": False, "reason": "Tool not allowed"}
if not isinstance(arguments, dict):
return {"ok": False, "reason": "Arguments must be an object"}
if tool == "list_tasks":
return {"ok": True, "tool": tool, "arguments": {}}
if tool == "create_task":
title = arguments.get("title")
if not isinstance(title, str) or not title.strip():
return {"ok": False, "reason": "Missing valid title"}
return {
"ok": True,
"tool": tool,
"arguments": {"title": title.strip()}
}
if tool == "mark_done":
task_id = arguments.get("task_id")
if not isinstance(task_id, int):
return {"ok": False, "reason": "Missing valid integer task_id"}
return {
"ok": True,
"tool": tool,
"arguments": {"task_id": task_id}
}
return {"ok": False, "reason": "Unhandled tool"}
def approve_if_needed(tool: str, arguments: Dict[str, Any]) -> bool:
"""Require human approval for sensitive tools."""
if tool not in SENSITIVE_TOOLS:
return True
print(f"Approval required: {tool} {arguments}")
answer = input("Approve? (yes/no): ").strip().lower()
return answer == "yes"
def execute(tool: str, arguments: Dict[str, Any]) -> Dict[str, Any]:
"""Execute validated actions."""
if tool == "list_tasks":
return {"status": "success", "tasks": TASKS}
if tool == "create_task":
new_id = max(task["id"] for task in TASKS) + 1 if TASKS else 1
task = {
"id": new_id,
"title": arguments["title"],
"done": False,
}
TASKS.append(task)
return {"status": "success", "task": task}
if tool == "mark_done":
for task in TASKS:
if task["id"] == arguments["task_id"]:
task["done"] = True
return {"status": "success", "task": task}
return {"status": "error", "message": "Task not found"}
return {"status": "error", "message": "Unknown tool"}
def handle_request(user_request: str) -> None:
"""End-to-end safe handling of a user request."""
print(f"\nUser request: {user_request}")
raw_text = ask_model(user_request)
print("Model proposal:", raw_text)
action = parse_json_action(raw_text)
if action is None:
print("Rejected: invalid JSON")
return
checked = validate(action)
if not checked["ok"]:
print("Rejected:", checked["reason"])
return
tool = checked["tool"]
arguments = checked["arguments"]
if not approve_if_needed(tool, arguments):
print("Rejected: approval denied")
return
result = execute(tool, arguments)
print("Execution result:")
print(json.dumps(result, indent=2))
if __name__ == "__main__":
handle_request("Show all tasks")
handle_request("Create a task called Finalize Q2 plan")
handle_request("Mark task 2 as done")
Suggested Test Requests
- Show all tasks
- Create a task called Finalize Q2 plan
- Mark task 2 as done
- Delete everything
- Run Python code to wipe the task list
Expected Learning
- safe architectures use application-controlled execution
- validation is a required layer
- approval improves trust for risky actions
- prompts help, but code enforces policy
10. Discussion: Prompting vs Enforcement
A common mistake is trusting the prompt too much.
You might write:
“Only use safe tools and never perform dangerous actions.”
This is helpful, but not sufficient.
Prompting Can Help With
- formatting
- narrowing likely behaviors
- better tool selection
- reducing accidental mistakes
Prompting Cannot Replace
- validation
- authorization
- policy enforcement
- approval workflows
- business logic checks
Best Practice
Use both:
- prompt constraints to guide the model
- application constraints to enforce safety
11. Design Checklist for Safe Agent Actions
Use this checklist when designing an agent:
Tool Surface
- [ ] Are tools explicit and limited?
- [ ] Can the agent avoid arbitrary code execution?
- [ ] Are dangerous operations removed or isolated?
Input and Output Validation
- [ ] Do you validate tool names?
- [ ] Do you validate parameter types and ranges?
- [ ] Do you reject malformed model output safely?
Authorization and Policy
- [ ] Is the user allowed to perform this action?
- [ ] Does the action fit the current session context?
- [ ] Are sensitive actions approval-gated?
Observability
- [ ] Do you log proposed actions?
- [ ] Do you log validation failures?
- [ ] Do you log execution outcomes?
Failure Handling
- [ ] Does the system fail closed?
- [ ] Can rejected actions avoid side effects?
- [ ] Are errors visible to developers and understandable to users?
12. Mini Quiz
1. Why is it risky to let an LLM directly execute arbitrary commands?
Answer: Because the model may hallucinate, misuse instructions, or be manipulated by prompt injection, causing unintended side effects.
2. What is an allowlist in agent safety?
Answer: A fixed set of approved tools or actions that the application permits the agent to use.
3. Why should validation happen in code rather than only in prompts?
Answer: Prompts guide behavior, but code enforces policy reliably and can reject invalid or dangerous actions before execution.
4. When should human approval be added?
Answer: For sensitive, state-changing, high-risk, or externally visible operations.
5. What does “fail closed” mean?
Answer: If something is unclear, invalid, or unsupported, the system rejects the action rather than attempting execution.
13. Wrap-Up
In this session, you learned how to safely control agent actions by:
- constraining available tools
- treating model outputs as proposals, not commands
- validating action structure and arguments
- requiring approval for sensitive actions
- logging decisions and outcomes
This is a foundational skill for building trustworthy agentic systems.
Useful Resources
- OpenAI Responses API migration guide: https://developers.openai.com/api/docs/guides/migrate-to-responses
- OpenAI API docs: https://platform.openai.com/docs
- OpenAI Python SDK: https://github.com/openai/openai-python
- Python `json` module docs: https://docs.python.org/3/library/json.html
- Python typing docs: https://docs.python.org/3/library/typing.html
Suggested Homework
Build a safe document assistant with these rules:
- allowed tools: `list_documents`, `summarize_document`, `request_delete_document`
- only `request_delete_document` is sensitive
- deletion should never happen directly; it should only create a review request
- validate `document_id` strictly
- log all action proposals and decisions
Try to support requests like:
- “Show my documents”
- “Summarize document 12”
- “Delete document 12”
Make sure unsupported requests fail safely.