Session 4: Controlling Agent Actions Safely

Synopsis

Introduces approval steps, permission scopes, execution limits, and auditability for tool-enabled systems. This session emphasizes that action-oriented systems require stronger safeguards than read-only assistants.

Session Content

Session Overview

In this session, learners will explore how to safely control the actions of LLM-powered agents. The focus is on reducing unintended behavior, validating model outputs before execution, restricting tool access, and introducing human-in-the-loop approval patterns. By the end of the session, learners will be able to design safer agent workflows in Python using the OpenAI Responses API with gpt-5.4-mini.

Duration

~45 minutes

Learning Objectives

By the end of this session, learners should be able to:

  • Explain why agent safety matters in practical applications
  • Identify common failure modes in agent action execution
  • Validate and constrain model outputs before taking actions
  • Implement allowlists and guardrails for tool usage
  • Add approval checkpoints before sensitive operations
  • Build a small safe-action agent loop in Python

1. Why Safe Action Control Matters

Agents are powerful because they do more than generate text. They can:

  • call tools
  • write files
  • send emails
  • query databases
  • trigger workflows
  • modify system state

This introduces risk. A normal chatbot can say something incorrect. An agent can do something incorrect.

Common Risks

  • Hallucinated actions: the model invents a command or parameter
  • Unsafe tool usage: the model tries to call a tool it should not access
  • Prompt injection: external content tries to manipulate the agent
  • Over-broad execution: the model performs more actions than required
  • Sensitive operations without approval: deleting data, sending messages, making purchases
  • Poor parameter quality: malformed JSON, wrong IDs, invalid file paths

Core Safety Principle

A useful rule for agent systems:

The model may propose actions, but the application must decide whether to execute them.

This means:

  • the model is not the final authority
  • every action should be checked
  • tools should be tightly scoped
  • sensitive tasks should require explicit approval

2. Safety Patterns for Agentic Systems

2.1 Constrain the Action Space

Do not let the model execute arbitrary code or commands.

Prefer:

  • a small set of explicit tools
  • clear tool schemas
  • strict argument validation
  • business rules in application code

Avoid:

  • shell execution from raw model text
  • dynamic eval
  • unrestricted file access
  • database access without query checks
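For example, a deliberately narrow tool schema keeps the action space small. The sketch below uses JSON-Schema-style fields to describe a single tool; the exact schema format depends on your SDK, so treat the field names as illustrative:

```python
# A narrow, explicit tool schema (JSON-Schema-style; field names illustrative).
# The model can only ever propose "create_task" with one bounded string field.
CREATE_TASK_SCHEMA = {
    "name": "create_task",
    "parameters": {
        "type": "object",
        "properties": {
            "title": {"type": "string", "maxLength": 100},
        },
        "required": ["title"],
        # Reject any extra fields the model invents.
        "additionalProperties": False,
    },
}
```

The key design choice is what the schema forbids: no free-form command field, no path or query parameter, and no room for extra arguments.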

2.2 Validate Before Execute

Before executing a model-requested action, verify:

  • the tool name is allowed
  • required fields are present
  • field types are correct
  • parameter values are within acceptable ranges
  • the action is appropriate for the current user/session
  • the action is not sensitive, or has the required approval
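A minimal sketch of these checks, using illustrative names (ALLOWED_TOOLS and REQUIRED_FIELDS are assumptions for this snippet); the full worked example later in this session performs the same checks in more detail:

```python
# Sketch of a validate-before-execute gate (names illustrative).
ALLOWED_TOOLS = {"create_task"}
REQUIRED_FIELDS = {"create_task": {"title": str}}


def precheck(tool: str, arguments: dict) -> tuple[bool, str]:
    """Run the basic checks before any execution happens."""
    if tool not in ALLOWED_TOOLS:
        return False, "tool not allowed"
    for field, expected_type in REQUIRED_FIELDS.get(tool, {}).items():
        if field not in arguments:
            return False, f"missing field: {field}"
        if not isinstance(arguments[field], expected_type):
            return False, f"wrong type for: {field}"
    return True, "ok"
```

Note that the function returns a reason string alongside the verdict; that reason is what you log for auditing.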

2.3 Human-in-the-Loop for Sensitive Actions

Some actions should never happen automatically, such as:

  • deleting records
  • sending email to real users
  • issuing refunds
  • modifying permissions
  • writing to production systems

Pattern:

  1. model proposes action
  2. application validates it
  3. application asks human for approval
  4. only then execute
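In code, the pattern is a straight sequence of gates. The sketch below takes the four components as plain functions; the names propose, validate, ask_approval, and execute are placeholders, not a real API:

```python
def handle(propose, validate, ask_approval, execute, request):
    """The four-step pattern: propose -> validate -> approve -> execute."""
    action = propose(request)            # 1. model proposes an action
    if not validate(action):             # 2. application validates it
        return "rejected: invalid"
    if not ask_approval(action):         # 3. human approves it
        return "rejected: not approved"
    return execute(action)               # 4. only then execute
```

Because each gate can stop the flow, a failure at any step prevents the side effect entirely.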

2.4 Separate Planning from Execution

A safer architecture often uses two stages:

  • Planning: the model suggests what should happen
  • Execution: your application checks and performs allowed steps

This reduces the chance of the model directly controlling side effects.
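A minimal sketch of the split, with a hard-coded plan standing in for real model output (step names are illustrative):

```python
# Two-stage sketch: the planner only returns data; the executor owns side effects.

def plan(user_request: str) -> list[dict]:
    """Stage 1: produce a proposed plan as plain data.
    In a real system this would come from the model."""
    return [{"step": "list_tasks"}, {"step": "reboot_server"}]


def run_plan(steps: list[dict], allowed: set[str]) -> list[str]:
    """Stage 2: the application filters and performs only allowed steps."""
    results = []
    for step in steps:
        if step["step"] in allowed:
            results.append(f"ran {step['step']}")
        else:
            results.append(f"skipped {step['step']}")
    return results
```

The planner never touches system state; anything it proposes is inert data until the executor approves it.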

2.5 Log Every Action Attempt

Track:

  • user request
  • model response
  • requested tool/action
  • validation decision
  • execution outcome
  • approval status

This is essential for debugging and auditing.
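One lightweight approach is to emit a single structured record per attempt. This sketch uses illustrative field names that mirror the list above:

```python
import json
from datetime import datetime, timezone


def log_attempt(user_request, model_response, tool, decision, outcome, approval):
    """Emit one structured audit record per action attempt."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "user_request": user_request,
        "model_response": model_response,
        "tool": tool,
        "validation_decision": decision,
        "execution_outcome": outcome,
        "approval_status": approval,
    }
    # In production this would go to a log pipeline, not stdout.
    print(json.dumps(record))
    return record
```

Emitting one line of JSON per attempt makes the trail easy to grep, aggregate, and replay during an incident review.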


3. Safe Agent Design in Python

We will simulate a small task-management agent with these allowed tools:

  • list_tasks
  • create_task
  • mark_done

We will intentionally not allow:

  • delete_all_tasks
  • arbitrary Python execution
  • filesystem writes
  • shell commands

We will implement:

  • a tool allowlist
  • parameter validation
  • approval checks for sensitive actions
  • a safe execution loop

4. Environment Setup

Install the OpenAI SDK:

pip install openai

Set your API key:

export OPENAI_API_KEY="your_api_key_here"

On Windows PowerShell:

$env:OPENAI_API_KEY = "your_api_key_here"

(This sets the variable for the current session only; use setx OPENAI_API_KEY "your_api_key_here" to persist it for future sessions.)

5. Core Example: A Safe Action Controller

This first example shows a complete safe-action pattern:

  • the model is asked to produce a structured tool request
  • the application parses and validates it
  • only approved, valid actions are executed

Example: Safe Agent for Task Operations

"""
Session 4 - Controlling Agent Actions Safely

This example demonstrates a safe agent loop:
1. Ask the model to propose an action in JSON format
2. Parse the JSON safely
3. Validate tool name and arguments
4. Require approval for sensitive operations
5. Execute only approved, allowed actions
"""

import json
import os
from typing import Any, Dict, Optional

from openai import OpenAI

client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

# -----------------------------------------------------------------------------
# In-memory task store for demonstration
# -----------------------------------------------------------------------------
TASKS = [
    {"id": 1, "title": "Review PR", "done": False},
    {"id": 2, "title": "Write documentation", "done": False},
]


# -----------------------------------------------------------------------------
# Allowed tools and safety policy
# -----------------------------------------------------------------------------
ALLOWED_TOOLS = {"list_tasks", "create_task", "mark_done"}

# Operations that should require explicit human approval
SENSITIVE_TOOLS = {"mark_done"}  # mark_done could be considered state-changing


def list_tasks() -> Dict[str, Any]:
    """Return all tasks."""
    return {"status": "success", "tasks": TASKS}


def create_task(title: str) -> Dict[str, Any]:
    """Create a new task after validation."""
    new_id = max(task["id"] for task in TASKS) + 1 if TASKS else 1
    task = {"id": new_id, "title": title, "done": False}
    TASKS.append(task)
    return {"status": "success", "task": task}


def mark_done(task_id: int) -> Dict[str, Any]:
    """Mark a task as completed."""
    for task in TASKS:
        if task["id"] == task_id:
            task["done"] = True
            return {"status": "success", "task": task}
    return {"status": "error", "message": f"Task {task_id} not found"}


def ask_model_for_action(user_request: str) -> str:
    """
    Ask the model to suggest one tool call as JSON only.

    We instruct the model to return a strict JSON object with:
    - tool: string
    - arguments: object
    """
    prompt = f"""
You are a task assistant.

A user will make a request. Choose exactly one tool from this set:
- list_tasks
- create_task
- mark_done

Return JSON only with this schema:
{{
  "tool": "<tool_name>",
  "arguments": {{}}
}}

Rules:
- Do not include markdown
- Do not include explanations
- Use only the allowed tool names
- If the request is unclear, choose list_tasks with empty arguments

User request: {user_request}
""".strip()

    response = client.responses.create(
        model="gpt-5.4-mini",
        input=prompt
    )

    return response.output_text.strip()


def parse_action(raw_text: str) -> Optional[Dict[str, Any]]:
    """Parse the model's JSON action safely."""
    try:
        action = json.loads(raw_text)
        if not isinstance(action, dict):
            return None
        return action
    except json.JSONDecodeError:
        return None


def validate_action(action: Dict[str, Any]) -> Dict[str, Any]:
    """
    Validate the proposed action against application safety rules.
    Returns a structured validation result.
    """
    tool = action.get("tool")
    arguments = action.get("arguments", {})

    if tool not in ALLOWED_TOOLS:
        return {
            "ok": False,
            "reason": f"Tool '{tool}' is not allowed."
        }

    if not isinstance(arguments, dict):
        return {
            "ok": False,
            "reason": "Arguments must be a JSON object."
        }

    if tool == "list_tasks":
        return {"ok": True, "tool": tool, "arguments": arguments}

    if tool == "create_task":
        title = arguments.get("title")
        if not isinstance(title, str) or not title.strip():
            return {
                "ok": False,
                "reason": "create_task requires a non-empty string 'title'."
            }
        if len(title) > 100:
            return {
                "ok": False,
                "reason": "Task title must be 100 characters or fewer."
            }
        return {"ok": True, "tool": tool, "arguments": {"title": title.strip()}}

    if tool == "mark_done":
        task_id = arguments.get("task_id")
        if not isinstance(task_id, int):
            return {
                "ok": False,
                "reason": "mark_done requires integer 'task_id'."
            }
        return {"ok": True, "tool": tool, "arguments": {"task_id": task_id}}

    return {"ok": False, "reason": "Unexpected validation path."}


def requires_approval(tool: str) -> bool:
    """Return True if this tool requires human approval."""
    return tool in SENSITIVE_TOOLS


def get_human_approval(tool: str, arguments: Dict[str, Any]) -> bool:
    """
    Simulate human approval.
    In real systems, this could be a UI button, admin check, or workflow approval.
    """
    print(f"[APPROVAL REQUIRED] Tool: {tool}, Arguments: {arguments}")
    user_input = input("Approve this action? (yes/no): ").strip().lower()
    return user_input == "yes"


def execute_action(tool: str, arguments: Dict[str, Any]) -> Dict[str, Any]:
    """Execute an already-validated action."""
    if tool == "list_tasks":
        return list_tasks()
    if tool == "create_task":
        return create_task(arguments["title"])
    if tool == "mark_done":
        return mark_done(arguments["task_id"])
    return {"status": "error", "message": f"Unknown tool: {tool}"}


def run_safe_agent(user_request: str) -> None:
    """Run the safe proposal-validation-execution flow."""
    print(f"\nUser request: {user_request}")

    raw_action = ask_model_for_action(user_request)
    print(f"\nRaw model action:\n{raw_action}")

    action = parse_action(raw_action)
    if action is None:
        print("\nRejected: Model did not return valid JSON.")
        return

    validation = validate_action(action)
    if not validation["ok"]:
        print(f"\nRejected: {validation['reason']}")
        return

    tool = validation["tool"]
    arguments = validation["arguments"]

    if requires_approval(tool):
        approved = get_human_approval(tool, arguments)
        if not approved:
            print("\nAction was not approved.")
            return

    result = execute_action(tool, arguments)
    print("\nExecution result:")
    print(json.dumps(result, indent=2))


if __name__ == "__main__":
    # Try a few examples interactively
    run_safe_agent("Show me my current tasks")
    run_safe_agent("Create a task called Prepare sprint demo")
    run_safe_agent("Mark task 1 as done")

Example Usage

User request: Show me my current tasks

Raw model action:
{"tool":"list_tasks","arguments":{}}

Execution result:
{
  "status": "success",
  "tasks": [
    {
      "id": 1,
      "title": "Review PR",
      "done": false
    },
    {
      "id": 2,
      "title": "Write documentation",
      "done": false
    }
  ]
}
User request: Mark task 1 as done

Raw model action:
{"tool":"mark_done","arguments":{"task_id":1}}

[APPROVAL REQUIRED] Tool: mark_done, Arguments: {'task_id': 1}
Approve this action? (yes/no): yes

Execution result:
{
  "status": "success",
  "task": {
    "id": 1,
    "title": "Review PR",
    "done": true
  }
}

6. Key Safety Techniques Explained

6.1 JSON as an Action Contract

Instead of letting the model produce free-form instructions like:

  • “I think you should run a delete command”
  • “Maybe send an email”

we require a structured action:

{
  "tool": "create_task",
  "arguments": {
    "title": "Prepare sprint demo"
  }
}

This makes it easier to:

  • parse
  • validate
  • reject bad values
  • log decisions

6.2 Tool Allowlists

The application defines the only tools it supports:

ALLOWED_TOOLS = {"list_tasks", "create_task", "mark_done"}

If the model invents:

{"tool":"delete_all_tasks","arguments":{}}

the application rejects it.

6.3 Argument Validation

Even when the tool name is valid, arguments may be unsafe or invalid.

Examples:

  • title is empty
  • task ID is not an integer
  • string is too long
  • missing required field

Validation belongs in application code, not only in the prompt.

6.4 Approval Gates

Any state-changing or risky operation may require approval:

  • create/update/delete
  • external messages
  • transactions
  • permission changes

This is a strong practical pattern for production systems.


7. Hands-On Exercise 1: Reject Unsafe or Invented Actions

Goal

Build a version of the validator that rejects any tool outside an allowlist and logs the rejection reason.

Instructions

  1. Start from the code above.
  2. Add a fake user request such as: "Delete all tasks immediately"
  3. Observe what the model proposes.
  4. Ensure your validator rejects any unsupported tool.
  5. Print a structured audit log entry.

Starter Solution

"""
Exercise 1: Reject unsafe or invented actions and log why.
"""

import json
from datetime import datetime, timezone


ALLOWED_TOOLS = {"list_tasks", "create_task", "mark_done"}


def audit_log(event_type: str, payload: dict) -> None:
    """Print a simple structured audit log entry."""
    entry = {
        # datetime.utcnow() is deprecated; use an aware UTC timestamp instead.
        "timestamp": datetime.now(timezone.utc).isoformat().replace("+00:00", "Z"),
        "event_type": event_type,
        "payload": payload,
    }
    print(json.dumps(entry, indent=2))


def validate_action(action: dict) -> dict:
    """Validate a proposed action."""
    tool = action.get("tool")
    arguments = action.get("arguments", {})

    if tool not in ALLOWED_TOOLS:
        audit_log("action_rejected", {
            "reason": "tool_not_allowed",
            "tool": tool,
            "arguments": arguments,
        })
        return {
            "ok": False,
            "reason": f"Tool '{tool}' is not allowed."
        }

    return {
        "ok": True,
        "tool": tool,
        "arguments": arguments,
    }


# Example malicious or unsupported action proposal
proposed_action = {
    "tool": "delete_all_tasks",
    "arguments": {}
}

result = validate_action(proposed_action)
print(result)

Example Output

{
  "timestamp": "2026-03-22T12:00:00.000000Z",
  "event_type": "action_rejected",
  "payload": {
    "reason": "tool_not_allowed",
    "tool": "delete_all_tasks",
    "arguments": {}
  }
}
{'ok': False, 'reason': "Tool 'delete_all_tasks' is not allowed."}

What to Learn

  • the model should not define your tool surface
  • unsupported actions must fail closed
  • structured logging helps review and debugging

8. Hands-On Exercise 2: Add Approval for Sensitive Actions

Goal

Extend the agent so that some actions are executed automatically while others need approval.

Instructions

  1. Treat create_task as auto-approved.
  2. Treat mark_done as approval-required.
  3. Prompt the user for confirmation before execution.
  4. Reject the action if approval is denied.

Solution Example

"""
Exercise 2: Add approval checks for sensitive actions.
"""

SENSITIVE_TOOLS = {"mark_done"}


def requires_approval(tool: str) -> bool:
    """Return True if the action should require human review."""
    return tool in SENSITIVE_TOOLS


def get_human_approval(tool: str, arguments: dict) -> bool:
    """Prompt the user for approval."""
    print(f"Approval required for tool='{tool}' with arguments={arguments}")
    answer = input("Approve? (yes/no): ").strip().lower()
    return answer == "yes"


def maybe_execute(tool: str, arguments: dict) -> None:
    """Execute only if policy allows."""
    if requires_approval(tool):
        if not get_human_approval(tool, arguments):
            print("Action rejected by human reviewer.")
            return

    print(f"Executing tool='{tool}' with arguments={arguments}")


# Demo
maybe_execute("create_task", {"title": "Draft release notes"})
maybe_execute("mark_done", {"task_id": 2})

Example Output

Executing tool='create_task' with arguments={'title': 'Draft release notes'}
Approval required for tool='mark_done' with arguments={'task_id': 2}
Approve? (yes/no): no
Action rejected by human reviewer.

9. Hands-On Exercise 3: Full Safe Agent with the Responses API

Goal

Create a reusable function that:

  • takes a natural language user request
  • asks gpt-5.4-mini for one action proposal
  • validates it
  • checks approval rules
  • executes it safely

Complete Exercise Solution

"""
Exercise 3: Full safe agent with the OpenAI Responses API.
"""

import json
import os
from typing import Any, Dict

from openai import OpenAI

client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

TASKS = [
    {"id": 1, "title": "Review invoice", "done": False},
    {"id": 2, "title": "Update roadmap", "done": False},
]

ALLOWED_TOOLS = {"list_tasks", "create_task", "mark_done"}
SENSITIVE_TOOLS = {"mark_done"}


def ask_model(user_request: str) -> str:
    """Ask the model for one JSON action."""
    prompt = f"""
You are a task assistant that can only choose one action.

Allowed tools:
- list_tasks
- create_task
- mark_done

Return JSON only:
{{
  "tool": "<tool_name>",
  "arguments": {{}}
}}

If the user asks for an unsupported or dangerous action,
choose list_tasks with empty arguments instead.

User request: {user_request}
""".strip()

    response = client.responses.create(
        model="gpt-5.4-mini",
        input=prompt,
    )
    return response.output_text.strip()


def parse_json_action(raw_text: str) -> Dict[str, Any] | None:
    """Parse a JSON object from model text."""
    try:
        data = json.loads(raw_text)
        return data if isinstance(data, dict) else None
    except json.JSONDecodeError:
        return None


def validate(action: Dict[str, Any]) -> Dict[str, Any]:
    """Validate tool and parameters."""
    tool = action.get("tool")
    arguments = action.get("arguments", {})

    if tool not in ALLOWED_TOOLS:
        return {"ok": False, "reason": "Tool not allowed"}

    if not isinstance(arguments, dict):
        return {"ok": False, "reason": "Arguments must be an object"}

    if tool == "list_tasks":
        return {"ok": True, "tool": tool, "arguments": {}}

    if tool == "create_task":
        title = arguments.get("title")
        if not isinstance(title, str) or not title.strip():
            return {"ok": False, "reason": "Missing valid title"}
        return {
            "ok": True,
            "tool": tool,
            "arguments": {"title": title.strip()}
        }

    if tool == "mark_done":
        task_id = arguments.get("task_id")
        if not isinstance(task_id, int):
            return {"ok": False, "reason": "Missing valid integer task_id"}
        return {
            "ok": True,
            "tool": tool,
            "arguments": {"task_id": task_id}
        }

    return {"ok": False, "reason": "Unhandled tool"}


def approve_if_needed(tool: str, arguments: Dict[str, Any]) -> bool:
    """Require human approval for sensitive tools."""
    if tool not in SENSITIVE_TOOLS:
        return True

    print(f"Approval required: {tool} {arguments}")
    answer = input("Approve? (yes/no): ").strip().lower()
    return answer == "yes"


def execute(tool: str, arguments: Dict[str, Any]) -> Dict[str, Any]:
    """Execute validated actions."""
    if tool == "list_tasks":
        return {"status": "success", "tasks": TASKS}

    if tool == "create_task":
        new_id = max(task["id"] for task in TASKS) + 1 if TASKS else 1
        task = {
            "id": new_id,
            "title": arguments["title"],
            "done": False,
        }
        TASKS.append(task)
        return {"status": "success", "task": task}

    if tool == "mark_done":
        for task in TASKS:
            if task["id"] == arguments["task_id"]:
                task["done"] = True
                return {"status": "success", "task": task}
        return {"status": "error", "message": "Task not found"}

    return {"status": "error", "message": "Unknown tool"}


def handle_request(user_request: str) -> None:
    """End-to-end safe handling of a user request."""
    print(f"\nUser request: {user_request}")

    raw_text = ask_model(user_request)
    print("Model proposal:", raw_text)

    action = parse_json_action(raw_text)
    if action is None:
        print("Rejected: invalid JSON")
        return

    checked = validate(action)
    if not checked["ok"]:
        print("Rejected:", checked["reason"])
        return

    tool = checked["tool"]
    arguments = checked["arguments"]

    if not approve_if_needed(tool, arguments):
        print("Rejected: approval denied")
        return

    result = execute(tool, arguments)
    print("Execution result:")
    print(json.dumps(result, indent=2))


if __name__ == "__main__":
    handle_request("Show all tasks")
    handle_request("Create a task called Finalize Q2 plan")
    handle_request("Mark task 2 as done")

Suggested Test Requests

  • Show all tasks
  • Create a task called Finalize Q2 plan
  • Mark task 2 as done
  • Delete everything
  • Run Python code to wipe the task list

Expected Learning

  • safe architectures use application-controlled execution
  • validation is a required layer
  • approval improves trust for risky actions
  • prompts help, but code enforces policy

10. Discussion: Prompting vs Enforcement

A common mistake is trusting the prompt too much.

You might write:

“Only use safe tools and never perform dangerous actions.”

This is helpful, but not sufficient.

Prompting Can Help With

  • formatting
  • narrowing likely behaviors
  • better tool selection
  • reducing accidental mistakes

Prompting Cannot Replace

  • validation
  • authorization
  • policy enforcement
  • approval workflows
  • business logic checks

Best Practice

Use both:

  • prompt constraints to guide the model
  • application constraints to enforce safety

11. Design Checklist for Safe Agent Actions

Use this checklist when designing an agent:

Tool Surface

  • [ ] Are tools explicit and limited?
  • [ ] Can the agent avoid arbitrary code execution?
  • [ ] Are dangerous operations removed or isolated?

Input and Output Validation

  • [ ] Do you validate tool names?
  • [ ] Do you validate parameter types and ranges?
  • [ ] Do you reject malformed model output safely?

Authorization and Policy

  • [ ] Is the user allowed to perform this action?
  • [ ] Does the action fit the current session context?
  • [ ] Are sensitive actions approval-gated?

Observability

  • [ ] Do you log proposed actions?
  • [ ] Do you log validation failures?
  • [ ] Do you log execution outcomes?

Failure Handling

  • [ ] Does the system fail closed?
  • [ ] Can rejected actions avoid side effects?
  • [ ] Are errors visible to developers and understandable to users?

12. Mini Quiz

1. Why is it risky to let an LLM directly execute arbitrary commands?

Answer: Because the model may hallucinate, misuse instructions, or be manipulated by prompt injection, causing unintended side effects.

2. What is an allowlist in agent safety?

Answer: A fixed set of approved tools or actions that the application permits the agent to use.

3. Why should validation happen in code rather than only in prompts?

Answer: Prompts guide behavior, but code enforces policy reliably and can reject invalid or dangerous actions before execution.

4. When should human approval be added?

Answer: For sensitive, state-changing, high-risk, or externally visible operations.

5. What does “fail closed” mean?

Answer: If something is unclear, invalid, or unsupported, the system rejects the action rather than attempting execution.


13. Wrap-Up

In this session, you learned how to safely control agent actions by:

  • constraining available tools
  • treating model outputs as proposals, not commands
  • validating action structure and arguments
  • requiring approval for sensitive actions
  • logging decisions and outcomes

This is a foundational skill for building trustworthy agentic systems.


Useful Resources

  • OpenAI Responses API migration guide: https://developers.openai.com/api/docs/guides/migrate-to-responses
  • OpenAI API docs: https://platform.openai.com/docs
  • OpenAI Python SDK: https://github.com/openai/openai-python
  • Python json module docs: https://docs.python.org/3/library/json.html
  • Python typing docs: https://docs.python.org/3/library/typing.html

Suggested Homework

Build a safe document assistant with these rules:

  • allowed tools: list_documents, summarize_document, request_delete_document
  • only request_delete_document is sensitive
  • deletion should never happen directly; it should only create a review request
  • validate document_id strictly
  • log all action proposals and decisions

Try to support requests like:

  • “Show my documents”
  • “Summarize document 12”
  • “Delete document 12”

Make sure unsupported requests fail safely.
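As a possible starting point, here is a skeleton for the sensitive-deletion rule, assuming an in-memory document store. All names here are suggestions, not a required design:

```python
"""Starter skeleton for the safe document assistant homework (illustrative)."""

ALLOWED_TOOLS = {"list_documents", "summarize_document", "request_delete_document"}
SENSITIVE_TOOLS = {"request_delete_document"}

# Toy in-memory store standing in for a real document backend.
DOCUMENTS = {12: "Quarterly report"}
DELETE_REQUESTS: list[dict] = []


def validate_document_id(arguments: dict) -> bool:
    """document_id must be an integer that exists in the store."""
    doc_id = arguments.get("document_id")
    return isinstance(doc_id, int) and doc_id in DOCUMENTS


def request_delete_document(document_id: int) -> dict:
    """Never delete directly; only record a request for human review."""
    request = {"document_id": document_id, "status": "pending_review"}
    DELETE_REQUESTS.append(request)
    return {"status": "success", "request": request}
```

Note that even an approved call never removes the document itself; the review request is the only side effect, which keeps the destructive step entirely out of the agent's reach.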

