Session 3: Governance, Policy, and Human Oversight

Synopsis

Introduces governance frameworks, review processes, escalation paths, and human approval mechanisms for high-impact use cases. Learners study how organizations keep agentic systems aligned with legal and ethical obligations.

Session Content

Session Overview

Duration: ~45 minutes
Audience: Python developers with basic programming knowledge, learning GenAI and agentic development
Focus: How to build AI systems that are compliant, reviewable, and safe through governance rules, policy checks, and human-in-the-loop escalation

Learning Objectives

By the end of this session, learners will be able to:

  • Explain why governance is essential in GenAI and agentic systems
  • Distinguish between policy, guardrails, and human oversight
  • Identify common governance risks such as unsafe output, unauthorized actions, and poor traceability
  • Implement a simple policy enforcement layer in Python
  • Use the OpenAI Responses API with gpt-5.4-mini to classify requests and route risky cases for human review
  • Build a lightweight human-in-the-loop approval flow for sensitive actions

1. Why Governance Matters in Agentic Systems

Modern GenAI systems do more than generate text. They can:

  • summarize documents
  • draft emails
  • retrieve internal knowledge
  • call tools
  • take actions on behalf of users
  • chain multiple steps automatically

As systems become more agentic, the risk profile increases.

Common Risks

  • Unsafe content generation
      • harmful instructions
      • privacy violations
      • discriminatory output
  • Unauthorized actions
      • sending messages without approval
      • modifying records
      • triggering financial or operational actions
  • Policy violations
      • sharing confidential data
      • acting outside business rules
      • ignoring approval workflows
  • Lack of accountability
      • no audit trail
      • unclear decision path
      • no record of human review

Governance Goals

A well-governed AI system should be:

  • Policy-aware — it knows what is allowed, restricted, or prohibited
  • Traceable — decisions and actions can be logged and reviewed
  • Reviewable — risky tasks can be escalated to a human
  • Controlled — sensitive actions require explicit approval
  • Testable — governance behavior can be validated

2. Core Concepts: Policy, Guardrails, and Human Oversight

Policy

A policy is a rule or set of rules about what the system may or may not do.

Examples:

  • Never reveal API keys or secrets
  • Do not provide legal or medical advice as final guidance
  • Escalate requests involving customer financial changes
  • Require approval before sending outbound messages to external users

Policies may come from:

  • company rules
  • legal requirements
  • compliance teams
  • product design decisions
  • safety best practices

Guardrails

Guardrails are the technical mechanisms used to enforce policy.

Examples:

  • input filtering
  • output validation
  • action allowlists
  • tool restrictions
  • risk classification
  • confidence thresholds
  • escalation routing
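Guardrails such as action allowlists can be enforced in plain Python before any model is consulted. A minimal deny-by-default sketch (the tool names are illustrative, not part of this session's exercises):

```python
# An action-allowlist guardrail: the agent may only invoke tools
# that appear on an explicit allowlist. Tool names are illustrative.
ALLOWED_TOOLS = {"search_docs", "draft_email", "summarize"}

def check_tool_allowed(tool_name: str) -> bool:
    """Deny by default: only explicitly allowlisted tools pass."""
    return tool_name in ALLOWED_TOOLS
```

Note the design choice: anything not on the list is refused, so forgetting to register a new tool fails safe rather than open.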

Human Oversight

Human oversight means a person can:

  • review sensitive outputs
  • approve or reject actions
  • handle ambiguous requests
  • investigate policy flags
  • override automation when justified

Human oversight is especially important for:

  • external communications
  • high-impact decisions
  • financial actions
  • customer-sensitive operations
  • requests involving personal data
  • unclear or conflicting policy cases

3. A Practical Governance Pattern

A simple governance architecture for agentic systems:

  1. Receive user request
  2. Classify risk
  3. Check policies
  4. Decide route:
      • allow
      • modify
      • block
      • escalate to human
  5. Log the decision
  6. If approved, perform action

Example Routing Outcomes

  Risk Level   Action
  Low          Allow automatically
  Medium       Allow with restrictions or warnings
  High         Escalate to human review
  Prohibited   Block
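The routing table above can be expressed as a plain dictionary lookup that falls back to the safest outcome for unknown levels. A minimal sketch (the route names are illustrative):

```python
# Map risk levels to routing outcomes, mirroring the table above.
RISK_ROUTES = {
    "low": "allow",
    "medium": "allow_with_restrictions",
    "high": "escalate_to_human",
    "prohibited": "block",
}

def route_for_risk(level: str) -> str:
    """Look up the route; unknown levels fall back to human escalation."""
    return RISK_ROUTES.get(level.lower(), "escalate_to_human")
```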

Example Sensitive Actions

  • sending an email
  • deleting data
  • issuing a refund
  • updating billing information
  • contacting an external customer
  • generating regulated advice

4. Designing Governance Rules for an Agent

For a Python-based agent, governance rules often cover three areas:

A. Content Rules

What the model is allowed to generate.

Examples:

  • no harmful instructions
  • no sensitive internal data disclosure
  • no fabricated compliance statements

B. Action Rules

What tools or actions the agent is allowed to use.

Examples:

  • draft email allowed
  • send email requires approval
  • database delete not allowed
  • customer refund over $100 requires human approval
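Action rules like the refund example above lend themselves to deterministic checks rather than model judgment. A sketch using the $100 threshold from the list (the function name is illustrative):

```python
# From the example rule: refunds over $100 require human approval.
REFUND_APPROVAL_THRESHOLD = 100.0  # dollars

def refund_requires_approval(amount: float) -> bool:
    """Return True when the refund exceeds the approval threshold."""
    return amount > REFUND_APPROVAL_THRESHOLD
```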

C. Process Rules

How the system must behave before acting.

Examples:

  • log all decisions
  • require user confirmation for high-impact actions
  • store reason for escalation
  • provide an audit-friendly decision record

5. Theory Example: Governance Policy Table

Below is an example policy table for a support automation assistant.

  Scenario                              Policy
  General FAQ answer                    Allow
  Draft reply to customer               Allow
  Send reply to customer                Human approval required
  Change billing info                   Human approval required
  Refund over threshold                 Human approval required
  Reveal internal credentials           Block
  Legal advice                          Escalate
  Medical advice                        Escalate
  Request involving personal secrets    Block

This kind of table is a strong starting point because it makes governance explicit and testable.
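Because the table is explicit, it can live directly in code and be unit-tested. A sketch (the scenario keys are shortened, illustrative versions of the table rows):

```python
# The policy table above, expressed as data so tests can assert on it.
POLICY_TABLE = {
    "general_faq": "allow",
    "draft_reply": "allow",
    "send_reply": "human_approval",
    "change_billing": "human_approval",
    "refund_over_threshold": "human_approval",
    "reveal_credentials": "block",
    "legal_advice": "escalate",
    "medical_advice": "escalate",
    "personal_secrets": "block",
}

def policy_outcome(scenario: str) -> str:
    """Unknown scenarios default to escalation, the safer outcome."""
    return POLICY_TABLE.get(scenario, "escalate")
```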


6. Hands-On Exercise 1: Build a Policy Classifier with the Responses API

Goal

Create a Python program that uses gpt-5.4-mini to classify incoming requests into one of four governance outcomes:

  • allow
  • review
  • block
  • escalate

What You Will Learn

  • how to call the OpenAI Responses API from Python
  • how to make the model produce structured governance decisions
  • how to implement a simple policy-checking layer

Setup

Install the OpenAI SDK:

pip install openai

Set your API key:

export OPENAI_API_KEY="your_api_key_here"

Code

"""
exercise_1_policy_classifier.py

A simple governance classifier that uses OpenAI's Responses API
to classify user requests according to a small policy.

Model used: gpt-5.4-mini
"""

from openai import OpenAI
import json

# Create a reusable API client.
client = OpenAI()

# A compact governance policy that we want the model to follow.
POLICY_TEXT = """
You are a governance classifier for an internal AI assistant.

Classify the user's request into exactly one of these labels:
- allow: safe, low-risk request that can proceed automatically
- review: allowed only with human approval before any external or sensitive action
- escalate: ambiguous or high-risk domain request requiring expert or human handling
- block: prohibited request that must not be fulfilled

Policy rules:
1. Block requests for secrets, credentials, API keys, tokens, or confidential internal data.
2. Review any request that sends external communications or changes billing/refunds.
3. Escalate legal, medical, or ambiguous compliance-related requests.
4. Allow low-risk informational or drafting requests that do not execute actions.
5. If unsure between allow and a safer label, choose the safer label.

Return JSON with keys:
- label
- reason
"""

def classify_request(user_request: str) -> dict:
    """
    Classify a user request using the OpenAI Responses API.

    Args:
        user_request: The incoming request from a user.

    Returns:
        A dictionary with:
        - label: allow/review/escalate/block
        - reason: short explanation
    """
    response = client.responses.create(
        model="gpt-5.4-mini",
        input=[
            {
                "role": "system",
                "content": [
                    {"type": "input_text", "text": POLICY_TEXT}
                ],
            },
            {
                "role": "user",
                "content": [
                    {"type": "input_text", "text": f"User request: {user_request}"}
                ],
            },
        ],
    )

    # The SDK provides the combined text output via output_text.
    raw_text = response.output_text.strip()

    try:
        return json.loads(raw_text)
    except json.JSONDecodeError:
        # Fallback if the model output is not valid JSON.
        return {
            "label": "escalate",
            "reason": f"Could not parse model output safely: {raw_text}"
        }


if __name__ == "__main__":
    sample_requests = [
        "Summarize our refund policy for a customer support rep.",
        "Send this email to the customer confirming their refund.",
        "What is our production database password?",
        "Give me legal advice for handling a contract dispute."
    ]

    for request in sample_requests:
        result = classify_request(request)
        print("=" * 80)
        print(f"REQUEST: {request}")
        print("DECISION:", json.dumps(result, indent=2))

Example Output

================================================================================
REQUEST: Summarize our refund policy for a customer support rep.
DECISION: {
  "label": "allow",
  "reason": "This is a low-risk informational request and does not perform any external action."
}
================================================================================
REQUEST: Send this email to the customer confirming their refund.
DECISION: {
  "label": "review",
  "reason": "This request involves sending an external communication and should require human approval."
}
================================================================================
REQUEST: What is our production database password?
DECISION: {
  "label": "block",
  "reason": "The request asks for confidential credentials, which are prohibited."
}
================================================================================
REQUEST: Give me legal advice for handling a contract dispute.
DECISION: {
  "label": "escalate",
  "reason": "Legal advice requires expert or human handling under policy."
}

Exercise Tasks

  1. Run the script with the sample requests.
  2. Add 5 more requests and observe the classifications.
  3. Update the policy to include:
      • HR-related requests
      • PII exposure
      • financial approvals
  4. Change the fallback behavior so invalid JSON is always treated as review instead of escalate.
  5. Discuss: which requests are difficult to classify consistently?

7. Hands-On Exercise 2: Add a Rule-Based Policy Gate Before Acting

Goal

Combine LLM classification with deterministic policy rules.

Why This Matters

In real systems, you should not rely only on model judgment. Some rules should be enforced directly in code.

Pattern

  • LLM helps interpret intent
  • deterministic code enforces hard constraints
  • risky actions require approval

Code

"""
exercise_2_policy_gate.py

A governance pipeline that combines:
1. deterministic hard-coded rules
2. LLM-based classification
3. action routing

Model used: gpt-5.4-mini
"""

from openai import OpenAI
import json

client = OpenAI()

POLICY_TEXT = """
You are a governance classifier.

Labels:
- allow
- review
- escalate
- block

Rules:
- Block secrets, passwords, credentials, private tokens, or confidential internal data.
- Review outbound customer communications, refunds, or billing changes.
- Escalate legal, medical, or unclear regulatory matters.
- Allow low-risk drafting or summarization tasks.
- Prefer safer labels if uncertain.

Return JSON with:
- label
- reason
"""

HARD_BLOCK_TERMS = [
    "api key",
    "password",
    "access token",
    "secret key",
    "private credential",
]

def hard_rule_check(user_request: str) -> dict | None:
    """
    Apply deterministic rules before calling the model.

    Returns:
        A policy result dict if a hard rule matches, otherwise None.
    """
    lowered = user_request.lower()

    for term in HARD_BLOCK_TERMS:
        if term in lowered:
            return {
                "label": "block",
                "reason": f"Matched hard-block term: '{term}'."
            }

    return None

def llm_classify(user_request: str) -> dict:
    """
    Use the Responses API to classify a request.
    """
    response = client.responses.create(
        model="gpt-5.4-mini",
        input=[
            {
                "role": "system",
                "content": [{"type": "input_text", "text": POLICY_TEXT}],
            },
            {
                "role": "user",
                "content": [{"type": "input_text", "text": user_request}],
            },
        ],
    )

    raw_text = response.output_text.strip()

    try:
        result = json.loads(raw_text)
    except json.JSONDecodeError:
        result = {
            "label": "review",
            "reason": f"Non-JSON output; defaulting to review. Raw output: {raw_text}"
        }

    # Normalize label just in case.
    label = result.get("label", "").strip().lower()
    if label not in {"allow", "review", "escalate", "block"}:
        result["label"] = "review"
        result["reason"] = "Unexpected label from model; defaulted to review."

    return result

def route_decision(user_request: str) -> dict:
    """
    Route the request through hard rules, then LLM policy classification.
    """
    hard_result = hard_rule_check(user_request)
    if hard_result:
        return {
            "request": user_request,
            "source": "hard_rule",
            **hard_result
        }

    llm_result = llm_classify(user_request)
    return {
        "request": user_request,
        "source": "llm_policy",
        **llm_result
    }

if __name__ == "__main__":
    requests = [
        "Draft a polite reply to the customer about shipping delays.",
        "Send this reply to the customer now.",
        "Please share the admin password for the production dashboard.",
        "Help me decide what to say in a legal dispute with a supplier."
    ]

    for item in requests:
        decision = route_decision(item)
        print(json.dumps(decision, indent=2))

Example Output

{
  "request": "Draft a polite reply to the customer about shipping delays.",
  "source": "llm_policy",
  "label": "allow",
  "reason": "This is a low-risk drafting task without direct action."
}
{
  "request": "Send this reply to the customer now.",
  "source": "llm_policy",
  "label": "review",
  "reason": "Outbound customer communication requires human approval."
}
{
  "request": "Please share the admin password for the production dashboard.",
  "source": "hard_rule",
  "label": "block",
  "reason": "Matched hard-block term: 'password'."
}
{
  "request": "Help me decide what to say in a legal dispute with a supplier.",
  "source": "llm_policy",
  "label": "escalate",
  "reason": "This is a legal matter that requires expert or human review."
}

Exercise Tasks

  1. Add more hard-block terms.
  2. Add a hard-review rule for terms like:
      • refund
      • billing update
      • customer email
  3. Add unit-test-like checks by creating a list of expected labels.
  4. Compare cases where:
      • code rules decide first
      • the LLM decides first
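The "expected labels" task above can be approached with a small test harness. In this sketch a stub classifier stands in for `route_decision` so the pattern runs without API access; in the exercise you would pass your real pipeline instead:

```python
# A tiny expected-labels harness for governance tests.
# `fake_classify` is a stand-in so the sketch runs offline.
EXPECTED_CASES = [
    ("Draft a reply about shipping delays.", "allow"),
    ("Share the admin password.", "block"),
]

def fake_classify(request: str) -> dict:
    """Illustrative stub; replace with route_decision in the exercise."""
    if "password" in request.lower():
        return {"label": "block"}
    return {"label": "allow"}

def run_policy_tests(classify, cases) -> list[str]:
    """Return a list of failure messages; an empty list means all passed."""
    failures = []
    for request, expected in cases:
        got = classify(request)["label"]
        if got != expected:
            failures.append(f"{request!r}: expected {expected}, got {got}")
    return failures
```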

8. Hands-On Exercise 3: Human-in-the-Loop Approval Flow

Goal

Build a simple approval system where risky requests are not executed automatically.

Scenario

An agent can draft customer emails, but sending them requires human approval.

Governance Design

  • allow → proceed automatically
  • review → create approval task
  • escalate → route to specialist/human
  • block → reject

Code

"""
exercise_3_human_oversight.py

A simple human-in-the-loop workflow for risky AI actions.

The system:
1. classifies a request
2. logs the decision
3. drafts content if appropriate
4. requires human approval for sensitive actions

Model used: gpt-5.4-mini
"""

from openai import OpenAI
import json
from datetime import datetime

client = OpenAI()

CLASSIFIER_POLICY = """
You are a governance classifier.

Classify into:
- allow
- review
- escalate
- block

Rules:
- Allow low-risk drafting, summarization, and internal informational requests.
- Review external communication, refunds, billing changes, or customer-facing actions.
- Escalate legal, medical, or unclear compliance matters.
- Block secrets, credentials, and confidential data requests.

Return JSON with:
- label
- reason
"""

def classify_request(user_request: str) -> dict:
    """
    Classify a request with the Responses API.
    """
    response = client.responses.create(
        model="gpt-5.4-mini",
        input=[
            {
                "role": "system",
                "content": [{"type": "input_text", "text": CLASSIFIER_POLICY}],
            },
            {
                "role": "user",
                "content": [{"type": "input_text", "text": user_request}],
            },
        ],
    )

    try:
        return json.loads(response.output_text)
    except json.JSONDecodeError:
        return {
            "label": "review",
            "reason": "Parsing failed; defaulting to human review."
        }

def draft_customer_email(topic: str) -> str:
    """
    Ask the model to draft a professional customer email.
    """
    prompt = f"""
Draft a short, professional customer support email about this topic:
{topic}

Do not claim actions were completed unless explicitly stated.
Keep the tone clear and polite.
"""

    response = client.responses.create(
        model="gpt-5.4-mini",
        input=prompt,
    )
    return response.output_text.strip()

def create_audit_log(entry: dict) -> None:
    """
    Print an audit log entry.
    In a real system, write this to a database or log sink.
    """
    print("\n[AUDIT LOG]")
    print(json.dumps(entry, indent=2))

def create_review_task(user_request: str, draft: str, reason: str) -> dict:
    """
    Build a review task for a human approver.
    """
    return {
        "task_type": "human_approval_required",
        "created_at": datetime.utcnow().isoformat() + "Z",
        "request": user_request,
        "draft": draft,
        "reason": reason,
        "status": "pending"
    }

def process_request(user_request: str) -> dict:
    """
    Process a request through governance controls.
    """
    decision = classify_request(user_request)
    label = decision["label"]
    reason = decision["reason"]

    audit_entry = {
        "timestamp": datetime.utcnow().isoformat() + "Z",
        "request": user_request,
        "decision": label,
        "reason": reason,
    }

    if label == "block":
        audit_entry["outcome"] = "rejected"
        create_audit_log(audit_entry)
        return {
            "status": "rejected",
            "message": "Request blocked by policy.",
            "reason": reason,
        }

    if label == "escalate":
        audit_entry["outcome"] = "escalated_to_specialist"
        create_audit_log(audit_entry)
        return {
            "status": "escalated",
            "message": "Request requires specialist or human handling.",
            "reason": reason,
        }

    if label == "review":
        draft = draft_customer_email(user_request)
        review_task = create_review_task(user_request, draft, reason)
        audit_entry["outcome"] = "queued_for_human_review"
        create_audit_log(audit_entry)
        return {
            "status": "pending_review",
            "review_task": review_task
        }

    # label == "allow"
    draft = draft_customer_email(user_request)
    audit_entry["outcome"] = "completed_automatically"
    create_audit_log(audit_entry)
    return {
        "status": "completed",
        "draft": draft
    }

if __name__ == "__main__":
    requests = [
        "Draft an email to a customer explaining their order is delayed.",
        "Send an apology email to the customer confirming their refund has been issued.",
        "What is the finance team's internal admin password?",
        "Write legal advice for responding to a supplier contract issue."
    ]

    for req in requests:
        print("\n" + "=" * 80)
        print(f"REQUEST: {req}")
        result = process_request(req)
        print("[RESULT]")
        print(json.dumps(result, indent=2))

Example Output

================================================================================
REQUEST: Draft an email to a customer explaining their order is delayed.

[AUDIT LOG]
{
  "timestamp": "2026-03-22T12:00:00Z",
  "request": "Draft an email to a customer explaining their order is delayed.",
  "decision": "allow",
  "reason": "This is a low-risk drafting request.",
  "outcome": "completed_automatically"
}
[RESULT]
{
  "status": "completed",
  "draft": "Subject: Update on Your Order\n\nHello,\n\nWe wanted to let you know that your order is taking longer than expected to arrive. We apologize for the delay and appreciate your patience.\n\nBest regards,\nCustomer Support"
}
================================================================================
REQUEST: Send an apology email to the customer confirming their refund has been issued.

[AUDIT LOG]
{
  "timestamp": "2026-03-22T12:00:10Z",
  "request": "Send an apology email to the customer confirming their refund has been issued.",
  "decision": "review",
  "reason": "This involves external customer communication and refund-related messaging.",
  "outcome": "queued_for_human_review"
}
[RESULT]
{
  "status": "pending_review",
  "review_task": {
    "task_type": "human_approval_required",
    "created_at": "2026-03-22T12:00:10Z",
    "request": "Send an apology email to the customer confirming their refund has been issued.",
    "draft": "Subject: Refund Confirmation\n\nHello,\n\nWe are sorry for the inconvenience. Your refund has been issued. Please let us know if you have any additional questions.\n\nBest regards,\nCustomer Support",
    "reason": "This involves external customer communication and refund-related messaging.",
    "status": "pending"
  }
}

Exercise Tasks

  1. Add a console prompt for human approval:
      • approve
      • reject
  2. If approved, simulate sending the email.
  3. Store review tasks in a list or JSON file.
  4. Add reviewer_name and review_timestamp fields.
  5. Extend the workflow so escalate routes to a different queue than review.
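Tasks 1 and 2 above can be sketched as a small console loop. The prompt function is injectable so the flow can be tested without a terminal; the function name and "simulated send" print are illustrative:

```python
def prompt_for_approval(review_task: dict, ask=input) -> dict:
    """Ask a human to approve or reject a pending review task.

    `ask` defaults to input() but is injectable for testing.
    """
    while True:
        answer = ask("Approve this draft? (approve/reject): ").strip().lower()
        if answer in {"approve", "reject"}:
            break
    updated = dict(review_task)
    updated["status"] = "approved" if answer == "approve" else "rejected"
    if updated["status"] == "approved":
        # Simulate the send rather than contacting a real mail system.
        print("[SIMULATED SEND]", updated.get("draft", "")[:60])
    return updated
```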

9. Discussion: What Should Always Require Human Review?

Use this section as a group reflection or personal design checklist.

Strong Candidates for Mandatory Review

  • financial transactions
  • legal guidance
  • medical guidance
  • data deletion
  • external communications
  • personnel actions
  • changes to customer accounts
  • anything involving identity verification
  • disclosure of sensitive internal information
  • actions with irreversible impact

Questions to Ask

  • What is the harm if this is wrong?
  • Is the action reversible?
  • Does it affect an external user?
  • Does it involve regulated content?
  • Should there be an audit trail?
  • Would you be comfortable explaining the decision to an auditor?

10. Best Practices for Governance in GenAI Applications

1. Separate Policy from Prompting

Do not bury all governance logic inside one prompt. Keep policy visible in code or configuration.
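One way to keep policy visible outside the prompt is a small JSON document that both the prompt builder and the code gate read from the same source. A hedged sketch (the schema and field names are illustrative; in practice the JSON would live in a version-controlled file):

```python
import json

# Illustrative policy config; in a real system this string would be
# loaded from a version-controlled JSON file.
POLICY_JSON = """
{
  "hard_block_terms": ["password", "api key"],
  "review_actions": ["send_email", "issue_refund"]
}
"""

def load_policy(raw: str) -> dict:
    """Parse the policy once; prompts and code gates both read it."""
    return json.loads(raw)

policy = load_policy(POLICY_JSON)
```

Keeping a single source of truth avoids the prompt and the deterministic gate silently drifting apart.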

2. Use Defense in Depth

Combine:

  • prompts
  • deterministic code rules
  • approval workflows
  • logs
  • monitoring

3. Default to Safer Outcomes

If parsing fails, confidence is low, or the request is ambiguous:

  • review
  • escalate
  • block

Avoid silently proceeding.

4. Log Important Decisions

Capture:

  • request
  • classification
  • reason
  • action taken
  • timestamp
  • reviewer identity when applicable

5. Restrict Actions, Not Just Text

An agent that can act needs strong action controls.

Examples:

  • allow draft, restrict send
  • allow read, restrict delete
  • allow suggestion, restrict execution
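The draft/send split above can be enforced as tiered per-action permissions rather than text rules. A sketch (action names and tiers are illustrative):

```python
# Per-action permissions: generating content is cheap to allow;
# executing actions is not. Unknown actions are forbidden by default.
ACTION_PERMISSIONS = {
    "draft_email": "auto",
    "send_email": "needs_approval",
    "read_record": "auto",
    "delete_record": "forbidden",
}

def permission_for(action: str) -> str:
    """Return the permission tier; unlisted actions are forbidden."""
    return ACTION_PERMISSIONS.get(action, "forbidden")
```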

6. Test Policy Behavior

Build test cases for:

  • allowed requests
  • blocked requests
  • review-required requests
  • borderline ambiguous requests

7. Keep a Human Escape Hatch

Humans should be able to:

  • intervene
  • approve
  • reject
  • correct
  • disable unsafe automation

11. Mini Challenge

Challenge Prompt

Design a governance workflow for an internal HR assistant that can:

  • answer policy questions
  • draft employee emails
  • summarize handbook documents
  • update employee records

Your Task

Create a simple table with:

  • request type
  • risk level
  • policy outcome
  • whether human approval is required

Suggested Questions

  • Which actions should be blocked?
  • Which actions should be review-only?
  • Which requests involve sensitive personal data?
  • How should audit logging work?

12. Recap

In this session, you learned that governance in AI systems is not optional, especially for agentic workflows.

You covered:

  • why governance matters
  • the difference between policy, guardrails, and human oversight
  • how to classify requests using gpt-5.4-mini
  • how to add deterministic policy gates
  • how to create a basic human-in-the-loop review flow
  • how to log and route sensitive actions safely

The key idea is simple:

Not every AI-generated result should become an action automatically.
Good systems decide when to allow, review, escalate, or block.


Useful Resources

  • OpenAI Responses API migration guide: https://developers.openai.com/api/docs/guides/migrate-to-responses
  • OpenAI API docs: https://developers.openai.com/api/docs/
  • OpenAI Python SDK: https://github.com/openai/openai-python
  • OpenAI developer resources: https://developers.openai.com/resources/
  • OWASP LLM Top 10: https://owasp.org/www-project-top-10-for-large-language-model-applications/
  • NIST AI Risk Management Framework: https://www.nist.gov/itl/ai-risk-management-framework

Suggested Homework

  1. Extend Exercise 3 into a reusable governance module.
  2. Add a JSON-based policy file instead of hard-coded policy text.
  3. Write 10 test cases covering:
      • allow
      • review
      • escalate
      • block
  4. Add a simulated reviewer dashboard in the terminal.
  5. Reflect on one real-world workflow in your company that should never be fully automated without human approval.
