Session 3: Governance, Policy, and Human Oversight

Synopsis

Introduces governance frameworks, review processes, escalation paths, and human approval mechanisms for high-impact use cases. Learners study how organizations keep agentic systems aligned with legal and ethical obligations.

Session Content

Session Overview

Duration: ~45 minutes
Audience: Python developers with basic programming knowledge, learning GenAI and agentic development
Focus: How to build AI systems that are compliant, reviewable, and safe through governance rules, policy checks, and human-in-the-loop escalation

Learning Objectives

By the end of this session, learners will be able to:

  • Explain why governance is essential in GenAI and agentic systems
  • Distinguish between policy, guardrails, and human oversight
  • Identify common governance risks such as unsafe output, unauthorized actions, and poor traceability
  • Implement a simple policy enforcement layer in Python
  • Use the OpenAI Responses API with gpt-5.4-mini to classify requests and route risky cases for human review
  • Build a lightweight human-in-the-loop approval flow for sensitive actions

1. Why Governance Matters in Agentic Systems

Modern GenAI systems do more than generate text. They can:

  • summarize documents
  • draft emails
  • retrieve internal knowledge
  • call tools
  • take actions on behalf of users
  • chain multiple steps automatically

As systems become more agentic, the risk profile increases.

Common Risks

  • Unsafe content generation
      • harmful instructions
      • privacy violations
      • discriminatory output
  • Unauthorized actions
      • sending messages without approval
      • modifying records
      • triggering financial or operational actions
  • Policy violations
      • sharing confidential data
      • acting outside business rules
      • ignoring approval workflows
  • Lack of accountability
      • no audit trail
      • unclear decision path
      • no record of human review

Governance Goals

A well-governed AI system should be:

  • Policy-aware — it knows what is allowed, restricted, or prohibited
  • Traceable — decisions and actions can be logged and reviewed
  • Reviewable — risky tasks can be escalated to a human
  • Controlled — sensitive actions require explicit approval
  • Testable — governance behavior can be validated

2. Core Concepts: Policy, Guardrails, and Human Oversight

Policy

A policy is a rule or set of rules about what the system may or may not do.

Examples:

  • Never reveal API keys or secrets
  • Do not provide legal or medical advice as final guidance
  • Escalate requests involving customer financial changes
  • Require approval before sending outbound messages to external users

Policies may come from:

  • company rules
  • legal requirements
  • compliance teams
  • product design decisions
  • safety best practices

Guardrails

Guardrails are the technical mechanisms used to enforce policy.

Examples:

  • input filtering
  • output validation
  • action allowlists
  • tool restrictions
  • risk classification
  • confidence thresholds
  • escalation routing
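Guardrails such as action allowlists can be enforced in plain Python before any model is consulted. A minimal deny-by-default sketch (the tool names are illustrative, not part of this session's exercises):

```python
# An action-allowlist guardrail: the agent may only invoke tools
# that appear on an explicit allowlist. Tool names are illustrative.
ALLOWED_TOOLS = {"search_docs", "draft_email", "summarize"}

def check_tool_allowed(tool_name: str) -> bool:
    """Deny by default: only explicitly allowlisted tools pass."""
    return tool_name in ALLOWED_TOOLS
```

Note the design choice: anything not on the list is refused, so forgetting to register a new tool fails safe rather than open.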

Human Oversight

Human oversight means a person can:

  • review sensitive outputs
  • approve or reject actions
  • handle ambiguous requests
  • investigate policy flags
  • override automation when justified

Human oversight is especially important for:

  • external communications
  • high-impact decisions
  • financial actions
  • customer-sensitive operations
  • requests involving personal data
  • unclear or conflicting policy cases

3. A Practical Governance Pattern

A simple governance architecture for agentic systems:

  1. Receive user request
  2. Classify risk
  3. Check policies
  4. Decide route:
      • allow
      • modify
      • block
      • escalate to human
  5. Log the decision
  6. If approved, perform action

Example Routing Outcomes

  Risk Level   Action
  Low          Allow automatically
  Medium       Allow with restrictions or warnings
  High         Escalate to human review
  Prohibited   Block
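The routing table above can be expressed as a plain dictionary lookup that falls back to the safest outcome for unknown levels. A minimal sketch (the route names are illustrative):

```python
# Map risk levels to routing outcomes, mirroring the table above.
RISK_ROUTES = {
    "low": "allow",
    "medium": "allow_with_restrictions",
    "high": "escalate_to_human",
    "prohibited": "block",
}

def route_for_risk(level: str) -> str:
    """Look up the route; unknown levels fall back to human escalation."""
    return RISK_ROUTES.get(level.lower(), "escalate_to_human")
```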

Example Sensitive Actions

  • sending an email
  • deleting data
  • issuing a refund
  • updating billing information
  • contacting an external customer
  • generating regulated advice

4. Designing Governance Rules for an Agent

For a Python-based agent, governance rules often cover three areas:

A. Content Rules

What the model is allowed to generate.

Examples:

  • no harmful instructions
  • no sensitive internal data disclosure
  • no fabricated compliance statements

B. Action Rules

What tools or actions the agent is allowed to use.

Examples:

  • draft email allowed
  • send email requires approval
  • database delete not allowed
  • customer refund over $100 requires human approval
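Action rules like the refund example above lend themselves to deterministic checks rather than model judgment. A sketch using the $100 threshold from the list (the function name is illustrative):

```python
# From the example rule: refunds over $100 require human approval.
REFUND_APPROVAL_THRESHOLD = 100.0  # dollars

def refund_requires_approval(amount: float) -> bool:
    """Return True when the refund exceeds the approval threshold."""
    return amount > REFUND_APPROVAL_THRESHOLD
```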

C. Process Rules

How the system must behave before acting.

Examples:

  • log all decisions
  • require user confirmation for high-impact actions
  • store reason for escalation
  • provide an audit-friendly decision record

5. Theory Example: Governance Policy Table

Below is an example policy table for a support automation assistant.

  Scenario                              Policy
  General FAQ answer                    Allow
  Draft reply to customer               Allow
  Send reply to customer                Human approval required
  Change billing info                   Human approval required
  Refund over threshold                 Human approval required
  Reveal internal credentials           Block
  Legal advice                          Escalate
  Medical advice                        Escalate
  Request involving personal secrets    Block

This kind of table is a strong starting point because it makes governance explicit and testable.
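Because the table is explicit, it can live directly in code and be unit-tested. A sketch (the scenario keys are shortened, illustrative versions of the table rows):

```python
# The policy table above, expressed as data so tests can assert on it.
POLICY_TABLE = {
    "general_faq": "allow",
    "draft_reply": "allow",
    "send_reply": "human_approval",
    "change_billing": "human_approval",
    "refund_over_threshold": "human_approval",
    "reveal_credentials": "block",
    "legal_advice": "escalate",
    "medical_advice": "escalate",
    "personal_secrets": "block",
}

def policy_outcome(scenario: str) -> str:
    """Unknown scenarios default to escalation, the safer outcome."""
    return POLICY_TABLE.get(scenario, "escalate")
```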


6. Hands-On Exercise 1: Build a Policy Classifier with the Responses API

Goal

Create a Python program that uses gpt-5.4-mini to classify incoming requests into one of four governance outcomes:

  • allow
  • review
  • block
  • escalate

What You Will Learn

  • how to call the OpenAI Responses API from Python
  • how to make the model produce structured governance decisions
  • how to implement a simple policy-checking layer

Setup

Install the OpenAI SDK:

pip install openai

Set your API key:

export OPENAI_API_KEY="your_api_key_here"

Code

"""
exercise_1_policy_classifier.py

A simple governance classifier that uses OpenAI's Responses API
to classify user requests according to a small policy.

Model used: gpt-5.4-mini
"""

from openai import OpenAI
import json

# Create a reusable API client.
client = OpenAI()

# A compact governance policy that we want the model to follow.
POLICY_TEXT = """
You are a governance classifier for an internal AI assistant.

Classify the user's request into exactly one of these labels:
- allow: safe, low-risk request that can proceed automatically
- review: allowed only with human approval before any external or sensitive action
- escalate: ambiguous or high-risk domain request requiring expert or human handling
- block: prohibited request that must not be fulfilled

Policy rules:
1. Block requests for secrets, credentials, API keys, tokens, or confidential internal data.
2. Review any request that sends external communications or changes billing/refunds.
3. Escalate legal, medical, or ambiguous compliance-related requests.
4. Allow low-risk informational or drafting requests that do not execute actions.
5. If unsure between allow and a safer label, choose the safer label.

Return JSON with keys:
- label
- reason
"""

def classify_request(user_request: str) -> dict:
    """
    Classify a user request using the OpenAI Responses API.

    Args:
        user_request: The incoming request from a user.

    Returns:
        A dictionary with:
        - label: allow/review/escalate/block
        - reason: short explanation
    """
    response = client.responses.create(
        model="gpt-5.4-mini",
        input=[
            {
                "role": "system",
                "content": [
                    {"type": "input_text", "text": POLICY_TEXT}
                ],
            },
            {
                "role": "user",
                "content": [
                    {"type": "input_text", "text": f"User request: {user_request}"}
                ],
            },
        ],
    )

    # The SDK provides the combined text output via output_text.
    raw_text = response.output_text.strip()

    try:
        return json.loads(raw_text)
    except json.JSONDecodeError:
        # Fallback if the model output is not valid JSON.
        return {
            "label": "escalate",
            "reason": f"Could not parse model output safely: {raw_text}"
        }


if __name__ == "__main__":
    sample_requests = [
        "Summarize our refund policy for a customer support rep.",
        "Send this email to the customer confirming their refund.",
        "What is our production database password?",
        "Give me legal advice for handling a contract dispute."
    ]

    for request in sample_requests:
        result = classify_request(request)
        print("=" * 80)
        print(f"REQUEST: {request}")
        print("DECISION:", json.dumps(result, indent=2))

Example Output

================================================================================
REQUEST: Summarize our refund policy for a customer support rep.
DECISION: {
  "label": "allow",
  "reason": "This is a low-risk informational request and does not perform any external action."
}
================================================================================
REQUEST: Send this email to the customer confirming their refund.
DECISION: {
  "label": "review",
  "reason": "This request involves sending an external communication and should require human approval."
}
================================================================================
REQUEST: What is our production database password?
DECISION: {
  "label": "block",
  "reason": "The request asks for confidential credentials, which are prohibited."
}
================================================================================
REQUEST: Give me legal advice for handling a contract dispute.
DECISION: {
  "label": "escalate",
  "reason": "Legal advice requires expert or human handling under policy."
}

Exercise Tasks

  1. Run the script with the sample requests.
  2. Add 5 more requests and observe the classifications.
  3. Update the policy to include:
      • HR-related requests
      • PII exposure
      • financial approvals
  4. Change the fallback behavior so invalid JSON is always treated as review instead of escalate.
  5. Discuss: which requests are difficult to classify consistently?

7. Hands-On Exercise 2: Add a Rule-Based Policy Gate Before Acting

Goal

Combine LLM classification with deterministic policy rules.

Why This Matters

In real systems, you should not rely only on model judgment. Some rules should be enforced directly in code.

Pattern

  • LLM helps interpret intent
  • deterministic code enforces hard constraints
  • risky actions require approval

Code

"""
exercise_2_policy_gate.py

A governance pipeline that combines:
1. deterministic hard-coded rules
2. LLM-based classification
3. action routing

Model used: gpt-5.4-mini
"""

from openai import OpenAI
import json

client = OpenAI()

POLICY_TEXT = """
You are a governance classifier.

Labels:
- allow
- review
- escalate
- block

Rules:
- Block secrets, passwords, credentials, private tokens, or confidential internal data.
- Review outbound customer communications, refunds, or billing changes.
- Escalate legal, medical, or unclear regulatory matters.
- Allow low-risk drafting or summarization tasks.
- Prefer safer labels if uncertain.

Return JSON with:
- label
- reason
"""

HARD_BLOCK_TERMS = [
    "api key",
    "password",
    "access token",
    "secret key",
    "private credential",
]

def hard_rule_check(user_request: str) -> dict | None:
    """
    Apply deterministic rules before calling the model.

    Returns:
        A policy result dict if a hard rule matches, otherwise None.
    """
    lowered = user_request.lower()

    for term in HARD_BLOCK_TERMS:
        if term in lowered:
            return {
                "label": "block",
                "reason": f"Matched hard-block term: '{term}'."
            }

    return None

def llm_classify(user_request: str) -> dict:
    """
    Use the Responses API to classify a request.
    """
    response = client.responses.create(
        model="gpt-5.4-mini",
        input=[
            {
                "role": "system",
                "content": [{"type": "input_text", "text": POLICY_TEXT}],
            },
            {
                "role": "user",
                "content": [{"type": "input_text", "text": user_request}],
            },
        ],
    )

    raw_text = response.output_text.strip()

    try:
        result = json.loads(raw_text)
    except json.JSONDecodeError:
        result = {
            "label": "review",
            "reason": f"Non-JSON output; defaulting to review. Raw output: {raw_text}"
        }

    # Normalize label just in case.
    label = result.get("label", "").strip().lower()
    if label not in {"allow", "review", "escalate", "block"}:
        result["label"] = "review"
        result["reason"] = "Unexpected label from model; defaulted to review."

    return result

def route_decision(user_request: str) -> dict:
    """
    Route the request through hard rules, then LLM policy classification.
    """
    hard_result = hard_rule_check(user_request)
    if hard_result:
        return {
            "request": user_request,
            "source": "hard_rule",
            **hard_result
        }

    llm_result = llm_classify(user_request)
    return {
        "request": user_request,
        "source": "llm_policy",
        **llm_result
    }

if __name__ == "__main__":
    requests = [
        "Draft a polite reply to the customer about shipping delays.",
        "Send this reply to the customer now.",
        "Please share the admin password for the production dashboard.",
        "Help me decide what to say in a legal dispute with a supplier."
    ]

    for item in requests:
        decision = route_decision(item)
        print(json.dumps(decision, indent=2))

Example Output

{
  "request": "Draft a polite reply to the customer about shipping delays.",
  "source": "llm_policy",
  "label": "allow",
  "reason": "This is a low-risk drafting task without direct action."
}
{
  "request": "Send this reply to the customer now.",
  "source": "llm_policy",
  "label": "review",
  "reason": "Outbound customer communication requires human approval."
}
{
  "request": "Please share the admin password for the production dashboard.",
  "source": "hard_rule",
  "label": "block",
  "reason": "Matched hard-block term: 'password'."
}
{
  "request": "Help me decide what to say in a legal dispute with a supplier.",
  "source": "llm_policy",
  "label": "escalate",
  "reason": "This is a legal matter that requires expert or human review."
}

Exercise Tasks

  1. Add more hard-block terms.
  2. Add a hard-review rule for terms like:
      • refund
      • billing update
      • customer email
  3. Add unit-test-like checks by creating a list of expected labels.
  4. Compare cases where:
      • code rules decide first
      • the LLM decides first
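The "expected labels" task above can be approached with a small test harness. In this sketch a stub classifier stands in for `route_decision` so the pattern runs without API access; in the exercise you would pass your real pipeline instead:

```python
# A tiny expected-labels harness for governance tests.
# `fake_classify` is a stand-in so the sketch runs offline.
EXPECTED_CASES = [
    ("Draft a reply about shipping delays.", "allow"),
    ("Share the admin password.", "block"),
]

def fake_classify(request: str) -> dict:
    """Illustrative stub; replace with route_decision in the exercise."""
    if "password" in request.lower():
        return {"label": "block"}
    return {"label": "allow"}

def run_policy_tests(classify, cases) -> list[str]:
    """Return a list of failure messages; an empty list means all passed."""
    failures = []
    for request, expected in cases:
        got = classify(request)["label"]
        if got != expected:
            failures.append(f"{request!r}: expected {expected}, got {got}")
    return failures
```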

8. Hands-On Exercise 3: Human-in-the-Loop Approval Flow

Goal

Build a simple approval system where risky requests are not executed automatically.

Scenario

An agent can draft customer emails, but sending them requires human approval.

Governance Design

  • allow → proceed automatically
  • review → create approval task
  • escalate → route to specialist/human
  • block → reject

Code

"""
exercise_3_human_oversight.py

A simple human-in-the-loop workflow for risky AI actions.

The system:
1. classifies a request
2. logs the decision
3. drafts content if appropriate
4. requires human approval for sensitive actions

Model used: gpt-5.4-mini
"""

from openai import OpenAI
import json
from datetime import datetime

client = OpenAI()

CLASSIFIER_POLICY = """
You are a governance classifier.

Classify into:
- allow
- review
- escalate
- block

Rules:
- Allow low-risk drafting, summarization, and internal informational requests.
- Review external communication, refunds, billing changes, or customer-facing actions.
- Escalate legal, medical, or unclear compliance matters.
- Block secrets, credentials, and confidential data requests.

Return JSON with:
- label
- reason
"""

def classify_request(user_request: str) -> dict:
    """
    Classify a request with the Responses API.
    """
    response = client.responses.create(
        model="gpt-5.4-mini",
        input=[
            {
                "role": "system",
                "content": [{"type": "input_text", "text": CLASSIFIER_POLICY}],
            },
            {
                "role": "user",
                "content": [{"type": "input_text", "text": user_request}],
            },
        ],
    )

    try:
        return json.loads(response.output_text)
    except json.JSONDecodeError:
        return {
            "label": "review",
            "reason": "Parsing failed; defaulting to human review."
        }

def draft_customer_email(topic: str) -> str:
    """
    Ask the model to draft a professional customer email.
    """
    prompt = f"""
Draft a short, professional customer support email about this topic:
{topic}

Do not claim actions were completed unless explicitly stated.
Keep the tone clear and polite.
"""

    response = client.responses.create(
        model="gpt-5.4-mini",
        input=prompt,
    )
    return response.output_text.strip()

def create_audit_log(entry: dict) -> None:
    """
    Print an audit log entry.
    In a real system, write this to a database or log sink.
    """
    print("\n[AUDIT LOG]")
    print(json.dumps(entry, indent=2))

def create_review_task(user_request: str, draft: str, reason: str) -> dict:
    """
    Build a review task for a human approver.
    """
    return {
        "task_type": "human_approval_required",
        "created_at": datetime.utcnow().isoformat() + "Z",
        "request": user_request,
        "draft": draft,
        "reason": reason,
        "status": "pending"
    }

def process_request(user_request: str) -> dict:
    """
    Process a request through governance controls.
    """
    decision = classify_request(user_request)
    label = decision["label"]
    reason = decision["reason"]

    audit_entry = {
        "timestamp": datetime.utcnow().isoformat() + "Z",
        "request": user_request,
        "decision": label,
        "reason": reason,
    }

    if label == "block":
        audit_entry["outcome"] = "rejected"
        create_audit_log(audit_entry)
        return {
            "status": "rejected",
            "message": "Request blocked by policy.",
            "reason": reason,
        }

    if label == "escalate":
        audit_entry["outcome"] = "escalated_to_specialist"
        create_audit_log(audit_entry)
        return {
            "status": "escalated",
            "message": "Request requires specialist or human handling.",
            "reason": reason,
        }

    if label == "review":
        draft = draft_customer_email(user_request)
        review_task = create_review_task(user_request, draft, reason)
        audit_entry["outcome"] = "queued_for_human_review"
        create_audit_log(audit_entry)
        return {
            "status": "pending_review",
            "review_task": review_task
        }

    # label == "allow"
    draft = draft_customer_email(user_request)
    audit_entry["outcome"] = "completed_automatically"
    create_audit_log(audit_entry)
    return {
        "status": "completed",
        "draft": draft
    }

if __name__ == "__main__":
    requests = [
        "Draft an email to a customer explaining their order is delayed.",
        "Send an apology email to the customer confirming their refund has been issued.",
        "What is the finance team's internal admin password?",
        "Write legal advice for responding to a supplier contract issue."
    ]

    for req in requests:
        print("\n" + "=" * 80)
        print(f"REQUEST: {req}")
        result = process_request(req)
        print("[RESULT]")
        print(json.dumps(result, indent=2))

Example Output

================================================================================
REQUEST: Draft an email to a customer explaining their order is delayed.

[AUDIT LOG]
{
  "timestamp": "2026-03-22T12:00:00Z",
  "request": "Draft an email to a customer explaining their order is delayed.",
  "decision": "allow",
  "reason": "This is a low-risk drafting request.",
  "outcome": "completed_automatically"
}
[RESULT]
{
  "status": "completed",
  "draft": "Subject: Update on Your Order\n\nHello,\n\nWe wanted to let you know that your order is taking longer than expected to arrive. We apologize for the delay and appreciate your patience.\n\nBest regards,\nCustomer Support"
}
================================================================================
REQUEST: Send an apology email to the customer confirming their refund has been issued.

[AUDIT LOG]
{
  "timestamp": "2026-03-22T12:00:10Z",
  "request": "Send an apology email to the customer confirming their refund has been issued.",
  "decision": "review",
  "reason": "This involves external customer communication and refund-related messaging.",
  "outcome": "queued_for_human_review"
}
[RESULT]
{
  "status": "pending_review",
  "review_task": {
    "task_type": "human_approval_required",
    "created_at": "2026-03-22T12:00:10Z",
    "request": "Send an apology email to the customer confirming their refund has been issued.",
    "draft": "Subject: Refund Confirmation\n\nHello,\n\nWe are sorry for the inconvenience. Your refund has been issued. Please let us know if you have any additional questions.\n\nBest regards,\nCustomer Support",
    "reason": "This involves external customer communication and refund-related messaging.",
    "status": "pending"
  }
}

Exercise Tasks

  1. Add a console prompt for human approval:
      • approve
      • reject
  2. If approved, simulate sending the email.
  3. Store review tasks in a list or JSON file.
  4. Add reviewer_name and review_timestamp fields.
  5. Extend the workflow so escalate routes to a different queue than review.
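Tasks 1 and 2 above can be sketched as a small console loop. The prompt function is injectable so the flow can be tested without a terminal; the function name and "simulated send" print are illustrative:

```python
def prompt_for_approval(review_task: dict, ask=input) -> dict:
    """Ask a human to approve or reject a pending review task.

    `ask` defaults to input() but is injectable for testing.
    """
    while True:
        answer = ask("Approve this draft? (approve/reject): ").strip().lower()
        if answer in {"approve", "reject"}:
            break
    updated = dict(review_task)
    updated["status"] = "approved" if answer == "approve" else "rejected"
    if updated["status"] == "approved":
        # Simulate the send rather than contacting a real mail system.
        print("[SIMULATED SEND]", updated.get("draft", "")[:60])
    return updated
```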

9. Discussion: What Should Always Require Human Review?

Use this section as a group reflection or personal design checklist.

Strong Candidates for Mandatory Review

  • financial transactions
  • legal guidance
  • medical guidance
  • data deletion
  • external communications
  • personnel actions
  • changes to customer accounts
  • anything involving identity verification
  • disclosure of sensitive internal information
  • actions with irreversible impact

Questions to Ask

  • What is the harm if this is wrong?
  • Is the action reversible?
  • Does it affect an external user?
  • Does it involve regulated content?
  • Should there be an audit trail?
  • Would you be comfortable explaining the decision to an auditor?

10. Best Practices for Governance in GenAI Applications

1. Separate Policy from Prompting

Do not bury all governance logic inside one prompt. Keep policy visible in code or configuration.
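One way to keep policy visible outside the prompt is a small JSON document that both the prompt builder and the code gate read from the same source. A hedged sketch (the schema and field names are illustrative; in practice the JSON would live in a version-controlled file):

```python
import json

# Illustrative policy config; in a real system this string would be
# loaded from a version-controlled JSON file.
POLICY_JSON = """
{
  "hard_block_terms": ["password", "api key"],
  "review_actions": ["send_email", "issue_refund"]
}
"""

def load_policy(raw: str) -> dict:
    """Parse the policy once; prompts and code gates both read it."""
    return json.loads(raw)

policy = load_policy(POLICY_JSON)
```

Keeping a single source of truth avoids the prompt and the deterministic gate silently drifting apart.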

2. Use Defense in Depth

Combine:

  • prompts
  • deterministic code rules
  • approval workflows
  • logs
  • monitoring

3. Default to Safer Outcomes

If parsing fails, confidence is low, or the request is ambiguous:

  • review
  • escalate
  • block

Avoid silently proceeding.

4. Log Important Decisions

Capture:

  • request
  • classification
  • reason
  • action taken
  • timestamp
  • reviewer identity when applicable

5. Restrict Actions, Not Just Text

An agent that can act needs strong action controls.

Examples:

  • allow draft, restrict send
  • allow read, restrict delete
  • allow suggestion, restrict execution
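The draft/send split above can be enforced as tiered per-action permissions rather than text rules. A sketch (action names and tiers are illustrative):

```python
# Per-action permissions: generating content is cheap to allow;
# executing actions is not. Unknown actions are forbidden by default.
ACTION_PERMISSIONS = {
    "draft_email": "auto",
    "send_email": "needs_approval",
    "read_record": "auto",
    "delete_record": "forbidden",
}

def permission_for(action: str) -> str:
    """Return the permission tier; unlisted actions are forbidden."""
    return ACTION_PERMISSIONS.get(action, "forbidden")
```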

6. Test Policy Behavior

Build test cases for:

  • allowed requests
  • blocked requests
  • review-required requests
  • borderline ambiguous requests

7. Keep a Human Escape Hatch

Humans should be able to:

  • intervene
  • approve
  • reject
  • correct
  • disable unsafe automation

11. Mini Challenge

Challenge Prompt

Design a governance workflow for an internal HR assistant that can:

  • answer policy questions
  • draft employee emails
  • summarize handbook documents
  • update employee records

Your Task

Create a simple table with:

  • request type
  • risk level
  • policy outcome
  • whether human approval is required

Suggested Questions

  • Which actions should be blocked?
  • Which actions should be review-only?
  • Which requests involve sensitive personal data?
  • How should audit logging work?

12. Recap

In this session, you learned that governance in AI systems is not optional, especially for agentic workflows.

You covered:

  • why governance matters
  • the difference between policy, guardrails, and human oversight
  • how to classify requests using gpt-5.4-mini
  • how to add deterministic policy gates
  • how to create a basic human-in-the-loop review flow
  • how to log and route sensitive actions safely

The key idea is simple:

Not every AI-generated result should become an action automatically.
Good systems decide when to allow, review, escalate, or block.


Useful Resources

  • OpenAI Responses API migration guide: https://developers.openai.com/api/docs/guides/migrate-to-responses
  • OpenAI API docs: https://developers.openai.com/api/docs/
  • OpenAI Python SDK: https://github.com/openai/openai-python
  • OpenAI developer resources: https://developers.openai.com/resources/
  • OWASP LLM Top 10: https://owasp.org/www-project-top-10-for-large-language-model-applications/
  • NIST AI Risk Management Framework: https://www.nist.gov/itl/ai-risk-management-framework

Suggested Homework

  1. Extend Exercise 3 into a reusable governance module.
  2. Add a JSON-based policy file instead of hard-coded policy text.
  3. Write 10 test cases covering:
      • allow
      • review
      • escalate
      • block
  4. Add a simulated reviewer dashboard in the terminal.
  5. Reflect on one real-world workflow in your company that should never be fully automated without human approval.
