Session 3: Governance, Policy, and Human Oversight
Synopsis
Introduces governance frameworks, review processes, escalation paths, and human approval mechanisms for high-impact use cases. Learners study how organizations keep agentic systems aligned with legal and ethical obligations.
Session Content
Session Overview
Duration: ~45 minutes
Audience: Python developers with basic programming knowledge, learning GenAI and agentic development
Focus: How to build AI systems that are compliant, reviewable, and safe through governance rules, policy checks, and human-in-the-loop escalation
Learning Objectives
By the end of this session, learners will be able to:
- Explain why governance is essential in GenAI and agentic systems
- Distinguish between policy, guardrails, and human oversight
- Identify common governance risks such as unsafe output, unauthorized actions, and poor traceability
- Implement a simple policy enforcement layer in Python
- Use the OpenAI Responses API with gpt-5.4-mini to classify requests and route risky cases for human review
- Build a lightweight human-in-the-loop approval flow for sensitive actions
1. Why Governance Matters in Agentic Systems
Modern GenAI systems do more than generate text. They can:
- summarize documents
- draft emails
- retrieve internal knowledge
- call tools
- take actions on behalf of users
- chain multiple steps automatically
As systems become more agentic, the risk profile increases.
Common Risks
- Unsafe content generation
  - harmful instructions
  - privacy violations
  - discriminatory output
- Unauthorized actions
  - sending messages without approval
  - modifying records
  - triggering financial or operational actions
- Policy violations
  - sharing confidential data
  - acting outside business rules
  - ignoring approval workflows
- Lack of accountability
  - no audit trail
  - unclear decision path
  - no record of human review
Governance Goals
A well-governed AI system should be:
- Policy-aware — it knows what is allowed, restricted, or prohibited
- Traceable — decisions and actions can be logged and reviewed
- Reviewable — risky tasks can be escalated to a human
- Controlled — sensitive actions require explicit approval
- Testable — governance behavior can be validated
2. Core Concepts: Policy, Guardrails, and Human Oversight
Policy
A policy is a rule or set of rules about what the system may or may not do.
Examples:
- Never reveal API keys or secrets
- Do not provide legal or medical advice as final guidance
- Escalate requests involving customer financial changes
- Require approval before sending outbound messages to external users
Policies may come from:
- company rules
- legal requirements
- compliance teams
- product design decisions
- safety best practices
Guardrails
Guardrails are the technical mechanisms used to enforce policy.
Examples:
- input filtering
- output validation
- action allowlists
- tool restrictions
- risk classification
- confidence thresholds
- escalation routing
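As a concrete example, an action allowlist can be a few lines of deterministic code. The sketch below is illustrative; the tool names are assumptions, not part of a fixed standard:

```python
# A minimal action-allowlist guardrail. Any tool that is not
# explicitly listed is rejected by default.
ALLOWED_TOOLS = {"search_docs", "draft_email", "summarize"}

def check_tool_call(tool_name: str) -> bool:
    """Return True only if the tool is explicitly allowlisted."""
    return tool_name in ALLOWED_TOOLS

print(check_tool_call("draft_email"))  # True
print(check_tool_call("send_email"))   # False: not on the allowlist
```

Defaulting to rejection means that newly added tools stay blocked until someone deliberately allowlists them.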
Human Oversight
Human oversight means a person can:
- review sensitive outputs
- approve or reject actions
- handle ambiguous requests
- investigate policy flags
- override automation when justified
Human oversight is especially important for:
- external communications
- high-impact decisions
- financial actions
- customer-sensitive operations
- requests involving personal data
- unclear or conflicting policy cases
3. A Practical Governance Pattern
A simple governance architecture for agentic systems:
- Receive user request
- Classify risk
- Check policies
- Decide route
- allow
- modify
- block
- escalate to human
- Log the decision
- If approved, perform action
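The steps above can be sketched as a small pipeline. The keyword-based classifier here is only a placeholder; a real system would combine deterministic code rules with an LLM:

```python
def classify_risk(request: str) -> str:
    """Toy risk classifier; a real system would combine rules and an LLM."""
    lowered = request.lower()
    if "password" in lowered:
        return "prohibited"
    if "send" in lowered:
        return "high"
    return "low"

# Route table mirroring the steps above: allow / modify / escalate / block.
ROUTES = {"low": "allow", "medium": "modify", "high": "escalate", "prohibited": "block"}

def govern(request: str) -> str:
    """Classify a request, choose a route, log the decision, return the route."""
    risk = classify_risk(request)
    route = ROUTES[risk]
    print(f"[log] risk={risk} route={route} request={request!r}")  # audit trail
    return route

print(govern("Summarize this document"))         # allow
print(govern("Send the reply to the customer"))  # escalate
```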
Example Routing Outcomes
| Risk Level | Action |
|---|---|
| Low | Allow automatically |
| Medium | Allow with restrictions or warnings |
| High | Escalate to human review |
| Prohibited | Block |
Example Sensitive Actions
- sending an email
- deleting data
- issuing a refund
- updating billing information
- contacting an external customer
- generating regulated advice
4. Designing Governance Rules for an Agent
For a Python-based agent, governance rules often cover three areas:
A. Content Rules
What the model is allowed to generate.
Examples:
- no harmful instructions
- no sensitive internal data disclosure
- no fabricated compliance statements
B. Action Rules
What tools or actions the agent is allowed to use.
Examples:
- draft email allowed
- send email requires approval
- database delete not allowed
- customer refund over $100 requires human approval
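These action rules can be expressed as data plus one deterministic check. The action names and the $100 refund threshold follow the examples above; the rest is an illustrative sketch:

```python
# Action rules as data: which actions exist, and which need approval.
ACTION_RULES = {
    "draft_email":     {"allowed": True,  "needs_approval": False},
    "send_email":      {"allowed": True,  "needs_approval": True},
    "database_delete": {"allowed": False, "needs_approval": False},
    "customer_refund": {"allowed": True,  "needs_approval": False},
}
REFUND_APPROVAL_THRESHOLD = 100.0  # refunds above this need a human

def check_action(action: str, amount: float = 0.0) -> str:
    """Return 'blocked', 'needs_approval', or 'allowed' for a proposed action."""
    rule = ACTION_RULES.get(action)
    if rule is None or not rule["allowed"]:
        return "blocked"  # unknown actions are blocked by default
    if action == "customer_refund" and amount > REFUND_APPROVAL_THRESHOLD:
        return "needs_approval"
    return "needs_approval" if rule["needs_approval"] else "allowed"
```

Keeping the rules in a dictionary rather than in prompt text makes them easy to review, version, and unit test.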
C. Process Rules
How the system must behave before acting.
Examples:
- log all decisions
- require user confirmation for high-impact actions
- store reason for escalation
- provide an audit-friendly decision record
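One hedged way to satisfy the logging and audit-record rules is a small dataclass that captures every decision in a uniform shape:

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timezone

@dataclass
class DecisionRecord:
    """An audit-friendly record of one governance decision."""
    request: str
    decision: str
    reason: str
    timestamp: str

def record_decision(request: str, decision: str, reason: str) -> dict:
    """Build a loggable decision record as a plain dict."""
    record = DecisionRecord(
        request=request,
        decision=decision,
        reason=reason,
        timestamp=datetime.now(timezone.utc).isoformat(),
    )
    return asdict(record)
```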
5. Theory Example: Governance Policy Table
Below is an example policy table for a support automation assistant.
| Scenario | Policy |
|---|---|
| General FAQ answer | Allow |
| Draft reply to customer | Allow |
| Send reply to customer | Human approval required |
| Change billing info | Human approval required |
| Refund over threshold | Human approval required |
| Reveal internal credentials | Block |
| Legal advice | Escalate |
| Medical advice | Escalate |
| Request involving personal secrets | Block |
This kind of table is a strong starting point because it makes governance explicit and testable.
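To make the table testable in code, one option is to encode it as data with a safe default for unknown scenarios. The scenario keys below are illustrative names, not fixed identifiers:

```python
# The policy table above, encoded as data so it can be unit tested.
POLICY_TABLE = {
    "general_faq": "allow",
    "draft_reply": "allow",
    "send_reply": "human_approval",
    "change_billing": "human_approval",
    "refund_over_threshold": "human_approval",
    "reveal_credentials": "block",
    "legal_advice": "escalate",
    "medical_advice": "escalate",
    "personal_secrets": "block",
}

def lookup_policy(scenario: str) -> str:
    """Look up the outcome; unknown scenarios default to the safer 'escalate'."""
    return POLICY_TABLE.get(scenario, "escalate")
```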
6. Hands-On Exercise 1: Build a Policy Classifier with the Responses API
Goal
Create a Python program that uses gpt-5.4-mini to classify incoming requests into one of four governance outcomes:
- allow
- review
- block
- escalate
What You Will Learn
- how to call the OpenAI Responses API from Python
- how to make the model produce structured governance decisions
- how to implement a simple policy-checking layer
Setup
Install the OpenAI SDK:
pip install openai
Set your API key:
export OPENAI_API_KEY="your_api_key_here"
Code
"""
exercise_1_policy_classifier.py
A simple governance classifier that uses OpenAI's Responses API
to classify user requests according to a small policy.
Model used: gpt-5.4-mini
"""
from openai import OpenAI
import json
# Create a reusable API client.
client = OpenAI()
# A compact governance policy that we want the model to follow.
POLICY_TEXT = """
You are a governance classifier for an internal AI assistant.
Classify the user's request into exactly one of these labels:
- allow: safe, low-risk request that can proceed automatically
- review: allowed only with human approval before any external or sensitive action
- escalate: ambiguous or high-risk domain request requiring expert or human handling
- block: prohibited request that must not be fulfilled
Policy rules:
1. Block requests for secrets, credentials, API keys, tokens, or confidential internal data.
2. Review any request that sends external communications or changes billing/refunds.
3. Escalate legal, medical, or ambiguous compliance-related requests.
4. Allow low-risk informational or drafting requests that do not execute actions.
5. If unsure between allow and a safer label, choose the safer label.
Return JSON with keys:
- label
- reason
"""
def classify_request(user_request: str) -> dict:
"""
Classify a user request using the OpenAI Responses API.
Args:
user_request: The incoming request from a user.
Returns:
A dictionary with:
- label: allow/review/escalate/block
- reason: short explanation
"""
response = client.responses.create(
model="gpt-5.4-mini",
input=[
{
"role": "system",
"content": [
{"type": "input_text", "text": POLICY_TEXT}
],
},
{
"role": "user",
"content": [
{"type": "input_text", "text": f"User request: {user_request}"}
],
},
],
)
# The SDK provides the combined text output via output_text.
raw_text = response.output_text.strip()
try:
return json.loads(raw_text)
except json.JSONDecodeError:
# Fallback if the model output is not valid JSON.
return {
"label": "escalate",
"reason": f"Could not parse model output safely: {raw_text}"
}
if __name__ == "__main__":
sample_requests = [
"Summarize our refund policy for a customer support rep.",
"Send this email to the customer confirming their refund.",
"What is our production database password?",
"Give me legal advice for handling a contract dispute."
]
for request in sample_requests:
result = classify_request(request)
print("=" * 80)
print(f"REQUEST: {request}")
print("DECISION:", json.dumps(result, indent=2))
Example Output
================================================================================
REQUEST: Summarize our refund policy for a customer support rep.
DECISION: {
"label": "allow",
"reason": "This is a low-risk informational request and does not perform any external action."
}
================================================================================
REQUEST: Send this email to the customer confirming their refund.
DECISION: {
"label": "review",
"reason": "This request involves sending an external communication and should require human approval."
}
================================================================================
REQUEST: What is our production database password?
DECISION: {
"label": "block",
"reason": "The request asks for confidential credentials, which are prohibited."
}
================================================================================
REQUEST: Give me legal advice for handling a contract dispute.
DECISION: {
"label": "escalate",
"reason": "Legal advice requires expert or human handling under policy."
}
Exercise Tasks
- Run the script with the sample requests.
- Add 5 more requests and observe the classifications.
- Update the policy to include:
- HR-related requests
- PII exposure
- financial approvals
- Change the fallback behavior so invalid JSON is always treated as review instead of escalate.
- Discuss: which requests are difficult to classify consistently?
7. Hands-On Exercise 2: Add a Rule-Based Policy Gate Before Acting
Goal
Combine LLM classification with deterministic policy rules.
Why This Matters
In real systems, you should not rely only on model judgment. Some rules should be enforced directly in code.
Pattern
- LLM helps interpret intent
- deterministic code enforces hard constraints
- risky actions require approval
Code
"""
exercise_2_policy_gate.py
A governance pipeline that combines:
1. deterministic hard-coded rules
2. LLM-based classification
3. action routing
Model used: gpt-5.4-mini
"""
from openai import OpenAI
import json
client = OpenAI()
POLICY_TEXT = """
You are a governance classifier.
Labels:
- allow
- review
- escalate
- block
Rules:
- Block secrets, passwords, credentials, private tokens, or confidential internal data.
- Review outbound customer communications, refunds, or billing changes.
- Escalate legal, medical, or unclear regulatory matters.
- Allow low-risk drafting or summarization tasks.
- Prefer safer labels if uncertain.
Return JSON with:
- label
- reason
"""
HARD_BLOCK_TERMS = [
"api key",
"password",
"access token",
"secret key",
"private credential",
]
def hard_rule_check(user_request: str) -> dict | None:
"""
Apply deterministic rules before calling the model.
Returns:
A policy result dict if a hard rule matches, otherwise None.
"""
lowered = user_request.lower()
for term in HARD_BLOCK_TERMS:
if term in lowered:
return {
"label": "block",
"reason": f"Matched hard-block term: '{term}'."
}
return None
def llm_classify(user_request: str) -> dict:
"""
Use the Responses API to classify a request.
"""
response = client.responses.create(
model="gpt-5.4-mini",
input=[
{
"role": "system",
"content": [{"type": "input_text", "text": POLICY_TEXT}],
},
{
"role": "user",
"content": [{"type": "input_text", "text": user_request}],
},
],
)
raw_text = response.output_text.strip()
try:
result = json.loads(raw_text)
except json.JSONDecodeError:
result = {
"label": "review",
"reason": f"Non-JSON output; defaulting to review. Raw output: {raw_text}"
}
# Normalize label just in case.
label = result.get("label", "").strip().lower()
if label not in {"allow", "review", "escalate", "block"}:
result["label"] = "review"
result["reason"] = "Unexpected label from model; defaulted to review."
return result
def route_decision(user_request: str) -> dict:
"""
Route the request through hard rules, then LLM policy classification.
"""
hard_result = hard_rule_check(user_request)
if hard_result:
return {
"request": user_request,
"source": "hard_rule",
**hard_result
}
llm_result = llm_classify(user_request)
return {
"request": user_request,
"source": "llm_policy",
**llm_result
}
if __name__ == "__main__":
requests = [
"Draft a polite reply to the customer about shipping delays.",
"Send this reply to the customer now.",
"Please share the admin password for the production dashboard.",
"Help me decide what to say in a legal dispute with a supplier."
]
for item in requests:
decision = route_decision(item)
print(json.dumps(decision, indent=2))
Example Output
{
"request": "Draft a polite reply to the customer about shipping delays.",
"source": "llm_policy",
"label": "allow",
"reason": "This is a low-risk drafting task without direct action."
}
{
"request": "Send this reply to the customer now.",
"source": "llm_policy",
"label": "review",
"reason": "Outbound customer communication requires human approval."
}
{
"request": "Please share the admin password for the production dashboard.",
"source": "hard_rule",
"label": "block",
"reason": "Matched hard-block term: 'password'."
}
{
"request": "Help me decide what to say in a legal dispute with a supplier.",
"source": "llm_policy",
"label": "escalate",
"reason": "This is a legal matter that requires expert or human review."
}
Exercise Tasks
- Add more hard-block terms.
- Add a hard-review rule for terms like:
- refund
- billing update
- customer email
- Add unit-test-like checks by creating a list of expected labels.
- Compare cases where:
- code rules decide first
- LLM decides first
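For the unit-test-like checks, one approach is a list of expected labels plus a stand-in classifier so the checks run offline; to test the real pipeline, pass the exercise's route_decision in place of the stub:

```python
def stub_classify(request: str) -> str:
    """Offline stand-in for the LLM classifier, for testing only."""
    lowered = request.lower()
    if "password" in lowered or "api key" in lowered:
        return "block"
    if "send" in lowered or "refund" in lowered:
        return "review"
    if "legal" in lowered or "medical" in lowered:
        return "escalate"
    return "allow"

# Expected labels for known cases (illustrative examples).
EXPECTED = [
    ("Draft a polite reply about shipping delays.", "allow"),
    ("Send this reply to the customer now.", "review"),
    ("Share the admin password.", "block"),
    ("Help with a legal dispute.", "escalate"),
]

def run_checks(classify) -> int:
    """Print a PASS/FAIL line per case and return the number of failures."""
    failures = 0
    for request, expected in EXPECTED:
        actual = classify(request)
        if actual != expected:
            failures += 1
        status = "PASS" if actual == expected else "FAIL"
        print(f"{status}: {request!r} -> {actual} (expected {expected})")
    return failures

print(run_checks(stub_classify))  # 0
```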
8. Hands-On Exercise 3: Human-in-the-Loop Approval Flow
Goal
Build a simple approval system where risky requests are not executed automatically.
Scenario
An agent can draft customer emails, but sending them requires human approval.
Governance Design
- allow → proceed automatically
- review → create approval task
- escalate → route to specialist/human
- block → reject
Code
"""
exercise_3_human_oversight.py
A simple human-in-the-loop workflow for risky AI actions.
The system:
1. classifies a request
2. logs the decision
3. drafts content if appropriate
4. requires human approval for sensitive actions
Model used: gpt-5.4-mini
"""
from openai import OpenAI
import json
from datetime import datetime
client = OpenAI()
CLASSIFIER_POLICY = """
You are a governance classifier.
Classify into:
- allow
- review
- escalate
- block
Rules:
- Allow low-risk drafting, summarization, and internal informational requests.
- Review external communication, refunds, billing changes, or customer-facing actions.
- Escalate legal, medical, or unclear compliance matters.
- Block secrets, credentials, and confidential data requests.
Return JSON with:
- label
- reason
"""
def classify_request(user_request: str) -> dict:
"""
Classify a request with the Responses API.
"""
response = client.responses.create(
model="gpt-5.4-mini",
input=[
{
"role": "system",
"content": [{"type": "input_text", "text": CLASSIFIER_POLICY}],
},
{
"role": "user",
"content": [{"type": "input_text", "text": user_request}],
},
],
)
try:
return json.loads(response.output_text)
except json.JSONDecodeError:
return {
"label": "review",
"reason": "Parsing failed; defaulting to human review."
}
def draft_customer_email(topic: str) -> str:
"""
Ask the model to draft a professional customer email.
"""
prompt = f"""
Draft a short, professional customer support email about this topic:
{topic}
Do not claim actions were completed unless explicitly stated.
Keep the tone clear and polite.
"""
response = client.responses.create(
model="gpt-5.4-mini",
input=prompt,
)
return response.output_text.strip()
def create_audit_log(entry: dict) -> None:
"""
Print an audit log entry.
In a real system, write this to a database or log sink.
"""
print("\n[AUDIT LOG]")
print(json.dumps(entry, indent=2))
def create_review_task(user_request: str, draft: str, reason: str) -> dict:
"""
Build a review task for a human approver.
"""
return {
"task_type": "human_approval_required",
"created_at": datetime.utcnow().isoformat() + "Z",
"request": user_request,
"draft": draft,
"reason": reason,
"status": "pending"
}
def process_request(user_request: str) -> dict:
"""
Process a request through governance controls.
"""
decision = classify_request(user_request)
label = decision["label"]
reason = decision["reason"]
audit_entry = {
"timestamp": datetime.utcnow().isoformat() + "Z",
"request": user_request,
"decision": label,
"reason": reason,
}
if label == "block":
audit_entry["outcome"] = "rejected"
create_audit_log(audit_entry)
return {
"status": "rejected",
"message": "Request blocked by policy.",
"reason": reason,
}
if label == "escalate":
audit_entry["outcome"] = "escalated_to_specialist"
create_audit_log(audit_entry)
return {
"status": "escalated",
"message": "Request requires specialist or human handling.",
"reason": reason,
}
if label == "review":
draft = draft_customer_email(user_request)
review_task = create_review_task(user_request, draft, reason)
audit_entry["outcome"] = "queued_for_human_review"
create_audit_log(audit_entry)
return {
"status": "pending_review",
"review_task": review_task
}
# label == "allow"
draft = draft_customer_email(user_request)
audit_entry["outcome"] = "completed_automatically"
create_audit_log(audit_entry)
return {
"status": "completed",
"draft": draft
}
if __name__ == "__main__":
requests = [
"Draft an email to a customer explaining their order is delayed.",
"Send an apology email to the customer confirming their refund has been issued.",
"What is the finance team's internal admin password?",
"Write legal advice for responding to a supplier contract issue."
]
for req in requests:
print("\n" + "=" * 80)
print(f"REQUEST: {req}")
result = process_request(req)
print("[RESULT]")
print(json.dumps(result, indent=2))
Example Output
================================================================================
REQUEST: Draft an email to a customer explaining their order is delayed.
[AUDIT LOG]
{
"timestamp": "2026-03-22T12:00:00Z",
"request": "Draft an email to a customer explaining their order is delayed.",
"decision": "allow",
"reason": "This is a low-risk drafting request.",
"outcome": "completed_automatically"
}
[RESULT]
{
"status": "completed",
"draft": "Subject: Update on Your Order\n\nHello,\n\nWe wanted to let you know that your order is taking longer than expected to arrive. We apologize for the delay and appreciate your patience.\n\nBest regards,\nCustomer Support"
}
================================================================================
REQUEST: Send an apology email to the customer confirming their refund has been issued.
[AUDIT LOG]
{
"timestamp": "2026-03-22T12:00:10Z",
"request": "Send an apology email to the customer confirming their refund has been issued.",
"decision": "review",
"reason": "This involves external customer communication and refund-related messaging.",
"outcome": "queued_for_human_review"
}
[RESULT]
{
"status": "pending_review",
"review_task": {
"task_type": "human_approval_required",
"created_at": "2026-03-22T12:00:10Z",
"request": "Send an apology email to the customer confirming their refund has been issued.",
"draft": "Subject: Refund Confirmation\n\nHello,\n\nWe are sorry for the inconvenience. Your refund has been issued. Please let us know if you have any additional questions.\n\nBest regards,\nCustomer Support",
"reason": "This involves external customer communication and refund-related messaging.",
"status": "pending"
}
}
Exercise Tasks
- Add a console prompt for human approval:
- approve
- reject
- If approved, simulate sending the email.
- Store review tasks in a list or JSON file.
- Add reviewer_name and review_timestamp fields to each task.
- Extend the workflow so escalate routes to a different queue than review.
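A hedged sketch of the approval step: the function below applies a human decision to a pending review task and stamps the reviewer fields. In a console version, you would collect decision with input():

```python
from datetime import datetime, timezone

def apply_review_decision(task: dict, decision: str, reviewer_name: str) -> dict:
    """Apply a human approve/reject decision to a pending review task."""
    if decision not in {"approve", "reject"}:
        raise ValueError("decision must be 'approve' or 'reject'")
    updated = dict(task)  # leave the original task unchanged
    updated["status"] = "approved" if decision == "approve" else "rejected"
    updated["reviewer_name"] = reviewer_name
    updated["review_timestamp"] = datetime.now(timezone.utc).isoformat()
    return updated
```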
9. Discussion: What Should Always Require Human Review?
Use this section as a group reflection or personal design checklist.
Strong Candidates for Mandatory Review
- financial transactions
- legal guidance
- medical guidance
- data deletion
- external communications
- personnel actions
- changes to customer accounts
- anything involving identity verification
- disclosure of sensitive internal information
- actions with irreversible impact
Questions to Ask
- What is the harm if this is wrong?
- Is the action reversible?
- Does it affect an external user?
- Does it involve regulated content?
- Should there be an audit trail?
- Would you be comfortable explaining the decision to an auditor?
10. Best Practices for Governance in GenAI Applications
1. Separate Policy from Prompting
Do not bury all governance logic inside one prompt. Keep policy visible in code or configuration.
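For example, the policy can live in a JSON configuration that code loads and enforces, rather than only in a prompt. The policy keys here are illustrative; in practice this would be a versioned config file:

```python
import json

# A governance policy kept as configuration instead of prompt text.
POLICY_JSON = """
{
  "block_terms": ["password", "api key", "access token"],
  "review_actions": ["send_email", "issue_refund"]
}
"""

policy = json.loads(POLICY_JSON)

def term_is_blocked(text: str) -> bool:
    """Check input text against the configured block terms."""
    lowered = text.lower()
    return any(term in lowered for term in policy["block_terms"])
```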
2. Use Defense in Depth
Combine:
- prompts
- deterministic code rules
- approval workflows
- logs
- monitoring
3. Default to Safer Outcomes
If parsing fails, confidence is low, or the request is ambiguous:
- review
- escalate
- block
Avoid silently proceeding.
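Defaulting to safer outcomes can be encoded as a tiny helper that orders the labels by restrictiveness:

```python
# Labels ordered from least to most restrictive.
SAFETY_ORDER = ["allow", "review", "escalate", "block"]

def safer(label_a: str, label_b: str) -> str:
    """Return whichever of two labels is more restrictive."""
    return max(label_a, label_b, key=SAFETY_ORDER.index)

print(safer("allow", "review"))     # review
print(safer("escalate", "block"))   # block
```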
4. Log Important Decisions
Capture:
- request
- classification
- reason
- action taken
- timestamp
- reviewer identity when applicable
5. Restrict Actions, Not Just Text
An agent that can act needs strong action controls.
Examples:
- allow draft, restrict send
- allow read, restrict delete
- allow suggestion, restrict execution
6. Test Policy Behavior
Build test cases for:
- allowed requests
- blocked requests
- review-required requests
- borderline ambiguous requests
7. Keep a Human Escape Hatch
Humans should be able to:
- intervene
- approve
- reject
- correct
- disable unsafe automation
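A minimal escape hatch can be as simple as a flag that operators control and every automated action checks. This is a sketch of the idea, not a production kill switch:

```python
class AutomationController:
    """A minimal human escape hatch: operators can disable automation at any time."""

    def __init__(self) -> None:
        self.enabled = True
        self.disabled_reason = None

    def disable(self, reason: str) -> None:
        """Operator action: stop all automated execution."""
        self.enabled = False
        self.disabled_reason = reason

    def run(self, action_name: str) -> str:
        """Refuse every action while automation is disabled."""
        if not self.enabled:
            return f"refused:{action_name}"
        return f"executed:{action_name}"
```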
11. Mini Challenge
Challenge Prompt
Design a governance workflow for an internal HR assistant that can:
- answer policy questions
- draft employee emails
- summarize handbook documents
- update employee records
Your Task
Create a simple table with:
- request type
- risk level
- policy outcome
- whether human approval is required
Suggested Questions
- Which actions should be blocked?
- Which actions should be review-only?
- Which requests involve sensitive personal data?
- How should audit logging work?
12. Recap
In this session, you learned that governance in AI systems is not optional, especially for agentic workflows.
You covered:
- why governance matters
- the difference between policy, guardrails, and human oversight
- how to classify requests using gpt-5.4-mini
- how to add deterministic policy gates
- how to create a basic human-in-the-loop review flow
- how to log and route sensitive actions safely
The key idea is simple:
Not every AI-generated result should become an action automatically.
Good systems decide when to allow, review, escalate, or block.
Useful Resources
- OpenAI Responses API migration guide: https://developers.openai.com/api/docs/guides/migrate-to-responses
- OpenAI API docs: https://developers.openai.com/api/docs/
- OpenAI Python SDK: https://github.com/openai/openai-python
- OpenAI developer resources: https://developers.openai.com/resources/
- OWASP LLM Top 10: https://owasp.org/www-project-top-10-for-large-language-model-applications/
- NIST AI Risk Management Framework: https://www.nist.gov/itl/ai-risk-management-framework
Suggested Homework
- Extend Exercise 3 into a reusable governance module.
- Add a JSON-based policy file instead of hard-coded policy text.
- Write 10 test cases covering:
- allow
- review
- escalate
- block
- Add a simulated reviewer dashboard in the terminal.
- Reflect on one real-world workflow in your company that should never be fully automated without human approval.