Session 3: Conflict Resolution and Coordination Strategies
Synopsis
Introduces methods for reconciling competing outputs, selecting between proposals, and coordinating sequential or parallel work. Learners see how governance and arbitration become essential in multi-agent environments.
Session Content
Session 3: Conflict Resolution and Coordination Strategies
Session Overview
Duration: ~45 minutes
Audience: Python developers with basic programming knowledge, learning GenAI and agentic development
Focus: Understanding how multiple agents coordinate, how conflicts emerge, and how to implement practical conflict resolution strategies using the OpenAI Responses API and gpt-5.4-mini
Learning Objectives
By the end of this session, learners will be able to:
- Explain why conflicts occur in multi-agent and agentic systems
- Identify common coordination failures such as contradiction, duplication, deadlock, and goal misalignment
- Apply conflict resolution strategies including prioritization, voting, arbitration, and rule-based reconciliation
- Implement a simple Python-based coordinator that compares agent outputs and resolves disagreements
- Build a practical orchestration workflow using the OpenAI Responses API
1. Why Conflict Resolution Matters in Agentic Systems
As soon as multiple agents collaborate, disagreement becomes normal rather than exceptional. In an agentic workflow, one agent may generate a plan, another may review it, and a third may optimize for time, cost, or safety. These agents can produce incompatible recommendations.
Common Sources of Conflict
- Different objectives
  - One agent optimizes for speed
  - Another optimizes for quality
  - Another optimizes for compliance or safety
- Different context windows
  - Agents may see different subsets of information
  - One agent may miss a critical constraint
- Prompt framing differences
  - Slight differences in instructions can lead to contradictory outputs
- Ambiguity in task ownership
  - Multiple agents may solve the same subproblem in incompatible ways
- Stale state
  - An agent may reason over outdated information while others use updated state
Examples of Coordination Failures
- Contradiction: Agent A says “approve the refund,” Agent B says “deny the refund”
- Duplication: Two agents perform the same task unnecessarily
- Deadlock: Each agent waits for another to decide
- Priority inversion: A low-priority optimization overrides a critical safety rule
- Hallucinated consensus: A coordinator assumes agreement where none exists
2. Coordination Patterns in Multi-Agent Systems
Before resolving conflicts, it helps to understand common coordination patterns.
A. Centralized Coordinator
A single orchestrator collects outputs from agents and makes a final decision.
Pros - Easier to debug - Clear decision authority - Good for production workflows
Cons - Single point of failure - Coordinator quality strongly affects system performance
B. Peer-to-Peer Negotiation
Agents communicate directly and attempt to reconcile differences.
Pros - Flexible - Closer to distributed systems patterns
Cons - Harder to control - Can become expensive or unstable
C. Hierarchical Delegation
A parent agent delegates to specialized child agents, then integrates results.
Pros - Natural task decomposition - Clear responsibility boundaries
Cons - Requires good task design - Parent may still face conflicting recommendations
D. Voting or Ensemble Decision
Several agents independently solve a problem and the system chooses a majority or weighted result.
Pros - Useful for robustness - Reduces dependence on one output
Cons - Majority can still be wrong - Hard to apply when outputs are open-ended
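The voting pattern above can be sketched in a few lines of plain Python, assuming each agent has already produced a short candidate answer in the same format:

```python
from collections import Counter

def majority_vote(candidates: list[str]) -> str:
    """Return the most common candidate answer.

    Ties are broken by first occurrence, which is how Counter
    orders items with equal counts.
    """
    counts = Counter(candidates)
    winner, _ = counts.most_common(1)[0]
    return winner

# Three agents answer the same yes/no style question
votes = ["approve", "deny", "approve"]
print(majority_vote(votes))  # approve
```

Note how this only works because the outputs share a format; for open-ended text, voting needs a normalization step first, which is exactly the weakness listed above.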
3. Practical Conflict Resolution Strategies
3.1 Rule-Based Prioritization
Use explicit rules to determine which output wins.
Examples: - Safety overrides cost - Compliance overrides convenience - User instruction overrides stylistic preferences
This is often the most practical strategy in production.
3.2 Scoring and Ranking
Assign each proposal a score across dimensions such as: - correctness - cost - latency - safety - alignment with user intent
Then choose the highest total score or the best constrained option.
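A minimal scoring sketch, assuming each proposal has already been rated between 0 and 1 on each dimension; the weights here are illustrative, not a recommendation:

```python
def score_proposal(ratings: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted sum of per-dimension ratings (each assumed in 0..1)."""
    return sum(weights[d] * ratings.get(d, 0.0) for d in weights)

# Illustrative weights: correctness and safety dominate
weights = {"correctness": 0.4, "safety": 0.3, "cost": 0.2, "latency": 0.1}

proposals = {
    "speed":   {"correctness": 0.7, "safety": 0.6, "cost": 0.9, "latency": 0.9},
    "quality": {"correctness": 0.9, "safety": 0.9, "cost": 0.5, "latency": 0.4},
}

# Pick the proposal with the highest weighted score
best = max(proposals, key=lambda name: score_proposal(proposals[name], weights))
print(best)  # quality
```

Where the ratings come from is the hard part in practice: they can be produced by heuristics, validators, or a separate scoring agent.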
3.3 Arbitration Agent
Use a separate “arbiter” agent to compare competing proposals and choose one.
This is useful when the conflict requires reasoning rather than static rules.
3.4 Voting
Applicable when multiple agents produce candidate answers in similar formats.
Common methods: - simple majority - weighted majority - confidence-based voting
3.5 Merge-and-Rewrite
Instead of choosing one output, synthesize a unified solution.
Best when: - each proposal is partially correct - tradeoffs can be balanced - a final polished result is needed
3.6 Escalation
If conflict cannot be resolved confidently: - ask a human - request more information - re-run with tighter prompts - trigger a fallback policy
4. Designing a Coordinator
A practical coordinator usually performs these steps:
- Collect outputs from specialized agents
- Normalize them into a consistent structure
- Detect disagreement
- Apply decision policy
- Produce final output and decision rationale
- Log the process for observability
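The steps above can be condensed into a coordinator skeleton. This is a sketch with placeholder inputs: the agent outputs are assumed to arrive already normalized as a name-to-recommendation mapping, and the policy is passed in as a function:

```python
def coordinate(outputs: dict[str, str], policy) -> dict:
    """Minimal coordinator: detect disagreement, apply a policy, record the result."""
    # Steps 1-2: outputs arrive already normalized as {agent_name: recommendation}
    # Step 3: detect disagreement (here: any two recommendations differ)
    distinct = set(outputs.values())
    conflict = len(distinct) > 1
    # Step 4: apply the decision policy only when a conflict exists
    decision = policy(outputs) if conflict else next(iter(distinct))
    # Steps 5-6: return the final output plus a rationale record for logging
    return {
        "conflict": conflict,
        "decision": decision,
        "rationale": f"{len(outputs)} agents, {len(distinct)} distinct recommendations",
    }

# Toy policy: prefer the quality agent whenever there is disagreement
result = coordinate(
    {"speed": "ship now", "quality": "validate first"},
    policy=lambda outs: outs["quality"],
)
print(result["decision"])  # validate first
```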
Recommended Output Structure for Agents
To coordinate effectively, ask each agent to return structured fields such as:
- recommendation
- reasoning_summary
- priority
- risks
- confidence
This makes comparison easier than using free-form text.
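One way to represent those fields on the coordinator side is a small dataclass; the field names simply mirror the list above:

```python
from dataclasses import dataclass

@dataclass
class AgentProposal:
    recommendation: str
    reasoning_summary: str
    priority: str     # e.g. "speed" or "quality"
    risks: str
    confidence: str   # e.g. "Low", "Medium", "High"

p = AgentProposal(
    recommendation="Validate data before reporting",
    reasoning_summary="Executives will rely on the numbers",
    priority="quality",
    risks="Possible delay",
    confidence="High",
)
print(p.priority)  # quality
```

Comparing two `AgentProposal` objects field by field is far easier than diffing free-form paragraphs.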
5. Hands-On Exercise 1: Compare Two Agents with Different Priorities
Goal
Create two specialized agents: - a speed-focused agent - a quality-focused agent
Then compare their recommendations for the same task.
What You Will Learn
- How different prompts create divergent recommendations
- How to call the OpenAI Responses API using Python
- How to inspect outputs before reconciliation
Setup
Install dependencies:
pip install openai python-dotenv
Create a .env file:
OPENAI_API_KEY=your_api_key_here
Python Code
"""
Exercise 1: Comparing two specialized agent recommendations.
This script uses the OpenAI Responses API with gpt-5.4-mini to generate
two different task execution recommendations:
1. A speed-focused recommendation
2. A quality-focused recommendation
The goal is to observe how agent specialization creates conflict.
"""
from openai import OpenAI
from dotenv import load_dotenv
import os
# Load environment variables from .env
load_dotenv()
# Initialize the OpenAI client
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
# Shared task for both agents
TASK = """
A customer asks for a summary report of sales performance by region.
The report is needed today, but executives will use it for an important decision.
How should the task be handled?
"""
def run_agent(system_prompt: str, user_prompt: str) -> str:
    """
    Calls the OpenAI Responses API and returns the model's text output.

    Args:
        system_prompt: Instructions defining the agent's behavior
        user_prompt: The actual task input

    Returns:
        The model's response as text
    """
    response = client.responses.create(
        model="gpt-5.4-mini",
        input=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
        ],
    )
    return response.output_text
# Define specialized agents
speed_agent_prompt = """
You are a speed-optimized operations agent.
Prioritize fast delivery, minimal process overhead, and immediate action.
Return your answer in this format:
Recommendation: ...
Reasoning Summary: ...
Priority: speed
Risks: ...
Confidence: ...
"""
quality_agent_prompt = """
You are a quality-optimized operations agent.
Prioritize accuracy, validation, and decision-grade output quality.
Return your answer in this format:
Recommendation: ...
Reasoning Summary: ...
Priority: quality
Risks: ...
Confidence: ...
"""
# Run both agents
speed_result = run_agent(speed_agent_prompt, TASK)
quality_result = run_agent(quality_agent_prompt, TASK)
# Print results for comparison
print("=" * 80)
print("SPEED AGENT OUTPUT")
print("=" * 80)
print(speed_result)
print("\n" + "=" * 80)
print("QUALITY AGENT OUTPUT")
print("=" * 80)
print(quality_result)
Example Output
================================================================================
SPEED AGENT OUTPUT
================================================================================
Recommendation: Produce a same-day summary using available sales data and clearly mark it as a preliminary report.
Reasoning Summary: Executives need the report today, so immediate delivery is more valuable than waiting for full validation.
Priority: speed
Risks: Some regional figures may contain unverified discrepancies.
Confidence: High
================================================================================
QUALITY AGENT OUTPUT
================================================================================
Recommendation: Validate regional data sources before producing the report, and deliver a decision-grade summary with a short explanation of methodology.
Reasoning Summary: Because executives will make an important decision using the report, accuracy is critical.
Priority: quality
Risks: Delivery may be delayed if validation reveals data inconsistencies.
Confidence: High
Discussion Prompts
- What is the conflict between the two outputs?
- Which recommendation should be preferred in a high-stakes business setting?
- Can both be partially right?
6. Detecting Conflict Programmatically
To resolve disagreements, we first need to detect them.
Signs of Conflict
- Opposite actions: “ship now” vs “validate first”
- Different priorities: speed vs quality
- Different risk tolerances
- Incompatible next steps
In practice, conflict detection can be: - simple keyword/rule-based logic - structured field comparison - LLM-based semantic arbitration
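Structured field comparison is the simplest of the three. Assuming both agents follow the `Priority: ...` format from Exercise 1, a conflict check can be written like this:

```python
import re
from typing import Optional

def extract_priority(output: str) -> Optional[str]:
    """Pull the value of the 'Priority:' line from an agent's text output."""
    match = re.search(r"^Priority:\s*(.+)$", output, flags=re.MULTILINE)
    return match.group(1).strip() if match else None

def priorities_conflict(output_a: str, output_b: str) -> bool:
    """Flag a conflict when both agents declare different priorities."""
    pa, pb = extract_priority(output_a), extract_priority(output_b)
    return pa is not None and pb is not None and pa != pb

a = "Recommendation: ship now\nPriority: speed\nConfidence: High"
b = "Recommendation: validate first\nPriority: quality\nConfidence: High"
print(priorities_conflict(a, b))  # True
```

This only catches conflicts the format can express; semantically opposite recommendations with the same declared priority would need keyword rules or LLM-based arbitration.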
7. Hands-On Exercise 2: Build a Rule-Based Conflict Resolver
Goal
Build a Python coordinator that: - gets agent outputs - checks for priority conflict - applies a simple rule: - if the task is high-stakes, quality wins - otherwise, speed wins
What You Will Learn
- How to orchestrate multiple model calls
- How to implement a deterministic resolution policy
- Why explicit rules are valuable in production systems
Python Code
"""
Exercise 2: Rule-based conflict resolution.
This script:
1. Queries two specialized agents
2. Uses a simple coordinator policy
3. Resolves the disagreement deterministically
The policy is:
- If the task is high-stakes, prefer the quality-focused output
- Otherwise, prefer the speed-focused output
"""
from openai import OpenAI
from dotenv import load_dotenv
import os
load_dotenv()
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
TASK = """
A customer asks for a summary report of sales performance by region.
The report is needed today, but executives will use it for an important decision.
How should the task be handled?
"""
def run_agent(system_prompt: str, user_prompt: str) -> str:
    """
    Execute one agent prompt and return the text response.
    """
    response = client.responses.create(
        model="gpt-5.4-mini",
        input=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
        ],
    )
    return response.output_text
def is_high_stakes(task_text: str) -> bool:
    """
    Very simple rule-based classifier for task criticality.

    Args:
        task_text: The task description

    Returns:
        True if high-stakes language is detected, else False
    """
    high_stakes_keywords = [
        "important decision",
        "executives",
        "compliance",
        "legal",
        "safety",
        "financial",
        "medical",
    ]
    lowered = task_text.lower()
    return any(keyword in lowered for keyword in high_stakes_keywords)
def resolve_conflict(speed_output: str, quality_output: str, task_text: str) -> str:
    """
    Resolve conflict using a deterministic policy.

    Args:
        speed_output: Output from the speed-focused agent
        quality_output: Output from the quality-focused agent
        task_text: Original task description

    Returns:
        The selected final recommendation
    """
    if is_high_stakes(task_text):
        return quality_output
    return speed_output
speed_agent_prompt = """
You are a speed-optimized operations agent.
Prioritize fast delivery, minimal process overhead, and immediate action.
Return your answer in this format:
Recommendation: ...
Reasoning Summary: ...
Priority: speed
Risks: ...
Confidence: ...
"""
quality_agent_prompt = """
You are a quality-optimized operations agent.
Prioritize accuracy, validation, and decision-grade output quality.
Return your answer in this format:
Recommendation: ...
Reasoning Summary: ...
Priority: quality
Risks: ...
Confidence: ...
"""
# Run both agents
speed_output = run_agent(speed_agent_prompt, TASK)
quality_output = run_agent(quality_agent_prompt, TASK)
# Resolve with rule-based coordinator
final_decision = resolve_conflict(speed_output, quality_output, TASK)
print("=" * 80)
print("TASK")
print("=" * 80)
print(TASK.strip())
print("\n" + "=" * 80)
print("FINAL DECISION")
print("=" * 80)
print(final_decision)
Example Output
================================================================================
TASK
================================================================================
A customer asks for a summary report of sales performance by region.
The report is needed today, but executives will use it for an important decision.
How should the task be handled?
================================================================================
FINAL DECISION
================================================================================
Recommendation: Validate regional data sources before producing the report, and deliver a decision-grade summary with a short explanation of methodology.
Reasoning Summary: Because executives will make an important decision using the report, accuracy is critical.
Priority: quality
Risks: Delivery may be delayed if validation reveals data inconsistencies.
Confidence: High
Extension Ideas
- Add more rules for compliance, cost, and urgency
- Parse agent outputs into dictionaries for easier comparison
- Store decisions in logs for auditing
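The second extension idea, parsing agent outputs into dictionaries, could look like the sketch below. It assumes agents follow the `Field: value` line format used in the exercises:

```python
def parse_agent_output(text: str) -> dict[str, str]:
    """Parse 'Field: value' lines into a dictionary with lowercase keys."""
    fields: dict[str, str] = {}
    for line in text.strip().splitlines():
        if ":" in line:
            key, _, value = line.partition(":")
            fields[key.strip().lower()] = value.strip()
    return fields

sample = """Recommendation: Validate data first
Priority: quality
Confidence: High"""

parsed = parse_agent_output(sample)
print(parsed["priority"])  # quality
```

With parsed dictionaries, the coordinator can compare `priority` or `confidence` fields directly instead of matching raw text.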
8. Arbitration with an LLM
Static rules are great, but some disagreements require nuanced reasoning.
An arbiter agent can: - read all candidate proposals - compare tradeoffs - choose the best option - explain why
This is especially useful when: - outputs are semantically different - there are multiple competing goals - rigid rules are too simplistic
9. Hands-On Exercise 3: Add an Arbiter Agent
Goal
Use a third LLM call as an arbitration agent that reviews two competing proposals and produces a final coordinated recommendation.
What You Will Learn
- How to build a three-agent pattern
- How to use an LLM as a decision-maker
- How to request a structured final resolution
Python Code
"""
Exercise 3: LLM-based arbitration.
This script:
1. Generates two competing recommendations
2. Sends both to an arbitration agent
3. Produces a final coordinated recommendation
This pattern is useful when conflict resolution requires judgment
rather than simple deterministic rules.
"""
from openai import OpenAI
from dotenv import load_dotenv
import os
load_dotenv()
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
TASK = """
A customer asks for a summary report of sales performance by region.
The report is needed today, but executives will use it for an important decision.
How should the task be handled?
"""
def run_agent(system_prompt: str, user_prompt: str) -> str:
    """
    Execute an agent with a system prompt and user prompt.
    """
    response = client.responses.create(
        model="gpt-5.4-mini",
        input=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
        ],
    )
    return response.output_text
speed_agent_prompt = """
You are a speed-optimized operations agent.
Prioritize fast delivery, minimal process overhead, and immediate action.
Return your answer in this format:
Recommendation: ...
Reasoning Summary: ...
Priority: speed
Risks: ...
Confidence: ...
"""
quality_agent_prompt = """
You are a quality-optimized operations agent.
Prioritize accuracy, validation, and decision-grade output quality.
Return your answer in this format:
Recommendation: ...
Reasoning Summary: ...
Priority: quality
Risks: ...
Confidence: ...
"""
arbiter_prompt = """
You are an arbitration agent for a multi-agent system.
Your job:
- Compare the two proposals
- Identify the central conflict
- Choose the best option or merge them
- Prefer safer and higher-quality outcomes when stakes are high
- Return a concise final decision
Return your answer in this format:
Conflict Detected: ...
Decision: ...
Why: ...
Final Recommendation: ...
"""
# Step 1: Generate candidate outputs
speed_output = run_agent(speed_agent_prompt, TASK)
quality_output = run_agent(quality_agent_prompt, TASK)
# Step 2: Ask the arbiter to resolve the conflict
arbiter_input = f"""
Task:
{TASK}
Proposal A:
{speed_output}
Proposal B:
{quality_output}
"""
final_resolution = run_agent(arbiter_prompt, arbiter_input)
print("=" * 80)
print("PROPOSAL A: SPEED AGENT")
print("=" * 80)
print(speed_output)
print("\n" + "=" * 80)
print("PROPOSAL B: QUALITY AGENT")
print("=" * 80)
print(quality_output)
print("\n" + "=" * 80)
print("ARBITER DECISION")
print("=" * 80)
print(final_resolution)
Example Output
================================================================================
PROPOSAL A: SPEED AGENT
================================================================================
Recommendation: Produce a same-day summary using available sales data and clearly mark it as a preliminary report.
Reasoning Summary: Executives need the report today, so immediate delivery is more valuable than waiting for full validation.
Priority: speed
Risks: Some regional figures may contain unverified discrepancies.
Confidence: High
================================================================================
PROPOSAL B: QUALITY AGENT
================================================================================
Recommendation: Validate regional data sources before producing the report, and deliver a decision-grade summary with a short explanation of methodology.
Reasoning Summary: Because executives will make an important decision using the report, accuracy is critical.
Priority: quality
Risks: Delivery may be delayed if validation reveals data inconsistencies.
Confidence: High
================================================================================
ARBITER DECISION
================================================================================
Conflict Detected: The speed-focused proposal prioritizes immediacy, while the quality-focused proposal prioritizes accuracy for a high-stakes executive decision.
Decision: Merge both approaches with a staged delivery plan.
Why: Executives need timely visibility, but the decision context requires validated numbers before final use.
Final Recommendation: Deliver a clearly labeled preliminary summary today, then follow up with a validated decision-grade report as soon as checks are complete.
Reflection Questions
- When is an arbiter better than a fixed rule?
- What are the risks of relying on another LLM for conflict resolution?
- How could you verify the arbiter’s decision?
10. Best Practices for Conflict Resolution in Agentic Systems
A. Make Agent Roles Explicit
Poorly defined roles create overlap and contradiction.
Instead of: - “Help solve the problem”
Use: - “Optimize for cost” - “Review for compliance” - “Verify data quality”
B. Require Structured Outputs
Structured outputs reduce ambiguity and simplify downstream logic.
Useful fields: - recommendation - assumptions - risks - confidence - unresolved questions
C. Define Resolution Policies Early
Do not wait until production incidents to decide: - what overrides what - when human escalation is required - what constitutes acceptable disagreement
D. Log All Decisions
For debugging and trust, store: - task input - each agent’s output - chosen policy - final decision - reason for resolution
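A minimal logging sketch using only the standard library; the record fields mirror the list above, and the file name is illustrative:

```python
import json
import time

def log_decision(path: str, record: dict) -> None:
    """Append one decision record as a JSON line for later auditing."""
    record = {"timestamp": time.time(), **record}
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")

log_decision("decisions.jsonl", {
    "task": "regional sales report",
    "agent_outputs": {"speed": "ship now", "quality": "validate first"},
    "policy": "quality-wins-when-high-stakes",
    "final_decision": "validate first",
    "rationale": "executive decision marked high-stakes",
})
```

The JSON Lines format (one object per line) keeps appends cheap and lets audit tooling stream the log without loading it all at once.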
E. Use LLM Arbitration Carefully
LLM arbiters are flexible but not infallible. Consider combining: - deterministic guards - validation checks - arbitration - human review for sensitive cases
11. Mini Design Activity
Scenario
You are building a multi-agent customer support workflow with these agents:
- Policy Agent: checks refund rules
- Empathy Agent: drafts a supportive response
- Fraud Agent: identifies suspicious behavior
- Resolution Agent: decides what to do
Task
Design a conflict resolution policy for the following situation:
- Policy Agent says the refund is allowed
- Fraud Agent says the request is suspicious
- Empathy Agent drafts a message promising a refund
Questions to Answer
- Which agent should have the highest authority?
- Should the system approve, deny, or escalate?
- What message should be sent to the customer?
- What logs should be stored for auditability?
Suggested Answer Outline
- Fraud concerns should trigger escalation or manual review
- Policy eligibility alone should not override fraud risk
- Customer messaging should avoid promising a refund prematurely
- Decision logs should include evidence, rationale, and next steps
12. Session Summary
In this session, you learned that conflict is a natural part of multi-agent systems. Rather than avoiding it, good agentic design plans for it explicitly.
Key Takeaways
- Multiple agents often disagree because they optimize for different goals
- Central coordinators simplify control and observability
- Rule-based policies are practical, stable, and easy to audit
- Arbitration agents help when conflict requires nuanced judgment
- Structured outputs and clear authority rules are essential for reliable coordination
13. Useful Resources
- OpenAI Responses API migration guide: https://developers.openai.com/api/docs/guides/migrate-to-responses
- OpenAI API docs: https://platform.openai.com/docs
- OpenAI Python SDK: https://github.com/openai/openai-python
- Prompt engineering guide: https://platform.openai.com/docs/guides/prompt-engineering
- Python dotenv: https://pypi.org/project/python-dotenv/
14. Optional Homework
Homework Task
Build a small multi-agent coordinator for a content publishing workflow with these agents:
- SEO Agent
- Editorial Quality Agent
- Brand Voice Agent
- Coordinator
Requirements
- Use gpt-5.4-mini
- Use the OpenAI Responses API
- Have each agent return structured text
- Detect at least one conflict
- Resolve it using either:
- a rule-based method, or
- an arbiter agent
- Print the final publishing recommendation
Stretch Goal
Log the agent outputs and final decision to a JSON file for review.