Session 3: Conflict Resolution and Coordination Strategies

Synopsis

Introduces methods for reconciling competing outputs, selecting between proposals, and coordinating sequential or parallel work. Learners see how governance and arbitration become essential in multi-agent environments.

Session Content

Session Overview

Duration: ~45 minutes
Audience: Python developers with basic programming knowledge, learning GenAI and agentic development
Focus: Understanding how multiple agents coordinate, how conflicts emerge, and how to implement practical conflict resolution strategies using the OpenAI Responses API and gpt-5.4-mini

Learning Objectives

By the end of this session, learners will be able to:

  • Explain why conflicts occur in multi-agent and agentic systems
  • Identify common coordination failures such as contradiction, duplication, deadlock, and goal misalignment
  • Apply conflict resolution strategies including prioritization, voting, arbitration, and rule-based reconciliation
  • Implement a simple Python-based coordinator that compares agent outputs and resolves disagreements
  • Build a practical orchestration workflow using the OpenAI Responses API

1. Why Conflict Resolution Matters in Agentic Systems

As soon as multiple agents collaborate, disagreement becomes normal rather than exceptional. In an agentic workflow, one agent may generate a plan, another may review it, and a third may optimize for time, cost, or safety. These agents can produce incompatible recommendations.

Common Sources of Conflict

  • Different objectives
      ◦ One agent optimizes for speed
      ◦ Another optimizes for quality
      ◦ Another optimizes for compliance or safety
  • Different context windows
      ◦ Agents may see different subsets of information
      ◦ One agent may miss a critical constraint
  • Prompt framing differences
      ◦ Slight differences in instructions can lead to contradictory outputs
  • Ambiguity in task ownership
      ◦ Multiple agents may solve the same subproblem in incompatible ways
  • Stale state
      ◦ An agent may reason over outdated information while others use updated state

Examples of Coordination Failures

  • Contradiction: Agent A says “approve the refund,” Agent B says “deny the refund”
  • Duplication: Two agents perform the same task unnecessarily
  • Deadlock: Each agent waits for another to decide
  • Priority inversion: A low-priority optimization overrides a critical safety rule
  • Hallucinated consensus: A coordinator assumes agreement where none exists

2. Coordination Patterns in Multi-Agent Systems

Before resolving conflicts, it helps to understand common coordination patterns.

A. Centralized Coordinator

A single orchestrator collects outputs from agents and makes a final decision.

Pros:
  • Easier to debug
  • Clear decision authority
  • Good for production workflows

Cons:
  • Single point of failure
  • Coordinator quality strongly affects system performance

B. Peer-to-Peer Negotiation

Agents communicate directly and attempt to reconcile differences.

Pros:
  • Flexible
  • Closer to distributed systems patterns

Cons:
  • Harder to control
  • Can become expensive or unstable

C. Hierarchical Delegation

A parent agent delegates to specialized child agents, then integrates results.

Pros:
  • Natural task decomposition
  • Clear responsibility boundaries

Cons:
  • Requires good task design
  • Parent may still face conflicting recommendations

D. Voting or Ensemble Decision

Several agents independently solve a problem and the system chooses a majority or weighted result.

Pros:
  • Useful for robustness
  • Reduces dependence on one output

Cons:
  • Majority can still be wrong
  • Hard to apply when outputs are open-ended


3. Practical Conflict Resolution Strategies

3.1 Rule-Based Prioritization

Use explicit rules to determine which output wins.

Examples:
  • Safety overrides cost
  • Compliance overrides convenience
  • User instruction overrides stylistic preferences

This is often the most practical strategy in production.
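As a minimal sketch, such a policy can be encoded as an ordered precedence list. The priority labels and the helper `pick_by_rule` below are illustrative choices, not part of any API; they assume each proposal carries a `priority` field like the structured outputs used later in this session.

```python
# Illustrative precedence order: earlier entries override later ones.
PRECEDENCE = ["safety", "compliance", "user_instruction", "quality", "speed", "style"]

def pick_by_rule(proposals):
    """Return the proposal whose priority ranks highest in PRECEDENCE."""
    return min(proposals, key=lambda p: PRECEDENCE.index(p["priority"]))

proposals = [
    {"priority": "speed", "recommendation": "Ship the preliminary report now."},
    {"priority": "safety", "recommendation": "Hold until figures are verified."},
]
winner = pick_by_rule(proposals)
print(winner["priority"])  # safety
```

Because the policy is a plain data structure, it can be reviewed, versioned, and audited independently of any model behavior.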

3.2 Scoring and Ranking

Assign each proposal a score across dimensions such as:
  • correctness
  • cost
  • latency
  • safety
  • alignment with user intent

Then choose the highest total score or the best constrained option.
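A weighted-sum ranking can be sketched in a few lines. The dimension weights and candidate scores below are made-up assumptions for illustration; real weights should come from your application's requirements.

```python
# Illustrative weights; higher means the dimension matters more.
WEIGHTS = {"correctness": 3.0, "safety": 3.0, "intent": 2.0, "cost": 1.0, "latency": 1.0}

def total_score(scores):
    """Weighted sum over whichever dimensions the proposal was scored on."""
    return sum(WEIGHTS[dim] * value for dim, value in scores.items())

def rank(proposals):
    """Return proposals sorted best-first by weighted total."""
    return sorted(proposals, key=lambda p: total_score(p["scores"]), reverse=True)

candidates = [
    {"name": "A", "scores": {"correctness": 0.6, "latency": 0.9, "safety": 0.5}},
    {"name": "B", "scores": {"correctness": 0.9, "latency": 0.4, "safety": 0.8}},
]
best = rank(candidates)[0]
print(best["name"])  # B (its weighted total is higher)
```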

3.3 Arbitration Agent

Use a separate “arbiter” agent to compare competing proposals and choose one.

This is useful when the conflict requires reasoning rather than static rules.

3.4 Voting

Applicable when multiple agents produce candidate answers in similar formats.

Common methods:
  • simple majority
  • weighted majority
  • confidence-based voting
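Both simple and weighted majority fit in a few lines. The votes and weights below are made up for illustration; note how weighting can flip the outcome when one agent is trusted more.

```python
from collections import Counter

def majority_vote(answers):
    """Simple majority: the most common candidate answer wins."""
    return Counter(answers).most_common(1)[0][0]

def weighted_vote(answers, weights):
    """Weighted majority: each agent's vote counts with its weight."""
    tally = {}
    for answer, weight in zip(answers, weights):
        tally[answer] = tally.get(answer, 0.0) + weight
    return max(tally, key=tally.get)

votes = ["approve", "deny", "approve"]
print(majority_vote(votes))                    # approve (2 of 3)
print(weighted_vote(votes, [0.2, 0.9, 0.3]))  # deny (0.9 outweighs 0.2 + 0.3)
```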

3.5 Merge-and-Rewrite

Instead of choosing one output, synthesize a unified solution.

Best when:
  • each proposal is partially correct
  • tradeoffs can be balanced
  • a final polished result is needed

3.6 Escalation

If conflict cannot be resolved confidently:
  • ask a human
  • request more information
  • re-run with tighter prompts
  • trigger a fallback policy
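A minimal sketch of an escalation guard, assuming proposals carry a numeric confidence score; the 0.7 threshold and the field names are arbitrary choices for illustration.

```python
CONFIDENCE_THRESHOLD = 0.7  # illustrative cutoff, tune per application

def resolve_or_escalate(winner, confidence, threshold=CONFIDENCE_THRESHOLD):
    """Return the winning proposal, or an escalation marker when confidence is low."""
    if confidence >= threshold:
        return {"status": "resolved", "decision": winner}
    return {"status": "escalated", "decision": None, "next_step": "human_review"}

print(resolve_or_escalate("validate first", 0.9)["status"])  # resolved
print(resolve_or_escalate("validate first", 0.3)["status"])  # escalated
```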


4. Designing a Coordinator

A practical coordinator usually performs these steps:

  1. Collect outputs from specialized agents
  2. Normalize them into a consistent structure
  3. Detect disagreement
  4. Apply decision policy
  5. Produce final output and decision rationale
  6. Log the process for observability

To coordinate effectively, ask each agent to return structured fields such as:

  • recommendation
  • reasoning_summary
  • priority
  • risks
  • confidence

This makes comparison easier than using free-form text.
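A small parser can turn the "Field: value" format requested from the agents into a dictionary for comparison. This sketch assumes the agent followed the one-field-per-line format; real outputs may need more defensive parsing.

```python
def parse_fields(text: str) -> dict:
    """Parse 'Field: value' lines into a dict with snake_case keys."""
    fields = {}
    for line in text.splitlines():
        if ":" in line:
            key, _, value = line.partition(":")
            fields[key.strip().lower().replace(" ", "_")] = value.strip()
    return fields

sample = """Recommendation: Validate data first.
Priority: quality
Confidence: High"""
parsed = parse_fields(sample)
print(parsed["priority"])  # quality
```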


5. Hands-On Exercise 1: Compare Two Agents with Different Priorities

Goal

Create two specialized agents:
  • a speed-focused agent
  • a quality-focused agent

Then compare their recommendations for the same task.

What You Will Learn

  • How different prompts create divergent recommendations
  • How to call the OpenAI Responses API using Python
  • How to inspect outputs before reconciliation

Setup

Install dependencies:

pip install openai python-dotenv

Create a .env file:

OPENAI_API_KEY=your_api_key_here

Python Code

"""
Exercise 1: Comparing two specialized agent recommendations.

This script uses the OpenAI Responses API with gpt-5.4-mini to generate
two different task execution recommendations:
1. A speed-focused recommendation
2. A quality-focused recommendation

The goal is to observe how agent specialization creates conflict.
"""

from openai import OpenAI
from dotenv import load_dotenv
import os

# Load environment variables from .env
load_dotenv()

# Initialize the OpenAI client
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

# Shared task for both agents
TASK = """
A customer asks for a summary report of sales performance by region.
The report is needed today, but executives will use it for an important decision.
How should the task be handled?
"""

def run_agent(system_prompt: str, user_prompt: str) -> str:
    """
    Calls the OpenAI Responses API and returns the model's text output.

    Args:
        system_prompt: Instructions defining the agent's behavior
        user_prompt: The actual task input

    Returns:
        The model's response as text
    """
    response = client.responses.create(
        model="gpt-5.4-mini",
        input=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
        ],
    )
    return response.output_text

# Define specialized agents
speed_agent_prompt = """
You are a speed-optimized operations agent.
Prioritize fast delivery, minimal process overhead, and immediate action.
Return your answer in this format:

Recommendation: ...
Reasoning Summary: ...
Priority: speed
Risks: ...
Confidence: ...
"""

quality_agent_prompt = """
You are a quality-optimized operations agent.
Prioritize accuracy, validation, and decision-grade output quality.
Return your answer in this format:

Recommendation: ...
Reasoning Summary: ...
Priority: quality
Risks: ...
Confidence: ...
"""

# Run both agents
speed_result = run_agent(speed_agent_prompt, TASK)
quality_result = run_agent(quality_agent_prompt, TASK)

# Print results for comparison
print("=" * 80)
print("SPEED AGENT OUTPUT")
print("=" * 80)
print(speed_result)

print("\n" + "=" * 80)
print("QUALITY AGENT OUTPUT")
print("=" * 80)
print(quality_result)

Example Output

================================================================================
SPEED AGENT OUTPUT
================================================================================
Recommendation: Produce a same-day summary using available sales data and clearly mark it as a preliminary report.
Reasoning Summary: Executives need the report today, so immediate delivery is more valuable than waiting for full validation.
Priority: speed
Risks: Some regional figures may contain unverified discrepancies.
Confidence: High

================================================================================
QUALITY AGENT OUTPUT
================================================================================
Recommendation: Validate regional data sources before producing the report, and deliver a decision-grade summary with a short explanation of methodology.
Reasoning Summary: Because executives will make an important decision using the report, accuracy is critical.
Priority: quality
Risks: Delivery may be delayed if validation reveals data inconsistencies.
Confidence: High

Discussion Prompts

  • What is the conflict between the two outputs?
  • Which recommendation should be preferred in a high-stakes business setting?
  • Can both be partially right?

6. Detecting Conflict Programmatically

To resolve disagreements, we first need to detect them.

Signs of Conflict

  • Opposite actions: “ship now” vs “validate first”
  • Different priorities: speed vs quality
  • Different risk tolerances
  • Incompatible next steps

In practice, conflict detection can be:
  • simple keyword/rule-based logic
  • structured field comparison
  • LLM-based semantic arbitration
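The first two approaches can be sketched together, assuming proposals have been parsed into dictionaries with the structured fields used in this session. The opposing-action keyword pairs are illustrative; a production system would likely add an LLM-based semantic check on top.

```python
def detect_conflict(a: dict, b: dict) -> list:
    """Return a list of reasons two structured proposals disagree (empty if none)."""
    reasons = []
    # Structured field comparison
    if a.get("priority") != b.get("priority"):
        reasons.append("different priorities")
    # Simple keyword check for opposing actions (illustrative pairs only)
    rec_a = a.get("recommendation", "").lower()
    rec_b = b.get("recommendation", "").lower()
    for x, y in [("approve", "deny"), ("ship", "validate")]:
        if (x in rec_a and y in rec_b) or (y in rec_a and x in rec_b):
            reasons.append(f"opposing actions: {x} vs {y}")
    return reasons

speed = {"priority": "speed", "recommendation": "Ship the report now."}
quality = {"priority": "quality", "recommendation": "Validate the data first."}
conflicts = detect_conflict(speed, quality)
print(conflicts)
```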


7. Hands-On Exercise 2: Build a Rule-Based Conflict Resolver

Goal

Build a Python coordinator that:
  • gets agent outputs
  • checks for priority conflict
  • applies a simple rule:
      ◦ if the task is high-stakes, quality wins
      ◦ otherwise, speed wins

What You Will Learn

  • How to orchestrate multiple model calls
  • How to implement a deterministic resolution policy
  • Why explicit rules are valuable in production systems

Python Code

"""
Exercise 2: Rule-based conflict resolution.

This script:
1. Queries two specialized agents
2. Uses a simple coordinator policy
3. Resolves the disagreement deterministically

The policy is:
- If the task is high-stakes, prefer the quality-focused output
- Otherwise, prefer the speed-focused output
"""

from openai import OpenAI
from dotenv import load_dotenv
import os

load_dotenv()
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

TASK = """
A customer asks for a summary report of sales performance by region.
The report is needed today, but executives will use it for an important decision.
How should the task be handled?
"""

def run_agent(system_prompt: str, user_prompt: str) -> str:
    """
    Execute one agent prompt and return the text response.
    """
    response = client.responses.create(
        model="gpt-5.4-mini",
        input=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
        ],
    )
    return response.output_text

def is_high_stakes(task_text: str) -> bool:
    """
    Very simple rule-based classifier for task criticality.

    Args:
        task_text: The task description

    Returns:
        True if high-stakes language is detected, else False
    """
    high_stakes_keywords = [
        "important decision",
        "executives",
        "compliance",
        "legal",
        "safety",
        "financial",
        "medical",
    ]
    lowered = task_text.lower()
    return any(keyword in lowered for keyword in high_stakes_keywords)

def resolve_conflict(speed_output: str, quality_output: str, task_text: str) -> str:
    """
    Resolve conflict using a deterministic policy.

    Args:
        speed_output: Output from the speed-focused agent
        quality_output: Output from the quality-focused agent
        task_text: Original task description

    Returns:
        The selected final recommendation
    """
    if is_high_stakes(task_text):
        return quality_output
    return speed_output

speed_agent_prompt = """
You are a speed-optimized operations agent.
Prioritize fast delivery, minimal process overhead, and immediate action.
Return your answer in this format:

Recommendation: ...
Reasoning Summary: ...
Priority: speed
Risks: ...
Confidence: ...
"""

quality_agent_prompt = """
You are a quality-optimized operations agent.
Prioritize accuracy, validation, and decision-grade output quality.
Return your answer in this format:

Recommendation: ...
Reasoning Summary: ...
Priority: quality
Risks: ...
Confidence: ...
"""

# Run both agents
speed_output = run_agent(speed_agent_prompt, TASK)
quality_output = run_agent(quality_agent_prompt, TASK)

# Resolve with rule-based coordinator
final_decision = resolve_conflict(speed_output, quality_output, TASK)

print("=" * 80)
print("TASK")
print("=" * 80)
print(TASK.strip())

print("\n" + "=" * 80)
print("FINAL DECISION")
print("=" * 80)
print(final_decision)

Example Output

================================================================================
TASK
================================================================================
A customer asks for a summary report of sales performance by region.
The report is needed today, but executives will use it for an important decision.
How should the task be handled?

================================================================================
FINAL DECISION
================================================================================
Recommendation: Validate regional data sources before producing the report, and deliver a decision-grade summary with a short explanation of methodology.
Reasoning Summary: Because executives will make an important decision using the report, accuracy is critical.
Priority: quality
Risks: Delivery may be delayed if validation reveals data inconsistencies.
Confidence: High

Extension Ideas

  • Add more rules for compliance, cost, and urgency
  • Parse agent outputs into dictionaries for easier comparison
  • Store decisions in logs for auditing

8. Arbitration with an LLM

Static rules are great, but some disagreements require nuanced reasoning.

An arbiter agent can:
  • read all candidate proposals
  • compare tradeoffs
  • choose the best option
  • explain why

This is especially useful when:
  • outputs are semantically different
  • there are multiple competing goals
  • rigid rules are too simplistic


9. Hands-On Exercise 3: Add an Arbiter Agent

Goal

Use a third LLM call as an arbitration agent that reviews two competing proposals and produces a final coordinated recommendation.

What You Will Learn

  • How to build a three-agent pattern
  • How to use an LLM as a decision-maker
  • How to request a structured final resolution

Python Code

"""
Exercise 3: LLM-based arbitration.

This script:
1. Generates two competing recommendations
2. Sends both to an arbitration agent
3. Produces a final coordinated recommendation

This pattern is useful when conflict resolution requires judgment
rather than simple deterministic rules.
"""

from openai import OpenAI
from dotenv import load_dotenv
import os

load_dotenv()
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

TASK = """
A customer asks for a summary report of sales performance by region.
The report is needed today, but executives will use it for an important decision.
How should the task be handled?
"""

def run_agent(system_prompt: str, user_prompt: str) -> str:
    """
    Execute an agent with a system prompt and user prompt.
    """
    response = client.responses.create(
        model="gpt-5.4-mini",
        input=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
        ],
    )
    return response.output_text

speed_agent_prompt = """
You are a speed-optimized operations agent.
Prioritize fast delivery, minimal process overhead, and immediate action.
Return your answer in this format:

Recommendation: ...
Reasoning Summary: ...
Priority: speed
Risks: ...
Confidence: ...
"""

quality_agent_prompt = """
You are a quality-optimized operations agent.
Prioritize accuracy, validation, and decision-grade output quality.
Return your answer in this format:

Recommendation: ...
Reasoning Summary: ...
Priority: quality
Risks: ...
Confidence: ...
"""

arbiter_prompt = """
You are an arbitration agent for a multi-agent system.

Your job:
- Compare the two proposals
- Identify the central conflict
- Choose the best option or merge them
- Prefer safer and higher-quality outcomes when stakes are high
- Return a concise final decision

Return your answer in this format:

Conflict Detected: ...
Decision: ...
Why: ...
Final Recommendation: ...
"""

# Step 1: Generate candidate outputs
speed_output = run_agent(speed_agent_prompt, TASK)
quality_output = run_agent(quality_agent_prompt, TASK)

# Step 2: Ask the arbiter to resolve the conflict
arbiter_input = f"""
Task:
{TASK}

Proposal A:
{speed_output}

Proposal B:
{quality_output}
"""

final_resolution = run_agent(arbiter_prompt, arbiter_input)

print("=" * 80)
print("PROPOSAL A: SPEED AGENT")
print("=" * 80)
print(speed_output)

print("\n" + "=" * 80)
print("PROPOSAL B: QUALITY AGENT")
print("=" * 80)
print(quality_output)

print("\n" + "=" * 80)
print("ARBITER DECISION")
print("=" * 80)
print(final_resolution)

Example Output

================================================================================
PROPOSAL A: SPEED AGENT
================================================================================
Recommendation: Produce a same-day summary using available sales data and clearly mark it as a preliminary report.
Reasoning Summary: Executives need the report today, so immediate delivery is more valuable than waiting for full validation.
Priority: speed
Risks: Some regional figures may contain unverified discrepancies.
Confidence: High

================================================================================
PROPOSAL B: QUALITY AGENT
================================================================================
Recommendation: Validate regional data sources before producing the report, and deliver a decision-grade summary with a short explanation of methodology.
Reasoning Summary: Because executives will make an important decision using the report, accuracy is critical.
Priority: quality
Risks: Delivery may be delayed if validation reveals data inconsistencies.
Confidence: High

================================================================================
ARBITER DECISION
================================================================================
Conflict Detected: The speed-focused proposal prioritizes immediacy, while the quality-focused proposal prioritizes accuracy for a high-stakes executive decision.
Decision: Merge both approaches with a staged delivery plan.
Why: Executives need timely visibility, but the decision context requires validated numbers before final use.
Final Recommendation: Deliver a clearly labeled preliminary summary today, then follow up with a validated decision-grade report as soon as checks are complete.

Reflection Questions

  • When is an arbiter better than a fixed rule?
  • What are the risks of relying on another LLM for conflict resolution?
  • How could you verify the arbiter’s decision?

10. Best Practices for Conflict Resolution in Agentic Systems

A. Make Agent Roles Explicit

Poorly defined roles create overlap and contradiction.

Instead of:
  • “Help solve the problem”

Use:
  • “Optimize for cost”
  • “Review for compliance”
  • “Verify data quality”

B. Require Structured Outputs

Structured outputs reduce ambiguity and simplify downstream logic.

Useful fields:
  • recommendation
  • assumptions
  • risks
  • confidence
  • unresolved questions

C. Define Resolution Policies Early

Do not wait until production incidents to decide:
  • what overrides what
  • when human escalation is required
  • what constitutes acceptable disagreement

D. Log All Decisions

For debugging and trust, store:
  • task input
  • each agent’s output
  • chosen policy
  • final decision
  • reason for resolution
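One way to sketch such a record in Python; the field names and policy label are illustrative, and the record can be persisted however your system logs (append-only file, database, tracing backend).

```python
import datetime
import json

def decision_record(task, agent_outputs, policy, final, reason):
    """Build one auditable decision record as a plain dict."""
    return {
        "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "task": task,
        "agent_outputs": agent_outputs,
        "policy": policy,
        "final_decision": final,
        "rationale": reason,
    }

record = decision_record(
    task="Regional sales report",
    agent_outputs={"speed": "ship now", "quality": "validate first"},
    policy="high_stakes_prefers_quality",  # hypothetical policy label
    final="validate first",
    reason="Executives will use the report for an important decision.",
)
line = json.dumps(record)  # one JSON line, ready for an append-only log
```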

E. Use LLM Arbitration Carefully

LLM arbiters are flexible but not infallible. Consider combining:
  • deterministic guards
  • validation checks
  • arbitration
  • human review for sensitive cases


11. Mini Design Activity

Scenario

You are building a multi-agent customer support workflow with these agents:

  • Policy Agent: checks refund rules
  • Empathy Agent: drafts a supportive response
  • Fraud Agent: identifies suspicious behavior
  • Resolution Agent: decides what to do

Task

Design a conflict resolution policy for the following situation:

  • Policy Agent says the refund is allowed
  • Fraud Agent says the request is suspicious
  • Empathy Agent drafts a message promising a refund

Questions to Answer

  1. Which agent should have the highest authority?
  2. Should the system approve, deny, or escalate?
  3. What message should be sent to the customer?
  4. What logs should be stored for auditability?

Suggested Answer Outline

  • Fraud concerns should trigger escalation or manual review
  • Policy eligibility alone should not override fraud risk
  • Customer messaging should avoid promising a refund prematurely
  • Decision logs should include evidence, rationale, and next steps

12. Session Summary

In this session, you learned that conflict is a natural part of multi-agent systems. Rather than avoiding it, good agentic design plans for it explicitly.

Key Takeaways

  • Multiple agents often disagree because they optimize for different goals
  • Central coordinators simplify control and observability
  • Rule-based policies are practical, stable, and easy to audit
  • Arbitration agents help when conflict requires nuanced judgment
  • Structured outputs and clear authority rules are essential for reliable coordination

13. Useful Resources

  • OpenAI Responses API migration guide: https://developers.openai.com/api/docs/guides/migrate-to-responses
  • OpenAI API docs: https://platform.openai.com/docs
  • OpenAI Python SDK: https://github.com/openai/openai-python
  • Prompt engineering guide: https://platform.openai.com/docs/guides/prompt-engineering
  • Python dotenv: https://pypi.org/project/python-dotenv/

14. Optional Homework

Homework Task

Build a small multi-agent coordinator for a content publishing workflow with these agents:

  • SEO Agent
  • Editorial Quality Agent
  • Brand Voice Agent
  • Coordinator

Requirements

  • Use gpt-5.4-mini
  • Use the OpenAI Responses API
  • Have each agent return structured text
  • Detect at least one conflict
  • Resolve it using either:
  • a rule-based method, or
  • an arbiter agent
  • Print the final publishing recommendation

Stretch Goal

Log the agent outputs and final decision to a JSON file for review.

