Architecture LangGraph Applied AI March 26, 2026 • 8 min read

The UX of Multi-Agent Systems: Why We Stream the "Internal Monologue"

Sotirios Tsartsaris

Digital Infrastructure Architect

Every engineer who builds a cyclic multi-agent system eventually encounters the exact same nightmare: The Infinite Revision Loop.

You build a Solver agent to draft a solution and a Critic agent to review it. The Solver outputs a draft. The Critic spots an error, gives it a score of 7/10, and requests a revision. The Solver apologizes, rewrites the draft, and accidentally makes the exact same mistake. The Critic gives it a 7/10 again.

Left unchecked, this cycle will run until your cloud provider cuts off your API key, burning thousands of tokens and leaving the user staring at a frozen UI.

In standard software, loops are deterministic. In LLM-driven architecture, loops are stochastic. To make ByteTect's OMAS (Organizational Multi-Agent System) production-ready, we had to engineer strict, deterministic circuit breakers into our LangGraph routing logic.

Here is exactly how we tame the graph in router.py.

1. The Entry Point: Conditional Edges

In LangGraph, state transitions are handled by conditional edges. After every major node execution, control is handed back to our central routing function, should_continue.
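As a rough sketch of how this wiring looks (node names here are illustrative, not our actual graph), the router function is registered as the path callable of a conditional edge, with a map from its string return values to destination nodes:

```python
# Illustrative wiring sketch -- node names and the path map are assumptions.
from langgraph.graph import StateGraph, END

workflow = StateGraph(AgentState)
workflow.add_node("solver", solver_node)
workflow.add_node("critic", critic_node)
workflow.add_node("safety", safety_node)
workflow.add_node("polisher", polisher_node)

# After the Critic runs, should_continue decides where control flows next.
workflow.add_conditional_edges(
    "critic",
    should_continue,
    {
        "solver": "solver",     # request another revision
        "safety": "safety",     # stagnation detected, ask the human
        "error": "polisher",    # bail out with what we have
        "end": "polisher",      # loop budget exhausted
    },
)
```

The key property: the path map is fixed at build time, so the set of reachable transitions is deterministic even though the routing decision is not.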

Before we even look at the LLM's requested next_actor, we run a gauntlet of deterministic checks:

app/agents/router.py PYTHON
def should_continue(state: AgentState, *, max_loops: int = 5) -> str:
    # Deterministic guards run before we ever consult the LLM's request.
    if _has_error(state):
        return "error"
    if _is_stagnating(state):
        return "safety"
    if _get_loop_count(state) >= max_loops:
        return "end"

    next_actor = state.get("next_actor")
    # ... standard routing logic (search, analyze, etc.)
    return next_actor or "end"

The LLM is completely blind to these checks. It doesn't matter how badly the Critic wants to request another revision; if the system hits max_loops, the graph violently halts the cycle and forces the state to "end" (triggering our polisher_node).

2. The Circuit Breaker: Detecting Stagnation

A hard loop cap of 5 is a good safety net, but it is inefficient. What if the agents get stuck arguing on iteration 2? Waiting for 3 more useless iterations burns time and money.

To solve this, we built a stagnation detector. Since our Critic and Visual Critic nodes are forced to output a strict JSON payload containing a numerical "score", we append this to the graph's critique_history state array.
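A hypothetical helper (the name and payload shape beyond `"score"` are illustrative) shows how that strict JSON output can be validated and appended to `critique_history` as a partial state update:

```python
import json

# Hypothetical helper: validate the Critic's strict JSON payload and
# return the partial state update that appends it to critique_history.
def record_critique(state: dict, raw_output: str) -> dict:
    payload = json.loads(raw_output)  # the Critic is forced to emit valid JSON
    if not isinstance(payload.get("score"), (int, float)):
        raise ValueError("critique payload missing a numeric 'score'")
    history = list(state.get("critique_history") or [])
    history.append(payload)
    return {"critique_history": history}
```

Validating the score at the boundary means the router downstream never has to guess whether the history is well-formed.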

Our router inspects this array in real-time:

app/agents/router.py PYTHON
def _is_stagnating(state: AgentState, window: int = 3) -> bool:
    history = state.get("critique_history") or []
    scores = []
    
    for item in history:
        score = item.get("score")
        if isinstance(score, (int, float)):
            scores.append(float(score))
            
    if len(scores) < window:
        return False
        
    recent = scores[-window:]
    # If the last `window` scores are identical, the agents are stuck in a stalemate.
    return all(score == recent[0] for score in recent)

If the Critic gives a draft a 7, the Solver revises it, and the Critic gives it a 7 again... and again... the _is_stagnating function returns True.

3. Graceful Degradation: The Safety Node

When _is_stagnating trips, we don't just throw a 500 Internal Server Error. In a collaborative multi-agent system, the ultimate fallback is the human in the loop.

The router forcefully diverts the graph to the safety_node:

app/agents/nodes.py PYTHON
from langchain_core.messages import AIMessage

async def _safety_node(state: AgentState) -> dict:
    message = (
        "The agents are struggling to reach a consensus. "
        "Would you like to step in?"
    )
    # Surface the stalemate to the user instead of failing silently.
    return {"messages": [AIMessage(content=message)], "current_draft": message}

This updates the state and streams directly to the frontend. Through our WebSocket architecture, the user's UI instantly switches from a "Generating..." state to a "Waiting for User Input" state. The human can read the critique_history on the dashboard, see exactly what the agents are arguing about, manually inject a hint, and resume the graph.
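A minimal sketch of the resume step (the helper name and state keys are assumptions, not our production code): the user's hint is appended to the conversation, the stale critique window is cleared so the stagnation detector does not immediately re-trip, and control is handed back to the Solver.

```python
# Hypothetical resume helper: fold the user's hint into the graph state
# before the paused run continues.
def apply_user_hint(state: dict, hint: str) -> dict:
    return {
        "messages": state.get("messages", []) + [{"role": "user", "content": hint}],
        "critique_history": [],   # reset the stalemate window
        "next_actor": "solver",   # hand control back to the Solver
    }
```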

Engineering Over Prompting

The AI industry is obsessed with trying to solve orchestration problems by writing "better prompts" or waiting for "smarter models" like GPT-5.

We fundamentally disagree. You cannot prompt your way out of a stochastic loop. Production-grade Multi-Agent Systems require traditional software engineering: strict state management, observability, and deterministic circuit breakers.

Deploy Nexus in Your Business

We are currently onboarding early-adopter partners for the Nexus Multi-Agent System. Stop wrestling with disjointed data pipelines and black-box wrappers.

Request an Architecture Briefing