governance layer for multi-agent fleets — goal decomposition + fleet learning #479

CorellisOrg · 2026-04-13T04:23:55Z

been running 8 openclaw agents in production for about 2 months (coding, research, ops, seo). the agents themselves are solid but coordinating them was the real challenge — context burning, duplicate work, agents not knowing what the others already tried.

built a governance framework to handle this:

• GoalOps — lead agent auto-decomposes high-level objectives into agent-level subtasks with dependencies
• Fleet learning — corrections and discoveries propagate across all agents so the same mistake doesn't happen 8 times
• Shared memory (Teamind) — persistent collective knowledge across sessions

the coordination overhead was honestly worse than the actual agent failures before we added this layer. separating governance from execution was the key insight.

open sourced the whole thing: https://github.com/CorellisOrg/corellis

curious how folks here handle coordination when scaling beyond a couple agents.

jingchang0623-crypto · 2026-05-10T00:10:17Z

8 agents for 2 months is close to our setup: we are at 5 agents for 100+ days, and your pain points match ours exactly.

On the coordination overhead being worse than failures

This is the insight most multi-agent articles miss. In our system:

• Actual task execution: 55% of tokens
• Coordination overhead: 35% of tokens
• Error recovery: 10% of tokens

The coordination overhead includes: explaining context to subagents, re-reading shared state, resolving conflicts. It is invisible in happy-path testing but dominates production costs.
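
A minimal sketch of how a split like this could be measured, assuming every model call is tagged with a category when it is logged. The `calls` log and its numbers are made up for illustration (chosen to reproduce the 55/35/10 shape), and `token_shares` is a hypothetical helper, not part of any tool mentioned here.

```python
from collections import Counter

# Hypothetical usage log: one (category, tokens) entry per model call.
calls = [
    ("execution", 5500),
    ("coordination", 2000),   # e.g. explaining context to a subagent
    ("coordination", 1500),   # e.g. re-reading shared state
    ("error_recovery", 1000),
]

def token_shares(calls):
    """Return each category's share of total tokens, in percent."""
    totals = Counter()
    for category, tokens in calls:
        totals[category] += tokens
    grand_total = sum(totals.values())
    return {category: 100 * n / grand_total for category, n in totals.items()}

shares = token_shares(calls)
```

Tagging at call time matters here: coordination tokens are hard to separate from execution tokens after the fact.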

On Fleet Learning

Our version of this is the weekly compaction cycle:

1. Every Sunday, a compaction agent reads all daily memory files
2. Extracts decisions, mistakes, and patterns
3. Promotes them to structured knowledge that all agents read
4. Archives raw files

The key insight: fleet learning works best when it is human-supervised. Left to their own devices, agents sometimes promote hallucinated patterns into permanent knowledge. We had a case where an agent promoted a fictional API endpoint into the shared knowledge base because it hallucinated the endpoint in one session.
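
One way to enforce the human-supervised part is an approval gate between extraction and promotion. This is only a sketch under assumptions: `Candidate`, its `approved` flag, and `promote` are hypothetical names, and the example entries are invented.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    text: str               # pattern the compaction agent wants to promote
    source_file: str        # daily memory file it was extracted from
    approved: bool = False  # set by a human reviewer, never by an agent

def promote(candidates, knowledge_base):
    """Append only human-approved candidates; return the ones held back."""
    held = []
    for candidate in candidates:
        if candidate.approved:
            knowledge_base.append(candidate.text)
        else:
            held.append(candidate)
    return held

knowledge_base = []
candidates = [
    Candidate("Retry uploads once before escalating", "day-01.md", approved=True),
    Candidate("Use the /v2/bulk-sync endpoint", "day-03.md"),  # unverified claim
]
held = promote(candidates, knowledge_base)
```

Anything left in `held` stays out of shared knowledge until a person has checked it against reality.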

On Goal Decomposition

We found that the coordinator agent is the single most important agent in the system. Ours does:

1. Receive task from human
2. Classify by complexity and assign to specialist
3. Monitor execution and enforce budget
4. Run post-task validation
5. Update shared memory

The coordinator should be the most carefully prompted agent in your fleet. Ours is 200+ lines of system prompt.
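
The control flow around that prompt can be sketched in a few lines. Everything here is an assumption for illustration: the keyword-based `classify`, the stub specialists, and the budget check that runs after execution rather than during it.

```python
def classify(task: str) -> str:
    # Hypothetical routing rule; a real coordinator would likely ask a model.
    return "seo" if "audit" in task.lower() else "content"

def coordinate(task, specialists, budget_tokens, shared_memory):
    role = classify(task)                          # classify and assign
    output, tokens_used = specialists[role](task)  # specialist does the work
    if tokens_used > budget_tokens:                # enforce budget
        status = "over_budget"
    elif not output.strip():                       # post-task validation
        status = "failed_validation"
    else:
        status = "ok"
    # Update shared memory with the outcome, whatever it was.
    shared_memory.append({"task": task, "role": role, "status": status})
    return status

# Stub specialists returning (output, tokens_used); stand-ins for real agents.
specialists = {
    "content": lambda task: ("draft text", 1200),
    "seo": lambda task: ("audit report", 9000),
}
shared_memory = []
```

Calling `coordinate("Audit launch post", specialists, 5000, shared_memory)` with these stubs records an over-budget result instead of silently accepting it.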

One question

Your GoalOps auto-decomposition: does it handle dependency chains well? Our biggest challenge is chained dependencies: Content Agent needs to finish before SEO Agent can audit, and SEO Agent needs to finish before Discord Agent can post. We manage this with a blackboard pattern where agents signal completion status.
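
A minimal sketch of a blackboard like the one described, assuming an in-memory set of completion signals. The `Blackboard` class and the agent names in `depends_on` are illustrative, not our production code.

```python
class Blackboard:
    """Shared board where agents post completion signals."""

    def __init__(self):
        self.done = set()

    def signal_done(self, agent):
        self.done.add(agent)

    def ready(self, agent, depends_on):
        """An agent may start once every upstream agent has signalled."""
        upstream = depends_on.get(agent, [])
        return agent not in self.done and all(a in self.done for a in upstream)

# Content -> SEO -> Discord chain from the example above.
depends_on = {"seo": ["content"], "discord": ["seo"]}
board = Blackboard()
```

Each agent polls `ready` before starting and calls `signal_done` when it finishes, so the ordering lives in `depends_on` rather than in any prompt.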

Our architecture: https://miaoquai.com/tools/openclaw-multi-agent-orchestration
Coordination patterns: https://miaoquai.com/glossary/multi-agent-orchestration.html

kcarriedo · 2026-05-16T21:03:37Z

The 55/35/10 token split jingchang0623-crypto posted is one of the most useful empirical numbers I've seen on this topic — 35% on coordination overhead is the part that doesn't show up in any vendor benchmark and is the thing that determines whether a fleet pays for itself in production. Two notes on the open dependency-chain question, from a related coordinator-shaped context:

GoalOps' lead-agent decomposition has a hidden cost that grows with chain depth. Specifically: the lead agent that decomposes "produce SEO-audited content" into [generate, audit] is solving the same decomposition repeatedly, because the lead's own context turns over between invocations. After a few weeks you end up with a decomposition log that's drifted — round-1 said [generate, audit], round-50 said [research, draft, fact-check, audit], and there's no single source of truth for which decomposition is "right" for this work shape. The fleet-learning pass helps but only after the drift has already happened. Worth instrumenting: log every decomposition, hash the work-shape input, and graph decomposition-stability over time. If it's not stable, the lead agent isn't actually doing decomposition — it's doing rediscovery.
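
A rough sketch of that instrumentation, assuming the work shape can be normalized to a string. `shape_hash`, `record`, and `stability` are hypothetical helpers, and "share of runs matching the most common decomposition" is just one possible stability measure.

```python
import hashlib
import json
from collections import defaultdict

log = defaultdict(list)  # work-shape hash -> decompositions seen so far

def shape_hash(work_shape: str) -> str:
    """Hash a whitespace- and case-normalized description of the work."""
    normalized = " ".join(work_shape.lower().split())
    return hashlib.sha256(normalized.encode()).hexdigest()[:12]

def record(work_shape, subtasks):
    log[shape_hash(work_shape)].append(json.dumps(subtasks))

def stability(work_shape):
    """Share of logged decompositions that match the most common one."""
    seen = log[shape_hash(work_shape)]
    if not seen:
        return None
    most_common = max(set(seen), key=seen.count)
    return seen.count(most_common) / len(seen)

for _ in range(3):
    record("produce SEO-audited content", ["generate", "audit"])
record("Produce  SEO-audited content", ["research", "draft", "fact-check", "audit"])
score = stability("produce SEO-audited content")
```

A score that keeps falling for the same hash is the rediscovery signal; a score near 1.0 suggests the decomposition could be cached instead of regenerated.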

Sequential dependencies want explicit edges, not implicit prompt ordering. The pattern that's worked for me: the lead emits a DAG of {task_id, depends_on: [task_id]} upfront, not a sequence. The execution layer (not the lead agent) decides which tasks are ready, fans out the ready ones, blocks the rest. The lead's job is purely to emit the DAG; it never schedules. This separates two failures that GoalOps as currently described conflates:

• Decomposition errors ("wrong tasks"). Caught by reviewing the DAG before execution starts.
• Scheduling errors ("right tasks, wrong order"). Caught by topological-sort validation, deterministically, no model needed.

The lead-agent override pattern CorellisOrg mentioned (the lead overriding subordinates) is mostly a symptom of conflating these — when the lead has to also schedule, it has license to revisit subordinate decisions, and the override is the path of least resistance when it sees a partial result it doesn't like. Pull scheduling out, and the lead has no in-loop reason to second-guess subordinates because it's not in the loop during execution.
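
A sketch of the DAG-plus-scheduler split, using Python's standard `graphlib` module. The `{task_id, depends_on}` shape comes from the description above; `validate`, `run`, and the example task ids are assumptions for illustration, and a real executor would fan ready tasks out in parallel instead of looping.

```python
from graphlib import CycleError, TopologicalSorter

def to_graph(dag):
    """Map each task_id to the set of task_ids it depends on."""
    return {t["task_id"]: set(t["depends_on"]) for t in dag}

def validate(dag):
    """Return None if the DAG is schedulable, else a short error message."""
    ids = {t["task_id"] for t in dag}
    for t in dag:
        unknown = set(t["depends_on"]) - ids
        if unknown:
            return f"{t['task_id']} depends on unknown task(s): {sorted(unknown)}"
    try:
        TopologicalSorter(to_graph(dag)).prepare()  # raises on cycles
    except CycleError:
        return "dependency cycle"
    return None

def run(dag, execute):
    """Execution layer: run whatever is ready; the lead agent is not involved."""
    sorter = TopologicalSorter(to_graph(dag))
    sorter.prepare()
    while sorter.is_active():
        for task_id in sorter.get_ready():  # tasks whose dependencies are done
            execute(task_id)
            sorter.done(task_id)

dag = [
    {"task_id": "discord", "depends_on": ["seo"]},
    {"task_id": "content", "depends_on": []},
    {"task_id": "seo", "depends_on": ["content"]},
]
order = []
if validate(dag) is None:
    run(dag, order.append)
```

The lead can emit tasks in any order; the execution order is derived from the edges alone, which is what makes scheduling errors checkable without a model.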

On Teamind / shared memory: the failure mode I'd watch for is attribution rot — after a few months, the shared memory contains assertions that nobody alive remembers writing, and the agents trust them anyway. Worth keeping a per-assertion provenance field (which agent, which session, which task) and showing it on read, not just on write. Even if no agent uses the provenance, the human auditor reading the memory occasionally will catch the rot earlier.
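
A minimal sketch of what a per-assertion provenance field could look like. The `Assertion` record and `render_for_read` are hypothetical, not Teamind's actual schema, and the example values are invented.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Assertion:
    text: str     # the claim stored in shared memory
    agent: str    # which agent wrote it
    session: str  # which session it was written in
    task: str     # which task produced it

def render_for_read(assertion: Assertion) -> str:
    """Show provenance next to the claim every time memory is read."""
    return (
        f"{assertion.text} "
        f"[by {assertion.agent}, session {assertion.session}, task {assertion.task}]"
    )

entry = Assertion(
    text="Staging deploys require the ops agent's sign-off",
    agent="ops-agent",
    session="s-0412",
    task="deploy-review",
)
line = render_for_read(entry)
```

Freezing the record keeps provenance from being edited after the fact; a correction becomes a new assertion with its own provenance.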

Curious whether your weekly compaction cycle (jingchang0623-crypto) currently preserves provenance through the synthesis pass or collapses it — that's the inflection point where attribution either survives or dies.

ferhimedamine · 2026-06-13T11:03:34Z

The 55/35/10 token split is revealing — 35% on coordination overhead is the number that determines whether a fleet is economically viable.

In our 9-agent fleet, we reduced coordination overhead from ~40% to ~15% by replacing ad-hoc inter-agent messaging with a shared persistent memory layer. Instead of agents actively coordinating ("hey Agent B, I just did X"), they passively share by writing observations to a shared room namespace. Other agents recall relevant context when THEY need it, pulling only what's important to their current task.

This inverts the coordination model: push-based coordination scales as O(n²) in the number of agents n (every agent potentially notifying every other), while pull-based memory recall scales as O(n) (each agent queries once per task, getting only relevant context ranked by importance).
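
The arithmetic behind that claim, as a small sketch. It assumes the worst case for push (every agent notifies every other agent once per round) and exactly one recall query per agent per round for pull; real traffic will sit somewhere below the push bound.

```python
def push_messages(n: int) -> int:
    """Worst case: each of n agents notifies the other n - 1."""
    return n * (n - 1)

def pull_queries(n: int) -> int:
    """Each of n agents issues one recall query."""
    return n

fleet_size = 9
push = push_messages(fleet_size)  # 72 notifications per round
pull = pull_queries(fleet_size)   # 9 queries per round
```

Doubling the fleet to 18 agents roughly quadruples the push bound (306) but only doubles the pull count (18).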

The fleet learning pattern specifically works well with importance-weighted decay: when Agent A discovers a correction, it stores it at high importance. All other agents naturally pick it up on their next recall because high-importance fresh memories rank above stale context. As the correction ages and gets superseded, it decays automatically — no manual propagation or cleanup needed.
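
A sketch of importance-weighted decay under stated assumptions: exponential decay with a 7-day half-life, and a plain sort at recall time. The `score` formula, the half-life, and the sample memories are illustrative choices, not the API of the library linked below.

```python
def score(importance: float, age_days: float, half_life_days: float = 7.0) -> float:
    """Importance discounted by exponential decay with the given half-life."""
    return importance * 0.5 ** (age_days / half_life_days)

def recall(memories, k=2):
    """Return the k highest-scoring memories, best first."""
    return sorted(
        memories,
        key=lambda m: score(m["importance"], m["age_days"]),
        reverse=True,
    )[:k]

memories = [
    {"text": "stale context", "importance": 0.9, "age_days": 28},
    {"text": "fresh correction", "importance": 0.9, "age_days": 0},
    {"text": "routine note", "importance": 0.3, "age_days": 1},
]
top = [m["text"] for m in recall(memories)]
```

With these numbers the fresh correction ranks first, and the four-week-old entry drops out of the top two even though it was stored at the same importance.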

Fleet memory coordination pattern: https://github.com/Dakera-AI/dakera-js/blob/main/examples/memory.ts
