All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
- Out-of-session substrate wording reconciled (Wave F). Live multi-agent orchestration is documented as NTM/ATM + Agent Mail; Gas City remains an optional reference SDK, not the default operator substrate. Historical 3.0.0 CHANGELOG entries that name Gas City as primary orchestration are superseded by this note for current doctrine — see
docs/3.0.mdanddocs/architecture/codebase-overview.md.
- Wave A:
cli/AGENTS.mdbr pointer stub;ao session bootstrapcounts ready beads viabr ready --json; newcli.agents-trackerdrift gate. - Wave B: RPI terminology glossary in
docs/architecture/codebase-overview.md. - Wave C:
ports-and-adapters.md, BC5 inbounded-contexts.yaml,ARCHITECTURE.md/CI-CD.mdhookless framing;docs.architecture-driftgate. - Wave D: worktree disposition audit (
docs/audits/2026-06-16-worktree-disposition-audit.md) — dry-run only. - Wave E: registered
skill-probe-i0.sh(I0 advisory) andcheck-provenance-orphans.shin Go gate registry; deferred workflow script count 13 → 11. - Wave F: disposition triage checklist (
docs/audits/2026-06-16-skill-disposition-triage.md).
AgentOps 3.1 is the packaging-and-gate-discipline minor on top of the hookless 3.0 core. The headline is not a new feature — it is that the release pipeline itself became the product. Three things hardened together: the release gate moved native (the inline bash checks were ported to a single Go gate that the pre-push hook runs as the release authority, with workflow-parity enforcement so a check can't pass locally while drifting in CI); close-admission tightened (the verdict gate now counts distinct validator families with author exclusion, so a self-graded "looks good" no longer admits a close); and the skill corpus was canonicalized (ghost skills materialized into the tree, registries/counts regenerated from one source, and bundled image copies forced back to byte identity with that source). On top of that discipline, the three IMAGE-CORE recipes (Claude / Codex / Gemini-AGY) ship as real, smoke-proven install paths. See docs/3.1.md for the release narrative.
- Installable image artifacts for the 3.1 runtime set. Claude Code now has a dedicated marketplace installer wrapper, Codex keeps the native plugin curl installer, and Gemini/Antigravity now has a curl installer that validates and installs
images/geminithroughagy plugin. The three documented one-liners are in README §Install; the recipes are materialized asimages/{claude,codex,gemini}/(each with a manifest,README.md, andverify.sh). - Day-2 install operations are productized. Added docs/install-day2-ops.md — operator guidance for install, update, backup, permission repair, recovery, and escalation across Claude, Codex, and Gemini/AGY (the AT&O "make Day-2 part of the product, not an afterthought" doctrine).
- Installer smoke proof covers all three image paths.
tests/install/test-install-smoke.shchecks the Claude, Codex, and Gemini/AGY one-liners without invoking vendor runtimes (measured: 45 passed, 0 failed). - Claude installer release-pinning.
scripts/install-claude.shnow accepts--ref <ref>(orAGENTOPS_INSTALL_REF) to pin the marketplace source to a tagged release (e.g.v3.1.0), reaching parity withinstall-agy.sh. - Claude image version guard.
images/claude/verify.shnow asserts.claude-plugin/plugin.jsondeclares the expected release version (default3.1.0, override viaAGENTOPS_EXPECTED_VERSION), so a stale-version drift in the marketplace manifest fails the gate. ao skills resolve— MECE corpus audit (c9c3e2706). Reports skill overlap and coverage gaps so the corpus can be kept mutually-exclusive / collectively-exhaustive.ao refinerybackstop daemon (cba058c6a, ag-qidx) — a refinery surface for the gate/registry regen path, with acli-skills-mapregenerated for it.- Go gate workflow coverage + parity reporting (
390695b65,281d46d34,283e60e78) — the gate now reports which CI workflows back each check and requires workflow parity, so a locally-green check that has drifted out of CI fails the gate.
- Pre-push gate flipped to the native Go gate (
c97efcc36PB2,efbcf7e90,ca9366ccb) — the inline bash checks were ported to native Go checks (643e7ade1,b720a72cd,bcaae4b02,5b93b4814,6814e377c— PB1 batches) and the pre-push hook now runsao gate checkas the release authority. Local validation is the routine shipping path; GitHub Actions is an optional/manual backstop. - Push-to-main cockpit doctrine (
d90ff1d09,0f501beae,a023dc4df) — workflow rewritten for push-to-main with a concurrency-scoped push lock (push-serial.sh); dead merge-machinery scripts deleted. - Local validation is the release authority (
3d4fc1299,ac5664c56, ag-3l86) — local-CI made viable as a DSR fallback when GitHub Actions quota is exhausted. - Skill corpus canonicalized + triggers added. 4 ghost
codebase-*skills materialized into the tree and registered across catalog surfaces (dd34f8688,584701af8,61421cdb9,0580f7303,cp-801l); explicit "Use-when" triggers added to weak-trigger skills (5cdd8e792,5895422cd,e0932310d);using-ntmrenamed tousing-atm(5c4e7f07a); 2026-06-10 operational lessons encoded into 4 canonical skills (d286a5fc2).
- Verdict gate now enforces distinct families + author exclusion (
ef63eec78(cherry-pick of 5f7120ff6),d6576b359,94181cc89,9c1398671) —council_judgecounted files, not distinct validator families, and did not exclude the author; a self-graded verdict could admit a close (LAW-3 gap).ef63eec78addstickVerdictIdentity(author, judge name, judge program, judge model family) and thetickCouncilGatefamily-count check totick.go;d6576b359/94181cc89/9c1398671seal the skill-edit contract surfaces that the gate now audits. - Removed Claude
--printprobes from headless paths (267f378a0,dee7a3502, ag-eli0) — the headless checks and the team-runner path no longer shell out toclaude --print(LAW 0 compliance). - Distribution install contract aligned (
29b919f92) — theagentops-core.distribution-install-updatecanary was failing on main; the contract was realigned soclaude plugin updateand the install/update path stay green. - Gemini image bundle drift (
3e0f8f4e8,0e5d07211) — refreshed the bundled Gemini/AGYSKILL.mdcopies soimages/gemini/verify.shproves byte identity against the canonical source corpus. The residualvalidateandvibing-with-ntmdrift left by the 2026-06-10 4-skill lesson encoding was resynced so bothimages/gemini/verify.shandscripts/validate-agy-plugin.shpass byte-identity. - Skill-auditor pass-1 gated on heal-strict (
dbc2a6466); post-merge skill lint drift repaired (f784d2842).
Patch release. No runtime behavior change — it advances the plugin/marketplace version so existing 3.0.0 installs can claude plugin update cleanly (the 3.0.0 version label never moved across the 15 commits that landed the skill consolidation and doc reconciliation, so a plain update reported "already latest"). Also reconciles two release-engineering surfaces the per-PR gate did not exercise:
claude plugin updatenow works for early 3.0.0 installs. Version bumped to 3.0.1 across.claude-plugin/plugin.json,.claude-plugin/marketplace.json(metadata + plugin), and theinstall-ao.shpin.- Nightly Static Validation is green again. Removed the Nightly step that ran the deleted
scripts/validate-hooks-doc-parity.sh(hooks were removed in 3.0). Repointed/retired two staletests/docs/validate-skill-count.shextraction patterns that grepped PRODUCT.md for hook-era phrasing ("runtime hook event sections", the "### 1. Skills (N across 4 runtimes)" heading) that the 3.0 restructure removed.
AgentOps 3.0 is the hookless-first major. The headline: AgentOps is what runs in-session — skills + the ao CLI + the RPI/evolve/crank/swarm loops + the context-compiler. Out-of-session orchestration (scheduling, daemons, autonomous dispatch) is delegated to Gas City (a reference City ships in this release) or Olympus. The default install registers zero hooks; the lifecycle is driven by skills and CI gates. See docs/3.0.md for the north star.
- Hookless by default (
soc-57b7f) — the default install registers no hooks. All 53 runtime hooks were audited and removed; capabilities that lived in hooks (standards injection, commit-review gating, noise injectors) are now CLI ports + CI gates. Opt back in with--with-hooks(AGENTOPS_INSTALL_HOOKS=1) orao hooks install --force. Migration: if you relied on hook-driven behavior, switch to the CI gate or the equivalentaosubcommand. - Daemon carved out (
soc-2rtm0, daemon-carve wave 5) —internal/daemon+ theagentopsdbinary were deleted. AgentOps no longer ships an out-of-session daemon; Gas City is the orchestration substrate. Builder cores that previously ran as daemon jobs (wiki build, llmwiki loop) keep their in-process / GC-callable core but lose the daemon-job-executor wrapper. - Scheduling/plans/watch/overnight commands removed (
soc-2rtm0, kill-schedule-overnight) —ao schedule,ao plans,ao watch, and the overnight engines were deleted. Gas City owns scheduling. The phased RPI engine keeps its non-gc backends (auto/direct/stream/tmux);runtime=gcis no longer a valid mode. factorycommand + contract corpus retired (soc-2rtm0, cascade-rip) — theao factorycommand and its contract corpus were removed.- Gas City (gc) bridge severed (
soc-2rtm0, wave 2) — the CLI gc-bridge glue (gc_bridge.go,gc_events.go,rpi_phased_gc.go) was deleted; the daemon-side GasCity client is a separate, retained surface.
- AgentOps reference Gas City — a turnkey City (
city.toml+agentopspack + overlay) that demonstrates out-of-session orchestration on top of AgentOps skills. ao validate --gate— exit-code verdict for Gas City retry loops and CI; PASS/WARN/FAIL collapses to a process exit code.- Real hexagonal port adapters —
corpus_fsCorpusReader/Writer, a bd-backed TrackerPort, and a git-backed WorkspacePort, each wired to its core consumer. Port-realness audit indocs/architecture/. - BDD acceptance layer — scenario→test linkage runner + CI gate (
soc-63xfx); canonical.featureacceptance added across skills (research/plan/validation/trace/handoff/readme/autodev/flywheel/inject and more) and daemon-lifecycle Gherkin linked to tests. scripts/ship.sh(#346,soc-33uy) — single-command wrapper that auto-detects inventory-touching diffs and routes through the full pre-push gate (no--fastskip on skill/contract changes), preemptively running the regen sweep (sync-skill-counts, codex-hashes, domain-map, context-map, registry, sync-hooks). Mechanical fix for ship-loop anti-pattern #1.
- Honest 3.0 doctrine —
docs/3.0.mdis the canonical north star: AgentOps is in-session; Gas City and Olympus own out-of-session. Doctrine, PRODUCT, and README were reconciled to the in-session core (skills + rpi + evolve + crank + swarm + context-compiler), with the out-of-session autonomous-dispatch gap explicitly labeled (soc-5jwah). - Version sync to 3.0.0 —
.claude-plugin/plugin.json,.claude-plugin/marketplace.json(metadata + plugins[0]), and the City pack's pinnedAO_VERSION(packs/agentops/assets/scripts/install-ao.sh) all bump to 3.0.0. - Coherent-arc PR rule (#348,
soc-1lp1) — replaces "one scenario per PR" default inCLAUDE.md+AGENTS.md. The unit of a PR is one closable bead (or small-epic slice) with a single rollback semantic. Small epics (≤5 child beads, same surface) ship as one PR with N commits; large epics ship as N PRs sliced by scenario or wave. Updatesship-loopskill (Claude + Codex twins). Derived from the 2026-05-19 8-PR merge-arc burn-through. skill-auditorrubric scoring (soc-ads5v) — adds a Pass 3 rubric-scoring stage.- Executable-spec-link-integrity promoted warn→blocking (
soc-x7y9f) — the scenario→test link gate is now a required CI check.
- Daemon robustness hardening — bounded idempotency key + request decode guards (
soc-scg3h), heartbeat ledger-write timeout (soc-15g9f), heartbeat goroutine-lifecycle bound (soc-yyrrq), payload-shape validation before ledger append (soc-qra05), and executor-panic recovery in supervisor dispatch (soc-as401). (Hardening landed before the daemon carve-out for the GasCity-side client surface.) goals trace --orphansparser now rejects bead prose and matches real orphans.- 6 pre-existing shellcheck warnings (#349,
soc-j026) cleaned acrossvalidate-codex-api-conformance.sh,goal-failure-taxonomy.sh,purge-global-garbage.sh,nightly-pr-digest.sh,add-validate-job.sh,check-skill-size.sh. All in scripts unchanged from base — atomic side-quest per anti-pattern #2. - Local pre-push gate eval-canary strictness mismatch (#350,
soc-nmhp) — full-mode eval-canaries now respect the same path filter as CI'seval-workbench-verify(HAS_EVAL=1 || is_ci_env). Previouslyneeds_check evalshort-circuited to true in full mode, producing 5 spurious FAILs on doc/script PRs against a baseline-less local env. Closes the "Local-vs-CI environment drift" learning (docs/learnings/2026-05-07-ci-push-gate-toil-pattern.md). - Eval fixtures referencing deleted hooks (
soc-t40ai) cleaned; stale deleted-hook copies removed fromcli/embedded/.
- Release CI shipped red in v2.41.0 — the v2.41.0 tag was cut on a commit where three
Validatechecks were already failing; they were masked because the path-filtered CI marks untriggered jobsskipped(notfailed). This patch clears all three:registry.jsonregenerated after the v2.41 arc added 10aosubcommands;skills/validation/SKILL.mdlinksreferences/flags.mdwith a real markdown link soheal.sh --strictpasses; thecli-command-surface-matrixeval baseline + fixture updated to the current heading count (top=70 sub=173 all=243); and 7/skillnameslash-command references inskills-codex/{evolve,validation}/SKILL.mdconverted to Codex$skillnamenotation so the codex-native install canary passes. --oscillation-sweepremoval was incomplete in v2.41.0 —soc-1q1xdeleted theao defrag --oscillation-sweepflag but the cycle-181 audit only checked Go callers. Seven shell/YAML/doc callsites still referenced the removed flag, breaking thevalidate-flywheel-proofCI gate (scripts/nightly-dream-cycle.shshelled the dead flag). Cleared from.github/workflows/nightly.yml,scripts/nightly-dream-cycle.sh,scripts/check-compile-oscillation.sh,docs/contracts/dream-run-contract.md, and thecompileskill phase docs.extract-release-notes.shdouble-blank line — the awk extractor captured the blank line after the## [VERSION]heading; wrapped in the<details>boilerplate it produced a double-blank between<summary>and the first subsection. Now strips leading/trailing blank lines from the changelog section.ci-local-release.shsecret-scan false positive — the secret-pattern scan recursively grepped the gitignored_site/MkDocs build directory, matchingpassword:!0inside minified JS bundles. Added_siteandsiteto the scan's--exclude-dirlist.
- BC ports — 14/14 production-complete.
FactoryAdmissionPort(13th,soc-2klg.1) +productionFactoryAdmission(cycle 139–140) andClaimEvidencePort(14th,soc-2klg.2) +productionClaimEvidence(cycle 141–142) land the last two BC4 ports, closing the 14-port BC inventory begun in the v2.40.0 cycle. Each port follows the<port>.go+inmemory_<port>.go+inmemory_<port>_test.gotriplet plus aproductionXproduction adapter, with compile-time interface assertions as drift guards. - 10 typed-port-backed
aosubcommands — operator-facing CLI now exposes every BC port that needs a script-side surface:ao loop history(productionLoopReader, cycle 144, slice 1 ofsoc-y5vh.5),ao ci latest/recent(productionCIStatus, cycle 145, slice 2),ao corpus inject(productionCorpusReader, cycle 146, slice 3; closessoc-y5vh.5),ao operator record/list(cycle 147 — template-applied 1st of 7),ao harness status(cycle 148),ao gate run(cycle 149),ao corpus capture(cycle 150, BC1 R/W pair complete on CLI),ao loop append(cycle 151, BC3 R/W pair complete on CLI),ao citation verify(cycle 152, BC1 round-trip),ao claim bind/list(cycle 153, 10th adapter CLI-exposed). Establishes a reusable cycle-shape captured indocs/learnings/2026-05-13-cli-wiring-cycle-shape.md. ao loop verify— audits.agents/evolve/cycle-history.jsonlintegrity via the typed BC3LoopReaderPort(uses cycle-161 widening). Operator-facing health check on the loop ledger.CycleEntry.StartedAt+Titlefields (soc-ckc4, ports widening at cycle 161 + writer-side round-trip at cycle 162). Enables theao loop verifyaudit and richer downstream consumers of the cycle ledger.- BC3 LoopReaderPort operator wrapper —
scripts/evolve-read-cycle-history.shprovidesrecent [N]/latest/range START ENDmodes aroundao loop history, replacing inlinetail/awk/jqshell-outs over the raw.agents/evolve/cycle-history.jsonl(soc-y5vh.4). Three migrated callsites:skills/evolve/references/convergence-mechanics.mdMechanism 1,skills/evolve/references/oscillation.mdoscillation counter, andskills/evolve/references/fitness-scoring.mdOscillation Detection block (soc-lghj). - Supergate Gap 3 (loop-closure) bats coverage —
tests/scripts/check-three-gap-supergate.batsextended with three tests (happy-path PASS,goals-validateFAIL,flywheel-proofSKIP) using a PATH-shimmedgothat produces a controlled/tmp/ao-sg(soc-wxh5.3). Suite goes 15 → 18 tests; closes the cycle-63 Gap 3 deferral. - Mandatory STEP 1.7.5 release-readiness gates in
/validation— auto-detects release context from branch name (release/*,v*-prep,v*-evolve-run,v\d+\.\d+*) or--release-contextflag and requiresscripts/pre-push-gate.sh(full, not--fast),scripts/ci-local-release.sh, andscripts/generate-cli-reference.shcleanliness check when CLI surface changed. Validation refuses to recommend/releaseuntil all three pass. Codex parity synced. /evolveteardown pre-release checklist — when the loop runs on a release-shaped branch, the teardown report emits an explicit unchecked checklist (regen CLI docs, full pre-push,ci-local-release.sh, optional smoke run) instead of recommending/release. The handoff artifact carries the checklist verbatim; "ready to tag" means boxes checked, not cycles green.- Acceptance-Text vs Delivered Drift audit in
/post-mortemclosure-integrity check — for each closed child, parses the beadAcceptance:section, extracts named gates, and WARNs when the close-note does not confirm the gate ran green. Catches the failure mode where a bead's acceptance language drifts from delivered evidence (origin: cycle 182soc-w6vh.4). .agents/operator/write-surface contract entry —docs/contracts/agents-write-surfaces.mdnow documents the BC4OperatorPortdurable-intent log (allowlist + classification row, lifecycle=rolling, writer=cli).
/evolveStep 0 prior-knowledge retrieval now routes through the typed BC1CorpusReaderPort(cli/cmd/ao/corpus_reader_adapter.go, cycle 112productionCorpusReader) viaao corpus injectinstead of the legacyao lookupshell-out (soc-y5vh.1). Skill text inskills/evolve/SKILL.mdandskills-codex/evolve/SKILL.mdsynced; codex hashes regenerated./evolveStep 1.5 healing-first classifier now routes through the typed BC2CIStatusPort(cli/cmd/ao/ci_status_adapter.go, cycle 117productionCIStatus) viaao ci recent --limit 1instead of an inlinegh run list --workflow validate.yml --json conclusion(soc-y5vh.2). Both callsites (skills/evolve/SKILL.md+skills/evolve/references/convergence-mechanics.md) updated in lockstep. Zero remaining inlineghshell-outs in/evolve's hot read path.cli/cmd/aocoverage floor raised back to 76 % inscripts/check-cmd-ao-coverage.shafter real statement coverage climbed to 76.1 % (23553/30953) on the v2.41-evolve-run baseline (soc-wxh5.1). The cycle-60 recalibration to 75 % is reversed./evolvesession-state refresh at Step 0 so the dormancy gate stays correct after long-running cycles harvest follow-ups (cycle 171 retrospective fix)./releaseskill refactor — moved Examples + Troubleshooting + the non-HEAD cut-version logic intoreferences/release-workflow-detail.mdto bringSKILL.mdback under thetier=executionsize limit. Behavioral surface unchanged; cycle 169 post-mortem restoration commit restored Examples and Troubleshooting after the initial extraction.
three-gap-supergategoals-validate sub-gate —scripts/check-three-gap-supergate.shnowrm -f /tmp/ao-sgbeforego build -o /tmp/ao-sg. Go refuses to overwrite a non-object file at the build-output path, so any prior process that wrote a non-binary to/tmp/ao-sg(including the bats-test shim go) would otherwise wedge the gate. The bats teardown intests/scripts/check-three-gap-supergate.batsalso cleans/tmp/ao-sgto remove test pollution. Caught byao goals measure(1 failing → 0 failing) — first validation that the new STEP 1.7.5 mechanical gate works on real drift.
- Dead
defrag.SweepOscillatingGoalsfunction and all callers (soc-1q1xpath 1). The function read.agents/evolve/cycle-history.jsonlfor entries with atargetfield; zero entries have ever had one in production, so callers (runCompileDefrag,runDefragPhases,runDreamDefragPreview) always got empty results. Net removal: 6 files, 17 insertions / 465 deletions —SweepOscillatingGoals+ 5 helpers +CountAlternations+CycleRecord+OscillationResult+OscillatingGoaltypes +DefragReport.Oscillationfield +defragOscillationSweepflag +--oscillation-sweepCLI flag + 9 tests. Build green; 11924 tests pass. - 43 dead-code findings swept from
cli/— across multiple cycles (156 cleared 10staticcheckU1000 findings, 157 swept 18 more, 158 swept 10 more, 159 swept 5 more fromsoc-k083). Closes long-standing static-analysis backlog;LoadCycleHistorydeletion (cycle 156) was the canonical example of the dead-code pattern thatsoc-1q1xlater applied to oscillation-sweep.
- DDD/Hex architecture rescope arc — 13 cycles closed. Phase-1 complete + phase-2 retrospective (
docs/rescope/2026-05-13-ddd-hex-architecture-rescope.md); test-architecture-debt analysis reconciled to deletions; BC ports phase-2 narrowness post-mortem captured (docs/learnings/2026-05-13-bc-ports-narrowness-postmortem.md). - Learnings catalog hygiene — README + when-to-add rubric (cycle 134), empirical /loop context-drift study over 87+ cycles (cycle 135), BC-ports wire-up arc retrospective (cycle 122), CLI-wiring cycle-shape template (cycles 144–146), substring sed-rename overreach warning.
- Contract drift resolution —
soc-5yuydrift #5 resolved via audit;cli-skills-map.mdrefreshed to 70 generated CLI command headings for v2.41-prep BC-arc commands.
- Practice-citation derivation graph —
PRACTICE-REGISTRY.mdbecomes the derivation root for every primitive in the repo. A newpractice-citation derivation graph + advisory CI gateenforces that every skill, hook, eval suite, CLI command, and schema declares itspractices:lineage. Twelve backfill passes (pass-1throughpass-12, fourpass-12waves acrosscli/) reached 756/756 declared primitives — full repo coverage. Touchesskills/,hooks/,schemas/,evals/agentops-core/, andcli/via// practice:comment carriers. - Three-gap supergate (E5 epic) —
scripts/check-three-gap-supergate.shconsolidates three release-blocking gates into one entry point with Gap 1--strict-coverageopt-in, wired intoscripts/pre-push-gate.shand thevalidate-three-gap-supergateCI workflow. Closessoc-m47kand three child gaps. - Contract enforcement gates —
contracts-structural-floor(covering all 38 contracts), plus dedicated CI-blocking gates forfactory-admission,finding-registry,factory-yield-ledger,flywheel-compounding-snapshot,wiring-closure,goals-validate,flywheel-proof,quarantine-empty, andcontract-canaries(check 24e).feat(gate) --two-pass modeand local-scope default added. - Behavioral eval workbench — 12 eval tasks, first suite, three fixture components (Go CLI, Python FastAPI, DevOps scripts). Live-agent eval suite with 3 workbench cases,
scripts/eval-agent-harness.shwired into the eval CLI, and the suite expanded to all 12 workbench tasks. - Eval CI gates —
eval-skill-deltaCI gate + nightly schedule template,eval-workbench-verifygate (GOALS.md Directive 10), head-to-head delta gate against the baseline scorecard (D10). Run records upload as CI artifacts for triage; Python venv bootstrapped inagentops-eval-advisory. - Eval harness primitives —
context_comprehensiondimension, CDLC identity field, observability feedback loop, and industry-proven eval patterns added to the agent harness. - Unified registry — Phase 1 added
scripts/generate-registry.sh+registry.jsonsource-of-truth. Phase 2 added a job type, CLI surface, and CI gate (registry-check). - Daemon factory-admission lane —
feat(daemon): add factory admission job specs+ executor + lane validators, paired withfeat(gates): enforce factory-admission contract. - Daemon RPI agent-update events —
RPIRunExecutoremitsagent_update.criterion_verdictper wave checkpoint (soc-awx8) and phase-boundaryagent-updateevents (soc-y0ct.2). - Daemon scheduling + executors — wired skill schedules through the daemon, registered eval and planning job types, added a CLI fallback RPI executor.
- Nightly automation upgrades — admission-aware morning digest with durability; manual-PR-only landing policy for evolve; fail-closed execute preflight for source mutation; blocker matrix + main CI baseline artifacts; daemon-submitted dream runs; L3 rehearsal scenario + operator runbook; scheduler install helper + updated runbook; PR digest generator from structured run state.
- Five-minute first-value install gate (PG1) — first-time-user journey is now a release-blocking install proof.
- Corpus snapshot/restore + freshness gate (D11) — durable corpus state with a CI-blocking freshness check.
- Hook commit guards —
feat(hooks)warns on commits with code-without-test (P2), missing sibling-pattern citation (P3), and missing fitness-delta (P4) in commit messages. - Goals additions —
code-drivenvsruntime-artifactsummary split;SKIPexit code 77 + flywheel-compounding dormant precondition;AffectsFilessidecar for the open-PR blocker matrix. - New CLI surfaces —
ao session spawn(template-driven session launch) andao feedback-loop --drain(clear unfed citation backlog, epicsoc-sx99W3.1). - Skills framework expansions —
skill-builder+skill-auditorpair (epicsoc-9bak, #237);ao quickstart+ demo surface forao scheduleandao daemon(epicsoc-sx99W2.2); tracer-bullet shape for theskills/domainubiquitous-language corpus; scoped evidence on/crankand/implementbead closures. - Schedule starter template — tracked stock starter at
docs/templates/. - Scripts —
corpus-stats.sh+ derivedPRODUCT.mdevidence (epicsoc-sx99W3.3);evolve-update-session-state.shderives session-state from cycle-history tail. securitydomain in eval-suite manifests —$defs/domainenum inschemas/eval-suite.v1.schema.jsonnow acceptssecurityalongside the existing eight domains. Paired updates land incli/internal/eval/coverage.go(DefaultCoverageDomains) andcli/cmd/ao/cobra_commands_test.go(evalCoverageDomains) so schema, production default, and test fixture stay in lock-step.ao eval coveragewill reportsecurityas a missing required domain until a security-domain manifest is authored.
- Three-layer product model adopted across surfaces — README,
PRODUCT.md,docs/index.md, and downstream positioning docs aligned to the three-layer (substrate / assurance / bookkeeping) framing with the software-factory + TSMC + in/on-the-loop framing. Eightdocs(positioning)commits cover the thesis polish, lineage cleanup, link tovs-compound-engineer.md, and the closure ofsoc-yjzp.9with empirical Δ=0. /releaseskill refactored under the size limit + non-HEAD cut support —skills/release/SKILL.mdand the Codex twin shrank from 545 lines to a 141-line flow index, with the detail extracted toreferences/release-workflow-detail.md. A newreferences/release-cut-and-bump.mddocuments therelease/v<ver>branch pattern for cutting at a non-HEAD SHA. (This is the commit that prepared the v2.40 cut itself.)- RPI lifecycle sharpened — criterion contract + isolation enforcement + daemon executor swap (
soc-bcrn, #255). The factory claim ledger reconciled via Wave 1A-D (soc-e4ulx, #264). - CI workflow path-filters — added path-filter conditions to 28 CI jobs (the remaining tail); bats path-filter wired so
.batschanges gate bats-test jobs;batsprints captured output on failure. - CI loop boundaries —
feat(ci): enforce inner/middle/outer loop boundaries(#230). - Codex parity drift —
feat(ci): wire check-codex-parity-drift as CI-blocking + pre-push gate (D7). - Skill consolidation — 12 standalone skills consolidated into
beads,review,doc, andresearch; external-corpus quality playbook slices absorbed in two waves. - Documentation reorg —
docs(claims)introduced a public evidence manifest for v2.39 README claims (PG4);docs(parity)updated four times;docs(cdlc),docs(readme),docs(eval),docs(release),docs(positioning)swept across the catalog. - Dependencies bumped —
chore(deps)updates formkdocs-materialv9.7.6,mkdocs-section-indexv0.3.12,mkdocs-literate-navv0.6.3,mkdocs-include-markdown-pluginv7,mkdocs-git-revision-date-localized-pluginv1.5.1,mkdocs-gen-filesv0.6.1,linkcheckerv10.6.0,pymdown-extensionsv10.21.2, Python 3.14,dorny/paths-filterv4.deps(go):pgregory.net/rapidminor bump incli/plus a go-minor-patch group bump (#268).
fix(ci)— registry non-determinism + pre-push goals-validate fallback; structural CI failures from cycles 45-47;registry.jsonknowledge_stores and schedules must match gitignored-state regen (epicsoc-sx99); refreshed registry generation; bumped CLI surface counts afterpatterns repair-filenamesaddition; close practice-provenance validator gaps;--single-passon pre-push-gate tests broken by--two-passdefault.fix(eval)— six fixes covering eval canary refresh, advisory job stabilization, run-record upload reliability, and miscellaneous workbench-suite drift.fix(hooks)— reconcile stale.agents/.gitignoredeny-all with parent allowlist (soc-rv5p, #263); provepre-pushtransmits refs (#254).fix(codex)— migrate deprecated hooks flag; remove dead Claude-specific references and runtime markers from the codex research skill.fix(cli/rpi)— three correctness fixes around RPI scoping and isolation enforcement.fix(daemon)— bound daemon RPI GasCity sessions; close unused-variable ingenerate-registry.sh; refresh CI registry after worker-spec additions.fix(flywheel-lifecycle)— survive sparse corpus in Stage 5.fix(next-work)— adddream-degradedto the source enum.fix(nightly),fix(harvest),fix(audit-truth),fix(evolve),fix(parity),fix(worktree),fix(coverage),fix(skills),fix(quickstart),fix(scripts),fix(docs),fix(gates),fix(pre-push),fix(heal,lint),fix(rpi)— a long tail of single-file fixes touching individual gates, scripts, and surfaces during the nightly close-loop cycles.- GitHub eval advisory setup — the
agentops-eval-advisoryjob now installs the deterministic canary toolchain (jq,ripgrep,bats,bd, andgocyclo) and initializes a disposable bd database before runningscripts/eval-agentops.sh --fast, matching the local environment expected by the public canaries.
- RPI loop pass consumption — 10
chore(rpi)consumption commits drove the practice-citation backfill epic through 12 passes, closing thecli/design pass and exhausting the schemas/hooks/evals pools en route to756/756 declared. - Tests —
test(cmd/ao): beads citation-verify functions, beads human-formatter functions,runBeads*graceful-degradation paths,TestSanitizeDaemonSkillInvokeArtifactName0%-coverage hole closed. - Refactors —
refactor(daemon)extracted routing-lane validators to drop cyclomatic complexity (two commits);refactor(dream)promoted probe shapes to the registry. - Codex artifact hashes regenerated after enum updates and after merges.
- Skill counts synced to 71 after the consolidation passes.
- Two nightly autonomous runs landed —
Nightly 2026-05-06(3 productive cycles, +1 code-driven goal, fitness 94.29 → 100.00, #235) andNightly 2026-05-07(3 productive cycles, +0 code-driven goals already 100, 2 audit-truth regressions fixed). - Drain open
next-work.jsonl— six PRs bundled into one branch (epicsoc-xlw8, #266).
- AgentOps daemon runtime - landed opt-in
agentopsdas the local control plane for durable jobs, queueing, worker execution, projections, health/readiness/status, event tailing, mutation tokens, and product soak proofs. New CLI surfaces includeao daemon run,ready,status,events tail,jobs list/show/submit/wait/cancel,service install, andsoak. - Daemon-backed workflows and scheduling - RPI, Dream, wiki/forge, and plans now have migration paths into the daemon via explicit daemon flags,
plans.projection, worker policies, and projections..agents/schedule.yaml,ao schedule add/list/run/remove, daemon--schedule-file, cron validation, backpressure controls, schedule mutation routes, and recurrence ledger events make recurring work a daemon-owned primitive. - Worker and factory substrate - added AgentWorker contracts, GasCity API/SSE adapters, CLI fallback workers, process cleanup, quarantine, Linux cgroup caps, routing policy guardrails, scoped mutation tokens, factory lifecycle projections, worktree ownership contracts, validation state, manual merge disposition, and yield ledgers.
- CLI command expansion - added
ao agents inspect/lint/doctor,ao skills check,ao scope,ao eval task/cleanup/suite/coverage/baseline-audit,ao pool reindex,ao watch, and daemon-backed plan synchronization surfaces. - Deterministic eval platform - added eval runtime adapters, scorecards, coverage reports, baseline A/B, context-packet A/B, retrieval/file-backed backends, public canary suites, and contract canaries.
- Dream and work-queue metadata - added finding-generator sidecars, aggregation, external-watchlist output, end-user coverage fitness gates, and first-class next-work
status,requires,dedup_key, and routing metadata. .agents/write-surface governance - catalogued repo memory write surfaces, added lint/smoke coverage, introduced the no-tracked-.agentspolicy, and added operator docs for working with runtime state.- Harvest and knowledge surfaces - harvest now recurses into nested artifact directories, emits real rig metadata, and gained native Go extraction support.
- CLI architecture refactor - reorganized large parts of the Go CLI around focused internal packages for daemon state, worker execution, GasCity, schedule parsing, eval, path resolution, lifecycle, harvest, OpenClaw, LLM wiki execution, safety, and quality checks.
- Daemon migration model - foreground RPI, Dream, plans, and wiki/forge paths remain compatible, while daemon submission/read paths move runtime ownership toward the durable ledger and rebuildable projections.
- Hook runtime and context flow - re-architected hooks around JIT context and a managed runtime backend, added Claude/Codex PreToolUse output parity checks, quieted Codex session-start behavior, and added edit/write hash-audit hooks.
- Path resolution - moved
.agentsand repo-state path logic toward shared Go/shell resolvers (cli/internal/paths,lib/ao-paths.sh) and migrated representative CLI commands and hooks to the new helpers. - Codex runtime packaging - refreshed Codex skills for GPT-5.5, aligned native plugin metadata with the marketplace schema, regenerated manifests and hashes, converted remaining skill references to
$skillnotation, tightened native hook installation, and reduced skill-catalog context footprint. - Release and CI governance - CI/local gates now cover daemon product proofs, contract canaries, eval baselines, command/test pairing, Codex runtime sections, pre-push wiring, release audit artifacts, Windows smoke, advisory policy, and nightly knowledge-cycle dedupe.
- Docs and operator contracts - added or expanded daemon, scheduling, control-plane, GasCity, OpenClaw, local compute routing, operator guide, and release governance documentation.
- Bootstrap behavior -
/bootstrapnow recommends installingbdinstead of attempting automatic installation.
- Daemon durability and state machines - fixed fsync propagation, orphan temp sweeps, snapshot directory sync, idempotency-key dedupe, queue cancellation, recurrence recomputation, projection deep-copy/nil-safety, claimed job queue depth, terminal projection precedence, and store/reconcile context cancellation.
- Daemon API and execution hardening - capped request bodies, bounded event limits, normalized wait timeout errors, rejected bad cursors and malformed schedule payloads, hardened wiki source-path containment and schedule name traversal, tolerated oversized ledger lines, preserved Dream log permissions, and routed Dream/wiki execution through the foreground supervisor.
- CLI hygiene - fixed JSON output validity, UTF-8 truncation, schedule prompt behavior under tests, command catalog drift, Cobra docs conformance, hidden command surface checks, command/test pairing, and temp-directory walk-up behavior.
- Hook/runtime drift - repaired Codex skill chaining defaults, native hook manifest installation, noisy session-start output, hook output schema, intent-echo bypass for team runners, lifecycle guard coverage, and pre-push hook coverage.
- Security and scanner false positives - closed the harvest TOCTOU path with
os.OpenRoot, excluded safe regexp literals from broad secret scans, split secret-regex construction so release gates do not flag their own patterns, and removed pipe-to-shell patterns from the ripgrep builder. - CI/nightly/release blockers - repaired eval advisory fixtures, baseline-audit drift-only behavior, Windows smoke paths, shellcheck/pre-push gate regressions, knowledge-cycle dormant-corpus handling, release audit validation, and the canary count drift found during final v2.39.0 validation.
- Strict Delegation Contract for
/rpi,/discovery, and/validation— top-level orchestrator skills now declare strict sub-skill delegation as the default. Each skill points to the new canonical referenceskills/shared/references/strict-delegation-contract.mdwhich documents the contract, anti-pattern rationalizations, and supported compression escapes (--quick,--fast-path,--no-retro,--no-forge,--skip-brainstorm,--no-scaffold,--no-behavioral,--allow-critical-deps). There is no--fullflag — strict delegation is always on. - Orchestrator Compression Anti-Pattern learning at
docs/learnings/orchestrator-compression-anti-pattern.md, surfaced through the orchestrator skill contracts. Includes detection phrases, corrective actions, and rationalizations to reject. - Orchestrator-owned step markers in
skills/crank/SKILL.md(STEP 3a.3, STEP 6.5 slop-scan, STEP 8.7) plus an "Inline Work Policy" footer documenting which steps are intentionally inline vs delegated. - MkDocs Material documentation site — Pages site rebuilt on MkDocs Material (slate dark palette). Skill catalog and CLI reference are generated at build from
skills/*/SKILL.mdandcli/docs/.mkdocs build --strictwired into the pre-push gate (Check 25a). New dedicated pages for hooks, schemas, and upgrading, plus an expanded glossary. Theme tuned to the agentops-showcase terracotta-on-near-black palette; landing page leads with the primary use case and a headline skills table; flywheel diagram ASCII art realigned; doctrine back-links added to12factoragentops.com. - Shell completion for enumerated-value flags —
ao <cmd> --<flag> <TAB>now suggests the valid values instead of falling back to file completion. Coversao --output(json/table/yaml),ao seed --template,ao goals init --template,ao inject --formatand--session-type. ao doctorstale-reference scan now coversskills-codex/*/SKILL.mdandskills/*/references/*.mdin addition to the primary skill docs, catching drift in Codex mirrors and skill reference content.- Nightly close-loop throughput alarm — the dream close-loop gate now fails loudly when
ingested > 0 && promoted == 0, replacing the silent zero-throughput mode that previously masked citation-gate deadlocks.
- Archived AO↔Olympus bridge integration: removed
docs/ol-bridge-contracts.md,docs/architecture/ao-olympus-ownership-matrix.md, MemRL policy contracts,skills/*/scripts/ol-*.sh,cli/cmd/ao/inject_ol_test.go, and associated CLI types (OLConstraint,gatherOLConstraints,.ol/directory collector). Olympus predecessor's useful patterns live on insideao. - Lowercase
docs/index.mdduplicate removed after the MkDocs migration canonicalized the landing page.
--no-lifecyclein/discoveryrenamed to--no-scaffoldfor semantic clarity — the flag controls STEP 4.5 scaffold auto-invocation only, not broader lifecycle checks.--no-lifecycleis honored as a deprecated alias through v2.40.0; when both flags are passed, they are equivalent. Other skills (/crank,/validation,/implement,/evolve) retain--no-lifecyclewith its existing lifecycle-skill-invocation semantics./discoveryflags table expanded:--autois now explicitly documented (was transitively honored but undocumented);--interactivescope clarified ("research + plan gates, not pre-mortem")./validationflags table expanded:--complexity=<level>syntax formalized to match/rpiand/discovery;--interactivescope documented./rpi--interactiveflag scope note added: applies to discovery (research + plan) and validation (Gate 1, Gate 2); does NOT override pre-mortem or vibe council autonomy.- ASCII fast-path performance sweep across rune-aware truncation call sites in
cli/(TruncateText,TruncateRunes,truncateForError, plus goals/pool/search/rpi/parser call sites) — ASCII inputs now skip the full UTF-8 rune scan. - Compile and overnight internals refactored —
runCompilesplit into phase + preflight helpers; article scan, inbound count, and prune extracted fromrepair; dream packet corroboration split per source epic; dream yield emptiness guard extracted into a dedicated helper. No behavior change; lower cyclomatic complexity and tighter test surfaces. - Skills-codex DAG bodies converted to
$skillnotation for the Codex runtime. - GitHub Actions bumped —
actions/upload-pages-artifact 3→4,actions/deploy-pages 4→5,actions/configure-pages 5→6, plus docs.yml workflow action versions aligned. - Precommit hook prefers local Go when available.
- Skills backfill pass — docs, validators, and lint synced across all skills; Codex drift surface reset to no-op.
- Orchestrator compression vulnerability — a live compression was observed 2026-04-19 where
/rpiwas invoked but phases were inlined instead of delegated. This release documents the anti-pattern (forged learning + loud skill text), scaffolds future enforcement (shared contract reference used by all 6 orchestrator skills), and explicitly defers runtime hook enforcement to a follow-up initiative. It does not mechanically prevent compression yet — the durable fix depends onao injectsurfacing the forged learning on future session starts. See.agents/research/2026-04-19-rpi-skill-dag-audit.mdfor the audit and.agents/plans/2026-04-19-rpi-dag-hardening.mdfor the remediation plan. - Close-loop promotion deadlock —
flywheelclose-loop auto-promotion and the loop-dominance signal are unblocked; citation-gate cycles no longer silently zero throughput. - Overnight findings router now emits schema-compliant
next-workv1.3 enums — validclaim_status=available(waspending), severity collapsed tohighforcriticalandblockerinputs — with a build-time guard that fails on future drift. - Quality stale-refs scan skips rename-doc lines so it no longer false-positives on deliberate rename notes.
- Release and compile gates —
go-complexity-ceilingself-heals a missinggocyclo;compile-*gates fall back to Dream defrag preview when the primary path is unavailable. - Proof-run Phase 2 calls
pool ingestbeforeclose-loopso the downstream stage has input to consume. - Hooks —
git-worker-guardnarrowed to avoid false blocks on selective flags; test-hook harness tolerates environments without a locally builtao. - Scripts —
goal-staleness,pillar-coverage,goal-quality, andbootstrap-maturitynow skip cleanly after theGOALS.yaml → GOALS.mdmigration and preserve existing JSONL maturity with compact output. - CI — resolved 7 failures from the MkDocs rebuild, committed the forged-learning artifacts, and regenerated the codex shared hash.
- Docs markdownlint — unresolved
+ 9 findings).continuation in the cross-disk harvest plan now reads as prose instead of tripping MD004.
- Swarm evidence validation — AgentOps now ships a swarm-evidence schema and validator, and wires that proof surface into validation and release gates.
- Lead-only worker git guard — worker sessions now have an explicit lead-only git guard in the hook chain, reducing accidental write authority in multi-agent runs.
- Compile and harvest operator controls —
ao compileadds runtime preference plus--resetand--repaircontrols, while harvest now reports excluded low-confidence candidates and top near-misses.
- Release and pre-push validation — local release, pre-push, and command coverage gates now validate more of the hook, evidence, and Codex runtime surface before publish.
- Codex/runtime artifacts and docs — compile, evolve, post-mortem, swarm, and related runtime docs and artifacts were decomposed and synchronized to better match shipped behavior.
- Flywheel backlog bookkeeping — next-work aggregates, consumed markers, and enum normalization were cleaned up so carry-forward work is recorded consistently.
- Pre-mortem gate ambiguity — the crank pre-mortem gate now denies ambiguous state by default instead of failing open.
- CLI and shell reliability edges —
ao rpi serve --run-idnow accepts legacy 8-hex IDs,ao mine --dry-runemits a single clean JSON payload, and bash invocations are sanitized to bypass unsafe shell aliases. - Compile, harvest, and release drift — compile repair defaults, malformed frontmatter salvage, YAML parse error surfacing, CI fixture drift, shellcheck drift, and Codex artifact metadata drift were corrected.
- Dream morning packets — Dream can now emit ranked morning work packets with evidence, target files, exact follow-up commands, and queue/bead handoff metadata.
- Dream yield telemetry and long-haul corroboration — overnight reports now record packet-confidence telemetry and can trigger a bounded long-haul corroboration pass when the first pass produces weak morning output.
- Dream decision flow — overnight runs now prefer cheaper evidence corroboration before slower council fan-out, so strong runs stay short and extended runtime is reserved for genuinely weak output.
- Headless Claude Dream council — Dream now uses Claude's working JSON output contract for headless council runs and normalizes the returned envelope before validation.
- Dream close-loop and report surfaces — overnight runs now write real close-loop callbacks and post-loop report artifacts instead of leaving placeholder
pendingsteps. - Retrieval ratchet release gate fallback — the retrieval-quality release check now falls back to checked-in eval data when a local manifest is absent.
- Windows install and smoke coverage —
scripts/install-ao.ps1adds a first-class Windows install path, and the blockingwindows-smokegate exercises PowerShell install, localao doctor, and Windows-sensitive Go packages. - Compile command —
ao compilemakes knowledge compilation a first-class CLI surface with docs and tests. - Local LLM forge pipeline —
ao forgecan now redact, summarize, structurally review, and queue transcript-derived wiki pages with Dream worker integration. - Dream curator and evolve sub-cycle — Dream gained a local curator adapter plus
ao evolve --dream-first|--dream-only, allowing overnight knowledge passes to feed the daytime improvement loop. .agentswiki surfaces — INDEX, LOG, wiki directories, and search integration formalize.agents/as a Karpathy-style knowledge wiki with index-first navigation.- Operational quality surfaces — beads audit/cluster commands, swarm preflight advice, status quality signals, retrieval eval queries, and a retrieval-quality CI ratchet broaden release-time proof.
- Knowledge scoring and search behavior — inject now deduplicates by content hash, boosts indexed pages, weights stability, and search can pull Dream vault and wiki sources with stronger local recall.
- Overnight and RPI internals — overnight, lifecycle, search, inject, harvest, and RPI flows were decomposed into smaller helpers while tightening proof paths, mixed-mode provenance, and worktree cleanup.
- Public framing and contributor docs — README, philosophy, planning/post-mortem docs, and reference surfaces now better match the context-compiler and operational-layer story.
- Windows overnight liveness — Windows process checks no longer rely on Unix
signal(0)semantics. - Dream RunLoop status invariants — live-tree hash coverage now exercises every terminal RunLoop status, and
degradedreflects the current rollback semantics. - Release retag safety — release tooling now preserves annotated tags, validates audit artifact manifests and refs, and cancels stale reruns before duplicate publish attempts.
- Post-mortem and closure audits — metadata links, evidence-only closure packets, parser-path handling, and closure packet evidence modes were normalized.
- Codex and runtime reliability — same-thread lifecycle restart, root-scoped fallback reads, JSON config writes, bridge contract validation, and next-work proof-path handling were hardened.
- Evolve operator command —
ao evolvenow exposes the v2 autonomous improvement loop directly in the CLI, including--max-cycles,--queue,--beads-only,--quality,--compile, and strict-quality passthrough flags. - Autodev program contract — root
PROGRAM.mdgives evolve/autodev a repo-local operating contract with mutable and immutable scope, validation commands, escalation policy, and stop conditions. - Beads stale-scope tooling —
ao beads verify|lint|harvestadds first-class stale-citation checks for bead-driven planning and RPI recovery. - RPI discovery artifacts — RPI can now persist and consume discovery artifacts, with tests and docs covering the
--discovery-artifactpath. - Dream RunLoop invariant coverage —
TestRunLoop_LiveTreeHashInvariant_AllStatuseslocks theIsCorpusCompounded()and live-tree mutation invariant across deterministically reproducible terminal statuses. - Dream failed-summary contract coverage — regression tests now lock the
finalizeOvernightSummarycontract for MEASURE consecutive-failure halts and persisted iteration history. - Dream operator mode —
ao overnight start|run|report|setupadds a private overnight lane with shareddream.*config, keep-awake defaults, scheduler/bootstrap guidance, council-ready runner packets, and DreamScape-style morning summaries - Nightly live retrieval proof — the dream-cycle now runs
ao retrieval-bench --live --json, emits retrieval proof in nightly summaries, and keeps a visible artifact trail for flywheel health - Pattern-to-skill drafts — repeated patterns can now generate review-only skill drafts under
.agents/skill-drafts/during flywheel close-loop - Fresh-repo onboarding welcome — new session-start routing helps first-time repos enter discovery, implementation, or validation without needing the full RPI lane first
- Docs-site and contribution proof surfaces — GitHub Pages navigation, comparison pages, behavioral-discipline guidance, strategic-doc validation patterns, and a first-skill guide expand the public proof surface
- RPI wave recovery integrated — recovered RPI wave work landed across Dream, council, stale-scope planning, discovery artifacts, CI hardening, and Codex runtime surfaces.
- Council
--mixedstrict contract documented —skills/council/references/cli-spawning.mddocuments that/council --mixedrequires Codex CLI and emits a hard error instead of silently falling back to Claude-only. - Plan and pre-mortem skill bodies decomposed — focused reference files now carry the detailed pre-decomposition, scope-mode, mandatory-check, output, wave-matrix, and task-creation guidance while keeping the top-level skills within lint budgets.
- Bead-input pre-flight wired into planning skills —
/planand/pre-morteminvokeao beads verify <bead-id>for full-complexity, aged, or prior-session bead inputs before decomposition or validation. - Operational-layer framing — README, onboarding, docs, comparisons, and linked surfaces now consistently explain AgentOps as bookkeeping, validation, primitives, and flows for coding agents
- Dream runtime positioning — the public GitHub nightly is now documented as a proof harness, while
ao overnightis documented as the private local compounding engine - Codex default path — native hooks, install copy, runtime smoke coverage, and checked-in Codex artifacts are aligned around the native-plugin path on supported Codex versions
- Validation guidance — behavioral-discipline and strategic-doc review are now first-class references alongside code review and runtime validation
- Windows Codex installer — Codex installation now has a Windows path instead of assuming Unix shell behavior.
- golangci-lint v2 contract — the local lint wrapper and CI configuration now pin the v2 behavior expected by the repository.
- security-toolchain-gate CI — deterministic fixture generation in
cli/internal/overnight/fixture/gen_fixture.gois annotated as a non-cryptographic seeded-random use, avoiding a false-positive semgrep blocker. - Recovered RPI validation blockers — validation drift from the recovered RPI wave was cleared before retagging the release.
- Stale-scope reference placement — shared stale-scope validation guidance now lives under
skills/shared/references/soheal.sh --strictcan resolve it consistently. - Release and CI drift — resolved docs-site Liquid/frontmatter issues, headless runtime smoke portability problems, pre-push shim test drift, and compile-skill headless command drift caught during release prep
- Codex install and artifact drift — fixed stale slash-command references, refreshed checked-in artifact metadata, added a Codex compile wrapper, and corrected plugin/marketplace mismatches exercised by smoke coverage
- Runtime proof stability — promoted Codex runtime smoke into the blocking smoke path and fixed related shellcheck and install-surface rough edges
- DevOps-rooted tagline — public framing no longer leads with the old DevOps-layer tagline; the Three Ways lineage remains supporting doctrine instead of the category label
- Codex native hooks — AgentOps hooks now install natively into Codex CLI v0.115.0+ via
~/.codex/hooks.json; 8 hooks wired (session-start, inject, flywheel-close, prompt-nudge, quality-signals, go-test-precommit, commit-review, ratchet-advance); installer enables thehooksfeature flag, migrates deprecatedcodex_hooksconfigs, and upgrades from hookless fallback to native hook runtime - Knowledge compiler skill — renamed athena →
/compilewith Karpathy-style incremental compilation, pluggable LLM backend (AGENTOPS_COMPILE_RUNTIME=ollama|claude), interlinked markdown wiki output at.agents/compiled/ - App struct dependency injection —
Appstruct carriesExecCommand,LookPath,RandReader,Stdout,Stderrseams; gc bridge, events, executor, context relevance, tracker health, and stream modules accept injected dependencies instead of mutable package-level vars - Test shuffle in CI —
-shuffle=onadded tovalidate.ymlandMakefiletest targets, exposing and fixing 6 ordering-dependent tests (cobra flag leaks, maturity var leaks, env var leaks)
- CLI internal extraction (waves 5-13) — business logic extracted from
cmd/aomonolith into 15internal/domain packages (rpi,search,context,quality,goals,lifecycle,bridge,forge,mine,plans,knowledge,storage,pool,taxonomy,worker) using Options struct pattern for dependency injection - Goals test migration — 7 goals test files moved from
cmd/aotointernal/goalsas external test package (goals_test) witht.Parallel()and directgoals.Run*()calls replacing cobra command wiring - Test isolation —
resetCommandStatenow saves/restores 10 maturity globals;resetFlagChangesRecursiveresets flag values to defaults; RPILoop and toolchain tests clearAGENTOPS_RPI_RUNTIME*env vars viat.Setenv
- Defrag test flag leak —
TestDefragOutputDirFlagusedcmd.Flags().Lookup("output")which matched the root persistent--outputflag; changed tocmd.LocalFlags().Lookup("output") - Goroutine leak false positive —
TestRunGoals_GoroutineLeakusedgoleak.VerifyNonewhich caught goroutines from parallel tests; switched togoleak.IgnoreCurrent()to only detect leaks within the test itself - Secret scan false positives — excluded
.gc/directory andGetenv/os.Environpatterns from secret pattern scan - Codex skill validation — added
output_contractas valid schema key,cross-vendor/knowledgeas valid tiers, fixed$/prefix in codex forge/post-mortem/scenario skills - Scenario CLI snippets — replaced non-existent
--source/--scopeflags with valid--statusvariants
- Coverage percentage CI gates — removed
coverage-ratchetjob,check-cmdao-coverage-floor.sh,.coverage-baseline.json, and associated BATS tests; percentage gates blocked CI during architectural refactors without catching bugs fire.go— FIRE loop (find-ignite-reap-escalate) superseded by gc sling + bead dispatch;formatAgehelper moved toinject_predecessor.gorpi_workers.go— per-worker health display superseded by gc agent health patrol;ao rpi workerssubcommand removed from CLI and docs
- Stage 4 Behavioral Validation — new validation tier between council/vibe and production:
- Holdout scenarios stored in
.agents/holdout/with PreToolUse isolation hook preventing implementing agents from seeing evaluation criteria - Satisfaction scoring (0.0-1.0 probabilistic) in verdict schema v4, replacing boolean-only PASS/FAIL
- Agent-built behavioral specs generated during
/implementStep 5c /scenarioskill for authoring and managing holdout scenariosao scenario init|list|validateCLI commands (4 subcommands, 11 tests)- STEP 1.8 in
/validationpipeline evaluating holdout scenarios + agent specs schemas/scenario.v1.schema.jsondefining the holdout scenario format
- Holdout scenarios stored in
- Flywheel gate command —
ao flywheel gatechecks readiness for retrieval-expansion work (research closure, rho threshold, holdout precision@K) - Citation confidence scoring —
citationEventIsHighConfidencewith bucketed confidence (0/0.5/0.7/0.9) gates MemRL rewards on match quality - Retrieval bench refactor — train/holdout splits, section-aware scoring (
scoreBenchSections), manifest-based benchmark cases - Proof-backed next-work visibility —
classifyNextWorkCompletionProofunifies completed-run, execution-packet, and evidence-only-closure proof types; context explain and stigmergic packet now report proof-backed suppressions - Three-gap contract proof gates — lifecycle gap mapping gates added to GOALS.md
- Cross-vendor execution —
--mixedflag for Claude + Codex council judges - Gas City bridge — gc as default executor for RPI phase execution with L1-L3 tests
- 149 L2 integration tests — AI-native test shape ("L2 first, L1 always") validated at scale; coverage floor raised 78.8% → 81.0%
- Test coverage hardening — GPG commit-signing fixes, root-skip guards for containerized CI, 350+ lines of vibecheck detector/metrics tests, maturity.go empty-content bugfix
- Codex parity hook —
codex-parity-warn.shnow supports opt-in blocking mode viaAGENTOPS_CODEX_PARITY_BLOCK=1(exit 2 instead of advisory) - 12-factor doctrine — compressed from 474 to 114 lines, reframed as supporting lens rather than product definition
- Skill count — 65 → 66 (added
/scenario) - Research skill — now persists reusable findings to
.agents/findings/registry.jsonlwith finding-compiler refresh - Closure integrity audit — accepts durable closure packets without scoped-file sections as valid evidence
- Proof-backed legacy entries —
shouldSkipLegacyFailedEntryusesCompletionEvidencefield (proof-only, no heuristic fallback) readQueueEntries— returns all non-consumed entries; proof filtering is downstream viashouldSkipLegacyFailedEntry
- 6 CI failure categories resolved in one commit (f1b83b25)
- Cobra test registration —
scenarioandflywheel gateadded to expectedCmds - Citation feedback test — assertion corrected for recorded confidence preference (0.5 not 0.7)
- RPI hardening — UAT version pre-flight, goals history filter, proof-backed suppression, fail-closed gates, cross-epic handoff contamination, bare ag- prefix guard
- Branch consolidation — 10 stale Codex branches analyzed, cherry-picked (9 commits, ~3,500 lines), and deleted; 25 orphaned worktrees pruned
- git rerere enabled — conflict resolution memory for future merges
- Backlog hygiene gates — added
bd-audit.sh,bd-cluster.sh, and Crank/Codex guidance for cleaning stale or mergeable beads before execution - Retrieval benchmarking and global scope — added
ao retrieval-bench, benchmark corpora,--live,--global, and nightly IR regression coverage /red-teamadversarial validation — added a persona-based validation skill plus checked-in Codex runtime artifacts- Software factory operator lane — added a CLI/operator surface and Claude factory startup routing for software-factory workflows
- Flywheel maintenance utilities — added global garbage purge tooling and nightly retrieval benchmarking for knowledge quality tracking
- Release policy — removed the enforced release cadence gate so releases no longer block on a minimum wait between tags
- Knowledge operator surfaces — plan and validation now wire knowledge operator surfaces directly into execution flow
- Proof and runtime docs — goals, RPI docs, and contributor guidance now reflect the expanded proof surfaces and hookless runtime behavior
- Codex artifact parity — restored checked-in Codex parity for red-team and cleaned Codex runtime metadata/frontmatter drift across crank, forge, post-mortem, release, and swarm artifacts
- Retrieval quality — replaced exact-substring filtering with token-level matching and tuned penalty, deduplication, and OR-fallback behavior
- Harvest metadata preservation — promotion now preserves source metadata and fills missing maturity, utility, and type fields safely
- Release tooling — release artifact directories are created safely and audit artifacts now resolve against release tag names
- Documentation and link drift — repaired the post-mortem Codex link and aligned runtime docs around the newer startup and lifecycle flows
- Knowledge activation skill — new
/knowledge-activationskill and CLI surfaces for activating cross-domain knowledge at runtime, with operator surface consumption and ranked intelligence context - Session intelligence engine — complete runtime engine with explainability, ranked context assembly, and trust policy enforcement
- Runtime selection for
ao rpi serve— serve now supports explicit runtime selection for Claude and Codex execution modes - Quality signals hook — new
quality-signals.shhook with test coverage for session quality telemetry - Pre-push gate expansion — 9 checks migrated from CI-only to the local pre-push gate for faster feedback
- Inject stability warnings and status dashboard — closed 3 harvest items with signal tests and dashboard improvements
- README refresh — product-minded rewrite with gain-framing and Strunk-style prose fixes
- Philosophy doc — new
docs/philosophy.mdand observations section added to README - Documentation alignment — repo front doors and codex artifact guidance unified across entry points
- Claude Code architecture lessons — retry budgets, stability flags, quality signals, and orchestration patterns applied to skills
- Homebrew formula — updated to v2.31.0 with pre-built binaries
- Post-mortem closure integrity — normalized file parsing for closure integrity audits
- CI reliability — resolved CI failures across codex refs, test pairing, hook coverage, worktree handling, docs parity, hook portability, and codex lifecycle
- Lookup nested scanning —
ao lookupnow scans nested global knowledge directories correctly - Pre-push test stubs — added test stubs for new pre-push checks, skip non-shell in shellcheck
- Bumped
codecov/codecov-actionfrom 5 to 6 - Bumped
DavidAnson/markdownlint-cli2-actionfrom 22 to 23
- 9 lifecycle skills — bootstrap, deps, design, harvest, perf, refactor, review, scaffold, and test skills wired into RPI with auto-invocation and mechanical gates
ao harvest— cross-rig knowledge consolidation extracts and catalogs learnings from sibling crew workspacesao context packet— inspect stigmergic context packets for debugging inter-session handoff state- Hook runtime contract — formal Claude/Codex/manual event mapping with runtime-aware hook tooling
- Evidence-driven skill enrichment — production meta-knowledge, anti-patterns, flywheel metrics, and normalization defect detection baked into 9 skill reference files
- Research provenance — pending learnings now carry full research provenance for discoverability and citation tracking
- Context declarations — inject, provenance, and rpi skills declare their context requirements explicitly
- Goals and product output templates —
/goalsand/productproduce evidence-backed structured output
- Three-gap context lifecycle contract — README, PRODUCT.md, positioning docs, and operational guides reframed around the context lifecycle model
- Dual-runtime hook documentation — runtime modes table and troubleshooting updated for Claude + Codex hook coexistence
- CI reliability — resolved 4 pre-existing CI failures, restored headless runtime preflight, repaired codex parity drift checks
ao lookupretrieval — fixed retrieval gaps that caused lookup to return no results- Embedded sync — using-agentops SKILL.md and
.agents/.gitignorenow written correctly on first session start - Closure integrity — 24h grace window for close-before-commit evidence, normalized file parsing
- Skill lint compliance — vibe, post-mortem, crank, and plan skills trimmed or restructured to stay under 800-line limit
- Codex tool naming — added CLAUDE_TOOL_NAMING rule and fixed 5 Claude-era tool references in codex skills
- ASCII diagram consistency — aligned box-drawing characters across 23 documentation files
- Fork exhaustion prevention — replaced jq with awk in validate-go-fast to prevent fork bombs on large repos
- Codex hookless lifecycle support —
ao codexruntime commands, lifecycle fallback, and Codex skill orchestration now cover hookless sessions end to end - PROGRAM.md autodev contract — Added a first-class
PROGRAM.mdcontract for autodev flows and taught/evolveand related RPI paths to use it - Long-running RPI artifact visibility — Mission control now exposes run artifacts and evaluator output so long-running RPI sessions are replayable and easier to inspect
- Codex runtime maintenance flow — Refreshed Codex bundle hashes, lifecycle guards, runtime docs, and release validation coverage around the expanded Codex execution path
- Codex RPI scoping and closeout — Tightened objective scope, epic scope, closeout ownership, and validation gaps in the Codex RPI lifecycle
- Release gate reliability — Restored headless runtime coverage, runtime-aware Claude inventory checks, and release-gate coherence validation
- Reverse-engineer repo hygiene — Repo-mode reverse engineer now ignores generated and temp trees when identifying CLI and module surfaces
- Model cost tiers and config writes —
ao configcan now assign per-agent models by cost tier and persist repo configuration changes directly - Search brokerage over session history and repo knowledge —
ao searchnow wraps upstreamcassresults with repo-local AgentOps artifacts by default - Reviewer and post-mortem reference packs — Added model-routing, iterative-retrieval, confidence-scoring, write-time-quality, and conflict-recovery guidance across council, research, swarm, vibe, compile, and related skills
- Competitive comparison and CLI docs — Refreshed comparison docs, release smoke coverage, and command documentation around the expanded search/config surface
- Flywheel proof and citation loop — Added deterministic proof fixtures, preserved exact research provenance, and made citation feedback artifact-specific so flywheel health reflects real closure state
- Search alignment with forged session history — Search now stays aligned with forged session artifacts and fallback behavior
- Hook-launched validation — Pre-push and release gates now isolate inherited git env/stdin correctly and cover newer hook scripts in integration tests
- Codex council profile parity — Source and checked-in Codex council docs are back in sync for the shared profile contract
- Node repair operator — Crank now classifies task failures as RETRY (transient), DECOMPOSE (too complex), or PRUNE (blocked) with budget-controlled recovery
- Knowledge refresh auto-trigger — Lightweight compile defrag runs automatically at session end via new SessionEnd hook
- Configurable review agents — Project-level
.agents/reviewer-config.mdcontrols which judge perspectives council and vibe spawn - Three-tier plan detail scaling — Plan auto-selects Minimal, Standard, or Deep templates based on issue count and complexity
- Adversarial ideation — Brainstorm Phase 3b stress-tests each approach with four red-team questions before user selection
- Crank SKILL.md line limit — Consolidated duplicate References sections to stay under 800-line skill lint limit
- Codex skill parity — Synced all five competitive features to skills-codex with reference file copies
- Flywheel golden signals always shown — Golden signals were gated behind
--goldenflag, causingao flywheel statusto report "COMPOUNDING" while the hidden golden signals analysis showed "accumulating". Golden signals now compute and display by default.
- Flywheel golden signals — Four derived health indicators (velocity trend, citation pipeline, research closure, reuse concentration) that distinguish knowledge compounding from noise accumulation; accessible via
ao flywheel status --golden - Forge-to-pool bridge — Forge auto-writes pending learnings as markdown to
.agents/knowledge/pending/for close-loop pool ingestion - SessionStart citation priming —
ao lookupwired into SessionStart hook to close the citation gap between inject and session context - Skill catalog quality — Improved descriptions, extraction patterns, and reference linking across skill catalog
.agents/.gitignorescope — Replaced broad!*/pattern with explicit subdirectory list to prevent accidental tracking- Codex runtime skill parity — Hardened Codex runtime skill discovery and validation
- Codex install smoke tests — Fixed test assertions for install path edge cases
- CLI reference docs — Regenerated with updated date stamps
- RPI stops after Phase 2 — Restructured rpi, discovery, and validation orchestrator skills as compact DAGs with execution sequence in a single code block; eliminates LLM stopping between phases due to
###section headings acting as natural breakpoints - Test grep patterns for DAG headings — Updated
test-tuning-defaults.shto match new complexity-scaled gate headings after DAG restructure
- Goals reimagined — GOALS.md rebuilt from first principles with fitness gate fixes
- README progressive disclosure — Lead with moats, collapse detail into expandable sections
- CLI reference docs — Regenerated with updated date stamps
- Doctor + findings helpers — Added CLI test coverage for extracted helpers
- BF6–BF9 test pyramid levels — Regression (bug-specific replay), Performance/Benchmark, Backward Compatibility, and Security (in-test) bug-finding levels with language-specific patterns for Go and Python
- Test pyramid decision tree expansion — 4 new routing questions for BF6–BF9 in the "When to Use" guide
- RPI phase mapping for BF6–BF9 — Bug fix → BF6 mandatory, hot-path → BF7 benchmark, format change → BF8 compat fixture, secrets → BF9 redaction tests
regen-codex-hashes.sh— Manifest hash regeneration script for Codex skill maintenance
- Go standards — Added benchmark tests (BF7), backward compat with
testdata/compat/(BF8), regression test naming convention (BF6), security tests for path traversal (BF9) - Python standards — Added Hypothesis property-based testing (BF1),
pytest-benchmarkpatterns (BF7), backward compat with parametrized fixtures (BF8), regression test naming (BF6), secrets redaction tests (BF9) - Coverage assessment template — Extended BF pyramid table from BF1–BF5 to BF1–BF9
- Codex skill audit — 60+ findings fixed across all 54 Codex skills; removed orphaned
claude-code-latest-features.mdandclaude-cli-verified-commands.mdreferences - Skill lint warnings — Resolved all warnings in crank, rpi, recover skills
- README skill references — Corrected broken references and linked orphaned templates
- Skill linter refs — Fixed directory reference and backtick formatting in reverse-engineer-rpi
- CHANGELOG sync hook — Replaced broken awk extraction with sed; awk failed on em-dash UTF-8 content producing header-only syncs
- Plugin version parity — Added pre-commit check that warns when
.claude-plugin/manifest versions don't match the release version
- Codex BF pyramid parity — Synced BF1/BF2/BF4 bug-finding level selection into skills-codex implement, post-mortem, and validation skills
- Codex Claude backend cross-contamination — Removed orphaned
backend-claude-teams.mdfiles (Claude primitives: TeamCreate, SendMessage) from 4 Codex skills (council, research, shared, swarm) - Dead converter rule — Removed stale sed substitution for
backend-claude-teams.mdrename in converter script - Swarm reference integrity — Added Reference Documents section to swarm SKILL.md; updated validate.sh to check only Codex-native backend references
- L0–L7 test pyramid standard — Shared reference doc (
standards/references/test-pyramid.md) defining 8 test levels, agent autonomy boundaries (L0–L3 autonomous, L4+ human-guided), and RPI phase mapping - Test pyramid integration across RPI lifecycle — Discovery identifies test levels, plan classifies tests by level, pre-mortem validates coverage, implement selects TDD level, crank carries
test_levelsmetadata, validation audits coverage, post-mortem reports gaps - RPI autonomous execution enforcement — Three-Phase Rule mandates discovery → implementation → validation without human interruption; anti-patterns table documents 7 failure modes
- Evolve autonomous execution enforcement — Each cycle runs a complete 3-phase
/rpi --auto; anti-patterns table documents 6 failure modes; large work decomposed into sub-RPI cycles - Codex skill standard — New
standards/references/codex-skill.mdwith tool mapping, prohibited primitives, two-phase validation, DAG-first traversal, and prompt constraint boundaries - Codex-native overrides — Durable overrides for crank, swarm, council that survive regeneration
- DAG-based Codex smoke test —
scripts/smoke-test-codex-skills.shvalidates 54 skills with dependency-ordered traversal - Codex skill API contract —
docs/contracts/codex-skill-api.mdwith conformance validator - Output contract declarations —
output_contractfield on council, vibe, pre-mortem, research skills with canonical finding-item schema
- Codex converter rewrite — Strips Claude primitives instead of mapping to unavailable tools; rewrites reference files through
codex_rewrite_text - CI pipeline — Removed codex skill parity check (skills-codex/ now manually maintained); fixed shellcheck and embedded sync issues
- Converter primitive stripping — Task primitives (TaskCreate, TeamCreate, SendMessage) properly stripped instead of mapped to non-existent Codex equivalents
- Embedded hook sync — Added missing
test-pyramid.mdandcodex-skill.mdto CLI embedded references - ShellCheck SC1125 — Fixed em-dash in shellcheck disable directive in smoke test script
- Skill line limits — Moved verbose autonomy rules to reference files to stay under tier-specific line budgets
- Error & rescue map template — Pre-mortem Step 2.5 with 3 worked examples (HTTP, database, LLM)
- Scope mode selection — Pre-mortem Step 1.6 with 3-mode framework (Expand/Hold/Reduce) and auto-detection
- Temporal interrogation — Pre-mortem Step 2.4 walks implementation timeline (hour 1/2/4/6+) for time-dependent risks
- Prediction tracking — Pre-mortem findings get unique IDs (
pm-YYYYMMDD-NNN) correlated through vibe and post-mortem - Finding classification — Vibe separates CRITICAL (blocks ship) from INFORMATIONAL findings
- Suppression framework — Vibe loads default + project-level suppression patterns for known false positives
- Domain-specific checklists — Standards skill extended with SQL safety, LLM trust boundary, and race condition checklists, auto-loaded by vibe
- RPI session streak tracking — Post-mortem Step 1.5 shows consecutive session days and verdict history
- Persistent retro history — Post-mortem Step 4.8 writes structured JSON summaries to
.agents/retro/for cross-epic trend analysis - Prediction accuracy scoring — Post-mortem Step 3.5 scores HIT/MISS/SURPRISE against pre-mortem predictions
- Commit split advisor — PR-prep Phase 4.5 suggests bisectable commit ordering (suggestion-only)
- Council finding auto-extraction — Significant findings from WARN/FAIL verdicts staged for flywheel consumption
- Post-mortem examples condensed — Verbose examples replaced with concise 4-mode summary to stay under skill line limit
- Resolved all golangci-lint quality findings
- Synced embedded standards after skill audit fixes
- Synced Codex bundle after skill audit fixes
- Resolved audit findings across council, vibe, standards skills
- Discovery and validation phase orchestrators — New
/discoveryand/validationskills decompose the RPI lifecycle into independently invocable phases (research+plan+pre-mortem and vibe+post-mortem) - Stigmergic packet scorecard — Ranked scoring for flywheel knowledge packets so higher-utility learnings surface first
- Pinned work queue —
/evolvegains a pinned work queue with blocker auto-resolution for directed improvement loops - Per-package coverage ratchet — Pre-push gate enforces per-package coverage baselines that only move upward
- Fast pre-push mode —
--fastflag for diff-based conditional checks, skipping unchanged packages - Standards auto-loading — Go and Python coding standards injected
automatically into
/crankand/swarmworkers - 271 test functions — Four internal packages (
pool,ratchet,resolver,storage) brought to 100% coverage
- README restructured — Extracted reference material into dedicated docs, reducing README from 679 to 472 lines
- RPI skill refactored —
/rpinow delegates to/discoveryand/validationphase orchestrators instead of inlining all phases - Go and Python test conventions — Canonical standards enriched with assertion quality rules, naming conventions, and table-driven test guidance
- Documentation alignment — Lifecycle, flywheel, primitive chain, and positioning docs updated to reflect current architecture
- Goal runner deadlock — Fixed goroutine deadlock in goal runner and added job timeouts to prevent stalls
- 17 CLI bugs from deep audit — Addressed goroutine leaks, race conditions, panics, buffer overflows, and nil-check inconsistencies
- Session close reliability — Resolved pre-existing session_close issues surfaced by vibe council review
- ~50 zero-assertion tests — Upgraded smoke tests from no-op to behavioral assertions across cmd/ao and internal packages
- Test file hygiene — Merged
_extra_test.goandcov*_test.gofiles into canonical<source>_test.gonames - CI stability — FIFO test skip on Linux, embedded skill sync, coverage ceiling adjustments, crank SKILL.md trimmed below 800-line limit
- Auto-extract quality gate — Added quality gate to prevent low-fidelity auto-extracted learnings from entering the knowledge store
- Repo-native redteam harness — Added a packaged redteam pack and prompt
runner to
security-suitefor repeatable repository-local security exercises - Findings management commands — Added CLI commands for listing and managing saved findings from the terminal
- Closed-loop prevention validation — Completed the end-to-end finding compiler and prevention-ratchet validation path so saved findings feed back into earlier planning and task validation more reliably
- Runtime contract parity — Localized shared Claude runtime reference packs into the source skills and regenerated Codex artifacts so source and generated bundles stay aligned
- Finding metadata injection — Exposed finding metadata consistently in inject output and JSON integrations after the merged findings work landed
- Release gate regressions — Restored goals/package coverage, learning coherence, and hook-fixture isolation so the local release gate matches the shipped tree again
- Finding registry — Council findings are saved to a persistent registry and automatically fed back into planning and validation, so the same class of bug is caught earlier next time
- Repo execution profiles —
.repo-execution-profile.jsonlets skills and runtimes adapt to each repository's validation gates, startup reads, and done-criteria - Headless team backend — Multi-agent workflows can run non-interactively (e.g. in CI) with structured JSON output and automated validation
- Codex and embedded artifacts — Synced generated Codex bundles, embedded standards references, and install artifacts after merging branch work
- Validation feedback capture — Recorded validation-cycle feedback into
.agentslearnings so tracked patterns match the shipped tree
- Lookup findings — Fixed
ao lookupand inject scoring so findings render, cite, and score correctly after the branch merge - 23 CLI bug fixes — Fixed goroutine leaks, race conditions, panics, buffer overflows, missing error handling, and nil-check inconsistencies
- Post-mortem evidence hardening — Staged changes and worktree evidence are now captured durably so proof isn't lost during compaction or cleanup
- Codex-first skill rollout across the full catalog with override coverage, generated-artifact governance, and install/runtime parity validation
- Claim-aware next-work lifecycle handling with contract parity checks for
/rpiand follow-on flows - Headless runtime skill smoke coverage and Codex backbone prompt validation in the release gate stack
- Codex maintenance guidance, override coverage docs, and CLI-to-skills mapping to match the generated runtime model
- Release-prep validation flows for runtime smoke, Codex artifact sync, and release note generation
- Next-work queue mutation races by making claim/update handling concurrency-safe and per-item
- Codex prompt parity drift by syncing generated prompts and tightening override coverage gates
- Worktree Git resolution and vibe-check runtime environment handling
- Push/pre-push validation regressions and nested pre-push wrappers
- Streamed phase timeout cancellation so phased runtime tests and release gating terminate promptly
- Codex install workflow now uses
~/.agents/skillsas the single raw skill home and stops recreating an AgentOps mirror in~/.codex/skills - Native Codex plugin refresh now archives overlapping legacy
~/.codex/skillsAgentOps folders instead of repopulating them - Codex install docs now consistently describe the
~/.agents/skillsworkflow and the need for a fresh Codex session after install - Codex skill conversion now preserves multiline YAML
descriptionfields correctly, fixing malformed generated metadata for skills such as Compile ao doctornow treats plugin-cache plus~/.agents/skillsas the supported Codex layout and reports manifest drift with accurate wording
- Flywheel loop closure —
ao session close --auto-extractproduces lightweight learnings and auto-handoff at session boundary - Handoff-to-learnings bridge —
ao handoffnow extracts decisions into.agents/learnings/automatically - Session-type scoring in
ao inject --session-type— 30% boost for matching session context (career, debug, research, brainstorm) - Identity artifact support —
ao inject --profilesurfaces.agents/profile.mdin session context - MEMORY.md auto-promotion in
ao flywheel close-loop(Step 7) after maturity transitions - Session-type detection in
ao forgeoutput metadata - Production RPI orchestration engine —
ao rpi serve <goal>with SSE streaming and auto mode - Knowledge mining —
ao mineandao defragcommands for automated codebase intelligence - Context declarations —
ao inject --for <skill>reads skill frontmattercontext:block for scoped retrieval - Sections include allowlist and context artifact directories for skill-scoped injection
ao handoffcommand for structured session boundary isolation- Behavioral guardrails — 3-layer hook defense-in-depth (intent-echo, research-loop-detector, task-validation-gate)
- Context enforcement hook and run-id namespaced artifact paths
- Headless invocation standards and RPI phase runner
- Nightly CI compile job for automated knowledge warmup
- Coverage ratchet gate with BATS integration tests for shell scripts
- Fuzz targets, property tests, and golden file contracts for CLI
- Git worker guard, embedded parity gate, and swarm evidence validation hooks
- Release cadence gate warns on releases within 7 days of previous
- Coverage floor raised to 84% for
cmd/ao, average floor to 95% - Complexity ceiling tightened to 20 (from 25)
- Default session-start hook mode switched from manual to lean
- Hard quality gate on injection — maturity + utility filter
- Post-mortem redesigned as knowledge lifecycle processor
- RPI god-file split — 1,363 lines reduced to 203 with structured handoff schema
- Legacy RPI orchestrator retired — serve now uses phased engine (-1,121 lines)
- Council V2 findings synthesized into agent instructions and skill contracts
- 10k LOC of coverage-padding tests deleted; 72 stale tests quarantined
- Skill hardening — web security controls across 5 skills, CSRF protection, crank pre-flight
- Session-end hook wires
ao session close --auto-extractbefore existing forge pipeline
- Flywheel signal chain — confidence decay, close-loop ordering, glob errors
- Path traversal in context enforcement hook and frontmatter parsing
- Race condition in handoff consumption at session boundary
ao minestabilized — dedup IDs, error propagation,--sincewindow, empty output guard- Hook test assertions aligned with warn-then-fail ratchet pattern (strict env required)
- Pre-mortem gate exit code corrected to 2 in strict mode (was 1)
- RPI serve event pipeline and coherence gate hardened
- jq injection via bare 8-hex run IDs in serve classifier
- Goals parser edge cases — paired backtick strip and rune-aware truncation
- UTF-8 truncation across six functions converted to rune-safe slicing
- CORS headers and stale doc references cleaned up
- Cross-wave worktree file collisions prevented
- hookEventName added to hookSpecificOutput JSON schema
- README highlights
ao search(built on CASS) — indexes all chat sessions from every runtime unconditionally; adds Second Brain + Obsidian vault section with Smart Connections local/GPU embeddings and MCP semantic retrieval
- CHANGELOG retrospectively updated to document all v2.19.1 post-tag commits (skills namespace fixes were shipped but not recorded)
- Quickstart skill rewritten from 275 lines to 68 lines — removes 90-line ASCII diagram and 50-line intent router that caused 3+ minute runtime; now outputs ~8 lines and completes in under 30 seconds
truncateTextedge case: maxLen 1–3 now returns"..."[:maxLen]instead of the original string unchanged- Dead anti-pattern promotion functions removed from
ao maturity(promoteAntiPatternsCmd,filterTransitionsByNewMaturity,displayAntiPatternCandidates, ~99 LOC) - Windows file-lock and signal support — replace no-op
filelock_windows.gowith realLockFileEx/UnlockFileExvia kernel32.dll; extractsyscall.Flockandsyscall.Killinto platform-specific helpers so the binary compiles on Windows without POSIX-only syscalls heal.shCheck 7 false positive — script reference integrity check now strips URLs before pattern matching, preventing remotehttps://…/scripts/foo.shreferences from being validated as local files- Security gate
BLOCKED_HIGH— three persistent findings resolved: gosec G118 false positive (context cancel func returned to caller), golangci-lint nolint syntax (space in// nolint:directive), radon double-countingreverse_engineer_rpi.pyfromskills-codex/copy - 71 stale
ao know *andao quality *namespace references replaced across 17skills-codex/SKILL.md files — agents running rpi/evolve/crank were invoking non-existent commands from the pre-flatten CLI namespace - Three HIGH-severity stale command references fixed across
skills/andskills-codex/:ao flywheel status→ao metrics flywheel status,ao settings notebook update→ao notebook update,ao start seed/init→ao seed/ao init
- Spec-consistency gate (
scripts/spec-consistency-gate.sh) validates contract files before crank spawns workers - Command-surface parity gate (
scripts/check-cmdao-surface-parity.sh) ensures all CLI leaf commands are tested scripts/post-merge-check.shnow validatesgo mod tidysync and blocks on symlinksscripts/merge-worktrees.shnow propagates file deletions and preserves permissions- Post-mortem preflight script checks reference file existence before council runs
- Hooks.json preflight validates script existence
- Windows binaries added to GoReleaser and SLSA attestation subject list
- Coverage floor raised 78% → 80% with CI enforcement gate; Codecov threshold aligned to 75%
- Six truncation functions converted to rune-safe Unicode slicing
truncateIDin pool.go delegates to sharedtruncateText- Crank skill invokes spec-consistency gate before spawning workers
- Vibe skill carries forward unconsumed high-severity next-work items as pre-flight context
- Release skill warns on unconsumed high-severity next-work items
- next-work JSONL schema formalized to v1.2
- Skills installation switched from
npx skillsto native curl installer (bash <(curl -fsSL …/install.sh)) - README updated with 5-command summary, compound effect section, and
/vibebreakdown
ao mindcommand for knowledge graph operations.- New RPI operator surfaces: normalized C2/event plumbing plus
ao rpi stream,ao rpi workers, and tmux worker nudge visibility. - Codex install/bootstrap improvements, including native
~/.codex/skillsinstall and one-line installer flow. - Windows binaries added to GoReleaser build outputs.
- CLI namespace migration completed and aligned across hooks, docs, integration tests, and generated command references.
- Codex skill system moved to regenerated modular layout with codex-specific overrides and runtime prompt tailoring.
- CI/release gates hardened (codex runtime sections, release e2e validation, parity checks, stricter policy enforcement).
- High-complexity CLI paths refactored (
runRPIParallel,runDedup,parseGatesTable) to lower cyclomatic complexity.
- Multiple post-mortem remediation waves landed for CLI/RPI/swarm reliability and edge-case handling.
- Hook delegation and integration behavior corrected for flat command namespace.
heal.shfalse-positive behavior reduced and doctor stale-path detection improved.- Skill/doc parity and cross-reference drift issues corrected across codex and core skill catalogs.
- Legacy inbox/mail command surface and stale/dead skill references from active catalogs.
ao seednow creates.gitignoreand storage directories — reusessetupGitProtection,ensureNestedAgentsGitignore, andinitStoragefromao initao seedtext updated from staleao inject/ao forgeto current MEMORY.md + session hooks paradigm- MemRL feedback loop closed —
ao feedback-loopcommand wired,ao maturity --recalibratedry-run guard added - Quickstart skill updated to reference
ao seedand current flywheel docs - CLI reference regenerated after
ao feedback-loopand seed help text changes
.agents/session artifacts removed from git tracking- PRODUCT.md updated — Olympus section removed, value props and skill tier counts corrected
- GOALS.md coverage directive updated to measured 78.8% (target 85%)
- SessionStart hook default mode changed from
manualtolean— flywheel injection now fires every session - Auto-prune enabled by default (
AGENTOPS_AUTO_PRUNEdefaults to1, opt-out via=0) - Anti-pattern detection threshold lowered from
harmful_count >= 5to>= 3 - Eviction confidence threshold relaxed from
< 0.2to< 0.3 - Maturity promotion threshold in
--helptext synced with code (0.7→0.55)
- Empty learnings no longer inflate flywheel metrics — extract prompt skips empty files, pool ingest rejects "no significant learnings" stubs
ao pool ingestnow runs automatically in session-end hook after forge (was manual-only)- 8 stale doc/comment references to old thresholds updated across hooks, ENV-VARS.md, HOOKS.md, using-agentops skill
- 13 empty stub learnings removed from
.agents/learnings/
ao notebook updatecommand — compound MEMORY.md loop that merges latest session insights into structured sectionsao memory synccommand — sync session history to repo-root MEMORY.md with managed block markers for cross-runtime access (Codex, OpenCode)ao seedcommand — plant AgentOps in any repository with auto-detected templates (go-cli, python-lib, web-app, rust-cli, generic)ao lookupcommand — retrieve specific knowledge artifacts by ID or relevance query (two-phase complement toao inject --index-only)ao constraintcommand family — manage compiled constraints (list, activate, retire, review)ao curatecommand family — curation pipeline operations (catalog, verify, status)ao dedupcommand — detect near-duplicate learnings with optional--mergeauto-resolutionao contradictcommand — detect potentially contradictory learningsao metrics healthsubcommand — flywheel health metrics (sigma, rho, delta, escape velocity)ao context assemblecommand — build 5-section context packet briefings for tasks- Work-scoped knowledge injection:
ao inject --bead <id>boosts learnings tagged with the active bead - Predecessor context injection:
ao inject --predecessor <handoff-path>surfaces structured handoff context - Compact knowledge index:
ao inject --index-onlyoutputs ~200 token index table for JIT retrieval - Learning schema extended with
source_beadandsource_phasefields for work-context tracking ao extract --bead <id>tags extracted learnings with the active bead ID- Citation-to-utility feedback pipeline in flywheel close-loop (stage 5)
- Global
~/.agents/knowledge tier for cross-repo learning sharing (0.8 weight penalty, deduped) - Bead metadata resolver reads from env vars (
HOOK_BEAD_TITLE,HOOK_BEAD_LABELS) or cache file - Goal templates embedded in binary (go-cli, python-lib, web-app, rust-cli, generic) for
ao goals init --templateandao seed - Platform-specific process-group isolation for goal check timeouts (Unix: SIGKILL pgid, Windows: taskkill /T)
- SessionStart hook rewritten with 3 startup modes: lean (default), manual, legacy — via
AGENTOPS_STARTUP_CONTEXT_MODE - SessionEnd hook now gates notebook update and memory sync on successful forge
- Type 3 setup hook template:
hooks/examples/50-agentops-bootstrap.sh - Constraint compiler hook:
hooks/constraint-compiler.sh - Codex-native skill format (
skills-codex/) with install and sync scripts for cross-runtime skill delivery - Comprehensive cmd/ao test coverage push — 500+ tests across 5 waves reaching 79.2% statement coverage (13 untestable functions excluded)
- SessionStart hook default mode changed from full inject to
lean(extract + lean inject, shrinks when MEMORY.md is fresh) ao flywheel close-loopnow applies ALL maturity transitions (not just anti-pattern)ao hooksgenerated config uses script-based commands instead of inline ao invocationsao rpiprefers epic-type issues before falling back to any open issue
truncateTextnow uses rune-safe[]runeslicing to avoid breaking multi-byte UTF-8 characterssyncMemoryextracted from Cobra handler for testabilityparseManagedBlockdetects duplicate markers and refuses to parse (prevents data loss)readNLatestSessionEntrieswarns on skipped unreadable session filesreadSessionByIDdetects ambiguous matches and returns error instead of first substring matchfindMemoryFilebroad contains-fallback removed (was matching wrong projects)pruneNotebookiteration capped at 100 to prevent runaway loopsMEMORY_AGE_DAYSsentinel initialized to -1 (was 0, causing false lean-mode activation when file missing)- Lean-mode guard now requires
MEMORY_AGE_DAYS >= 0before comparing freshness - Memory sync moved inside forge success gate in session-end hook
ao search --jsonreturns[](empty JSON array) when no results, instead of human-readable textao doctorreturnsDEGRADEDstatus for warnings without failures (previously only HEALTHY/UNHEALTHY)ao rpi statusgoroutine leak fix — signal channel properly cleaned up- Inline rune truncation in
formatMemoryEntryreplaced with sharedtruncateText - 6 new tests for dedup, ambiguity detection, iteration cap, duplicate markers
- Cobra pflag state pollution between test invocations — explicit flag reset in
executeCommand()helper - Goals validate.sh outdated checks and missing validate.sh for 7 skills
- 10 tech debt findings from ag-8km+ag-chm post-mortem (stale nudge, scanner, docs)
- ao binary codesigned with stable Mach-O identifier
- Hook integration tests updated — removed 8 stale standalone ao-* hook tests consolidated into session-end-maintenance.sh
- GOALS.md (v4) OODA-driven intent layer — markdown-based goals format with mission, north/anti stars, and steerable directives
ao goals initinteractive GOALS.md bootstrap with--non-interactivemodeao goals steercommand to add, remove, and prioritize directivesao goals prunecommand to remove stale gates referencing missing pathsao goals migrate --to-mdconverter from GOALS.yaml to GOALS.md formatao goals measure --directivesJSON output of active directivesao goals validatereports format and directive count- Format-aware
ao goals addwriteback (auto-detects md or yaml) - Go markdown parser library with case-insensitive heading matching and round-trip rendering (26 tests)
/goalsskill rewritten with 5 OODA verbs (init/measure/steer/validate/prune)/evolveStep 3 rewritten with directive-based cascade for idle reduction
ao rpifalls back to any open issue when no epic exists (#50)- RPI phased processing tests added (~230 lines) for writePhaseResult, validatePriorPhaseResult, heartbeat, and registry directory
- Evolve idle hardening — disk-derived stagnation detection, 60-minute circuit breaker, rolling fitness files, no idle commits
- Evolve
--qualitymode — findings-first priority cascade that prioritizes post-mortem findings over goals - Evolve cycle-history.jsonl canonical schema standardization and artifact-only commit gating
heal-skillchecks 7-10 with--strictCI gate for automated skill maintenance- 6-phase E2E validation test suite for RPI lifecycle (gate retries, complexity scaling, phase summaries, promise tags)
- Fixture-based CLI regression and parity tests
ao goals migratecommand for v1→v2 GOALS.yaml migration with deprecation warning (#48)- Goal failure taxonomy script and tests
- CLI taxonomy, shared resolver, skill versioning, and doctor dogfooding improvements (6 architecture concerns)
- GoReleaser action bumped from v6 to v7
- Evolve build detection generalized from hardcoded Go gate to multi-language detection
ao pool list --wideflag andpool showprefix matching (#47)- Consistent artifact counts across
doctor,badge, andmetrics(#46) - Double multiplication in
vibe-checkscore display (#45) - Skills installed as symlinks now detected and checked in both directories (#44)
- Learnings resolved by frontmatter ID;
.mdfile count in maturity scan (#43) - JSON output truncated at clean object boundaries (#42)
- Misleading hook event count removed from display (#41)
- Post-mortem schema
modelfield and resolverDiscoverAllmigration - 15+ missing skills added to catalog tables in
using-agentops - Handoff example filename format corrected to
YYYYMMDDTHHMMSSZspec - Quickstart step numbering corrected (7 before 8)
- OpenAI docs skill: added Claude Code MCP alternative to Codex-only fallback
- Dead link to
conflict-resolution-algorithm.mdremoved from post-mortem ao forge search→ao searchin provenance and knowledge skills- OSS docs: root-level doc path checks, removed golden-init reference
- Reverse-engineer-rpi fixture paths and contract refs corrected
- Crank: removed missing script refs, moved orphans to references
- Codex-team: removed vaporware Team Runner Backend section
- Security skill: bundled
security-gate.sh, fixedsecurity-suitepath - Evolve oscillation detection and TodoWrite→Task tools migration
- Wired
check-contract-compatibility.shinto GOALS.yaml - Synced embedded skills and regenerated CLI docs