All notable changes to this project are documented here. Format based on Keep a Changelog; this project adheres to Semantic Versioning.
- Independent out-of-band re-execution watcher (`prd_taskmaster/tournament/watcher.py`; `watcher-run` / `watcher-status` CLI): the precondition for ever enabling real (`--enforce-slash`) forfeiture. It re-adjudicates settled tournament submissions from primary evidence (the claimed commit + the CDD card) without trusting the recorded verdict: it re-runs the oracle gate, re-derives `sha256(diff base..HEAD)` to catch diff-copy tamper independently of the live collector, and accumulates a concordance ledger over real slash decisions. A `permit_enforce_slash` gate is fail-closed: real slashing is permitted only when every to-be-slashed submission is independently confirmed, there is no discrepancy or abstain anywhere in the job, and the watcher's historical concordance clears a threshold over a minimum number of prior decisions. Inability to verify (oracle could not run, no worktree, failed hash recompute) abstains; it is never counted as grounds to slash.
- Engine-enforced shadow-until-permitted: `run_tournament` now consults the watcher before any real `--enforce-slash` settle and downgrades to shadow unless the watcher permits the job (fail-closed on any watcher error). Real AtlasCoin is never burned without an independent positive confirmation; the shadow-slash default path is unchanged.
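
The fail-closed rule behind `permit_enforce_slash` can be illustrated with a minimal sketch. This is not the watcher's actual code: the `Verdict` and `Recheck` types, the argument names, and the `0.95` / `20` thresholds are all hypothetical stand-ins for whatever `watcher.py` really uses.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    CONFIRMED = "confirmed"      # watcher reproduced the recorded outcome
    DISCREPANCY = "discrepancy"  # watcher disagrees with the recorded outcome
    ABSTAIN = "abstain"          # watcher could not verify; never grounds to slash


@dataclass(frozen=True)
class Recheck:
    submission_id: str
    to_be_slashed: bool
    verdict: Verdict


def permit_enforce_slash(rechecks, agreements, decisions,
                         min_concordance=0.95, min_decisions=20):
    """Return True only when every condition holds; any error means 'do not slash'.

    Thresholds are illustrative assumptions, not the project's real values.
    """
    try:
        rechecks = list(rechecks)
        # 1. Every to-be-slashed submission is independently confirmed.
        slashed = [r for r in rechecks if r.to_be_slashed]
        if not all(r.verdict is Verdict.CONFIRMED for r in slashed):
            return False
        # 2. No discrepancy or abstain anywhere in the job.
        if any(r.verdict is not Verdict.CONFIRMED for r in rechecks):
            return False
        # 3. Enough history, and historical concordance clears the threshold.
        if decisions < min_decisions:
            return False
        return agreements / decisions >= min_concordance
    except Exception:
        # Fail closed: a watcher error downgrades the settle to shadow.
        return False
```

Returning `False` from the `except` branch is the point of the design: the caller never has to distinguish "not permitted" from "could not decide", so a broken watcher can only make the engine more conservative.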
The "unfakable done" release: a task ships only when its test genuinely re-ran green AND the code it tested is wired in — and that trustworthy signal now drives a competitive marketplace. Consolidates the oracle + reachability gates, the settled tournament, and the dogfooded first-run UX into one engine.
- Re-execution oracle (ship-check Gate 5): re-grades each "done" task by re-running the operator-held tests at the claimed commit in a network-isolated, digest-pinned podman sandbox. A submitter can no longer pass by editing their own logs; the self-grantable `SHIP_CHECK_OVERRIDE_ADMIN` bypass is removed.
- Reachability gate (Gate 6): `done = oracle-PASS AND the tested code is wired in`. A green test on an orphan module (imported by nothing) is blocked and surfaced as `⚠ scaffolded`.
- Settled-tournament marketplace (`tournament run` / `tournament-status`): N executors race one job, every submission is adjudicated through both gates, the winner is paid in AtlasCoin, honest losers are refunded, and a trusted reputation store (UCB explore/exploit) routes the next job to the cheapest proven-capable executor. Includes a cheap OpenRouter (goose) racer.
- Deterministic `expand-structural`: decomposes an under-specified task into ≥2 verifiable subtasks with no model or network.
- `/atlas` now opens with a Confirm-Intent / plan-mode gate before any file is written, and gate prompts are harness-adaptive (AskUserQuestion on Claude Code; nearest equivalent on codex/gemini).
- `task-master-ai` is now optional (the native engine stands alone); the `tm-parallel` / `tm-plan` / `tm-run` / `tm-harvest` surfaces were removed.
- Fakery is shadow-logged only this release: proven cheating is recorded, no AtlasCoin is burned, and honest losers always get their stake back.
- Preflight `configure` no longer silently no-ops on a fresh project (returns an explicit "configuration deferred").
- `tournament settle` always returns a parseable `{ok, stage}` envelope (no empty stdout on a bad job dir); settlement is crash-resumable (no paid-but-stuck winner).
- The cheap (goose) racer is no longer falsely rejected on non-ASCII diffs; tournament racers are wired to the real orchestrator inbox.
- AtlasCoin is conserved (no mint/burn) under account aliasing.
- 872 passing + 4 real-podman e2e gates green (oracle dogfood + tournament settle/pay/reputation).
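
The "UCB explore/exploit" routing mentioned for the reputation store can be pictured with the textbook UCB1 rule. This is a sketch under assumptions: the changelog does not say which UCB variant the store uses or how cost enters the decision, so the `stats` shape and the `ucb1_pick` name are hypothetical, and executor price is deliberately left out.

```python
import math


def ucb1_pick(stats, c=math.sqrt(2)):
    """Pick an executor by UCB1.

    stats maps executor name -> (oracle_passes, attempts). Untried executors
    score infinity, so each one is explored at least once before the pass
    rate (exploit) and the uncertainty bonus (explore) are traded off.
    """
    total = sum(attempts for _, attempts in stats.values())

    def score(item):
        _, (passes, attempts) = item
        if attempts == 0:
            return math.inf
        return passes / attempts + c * math.sqrt(math.log(total) / attempts)

    return max(stats.items(), key=score)[0]
```

With equal attempt counts the bonus term is identical for every executor, so the choice reduces to the higher observed pass rate; the bonus only matters while some executors are still under-sampled.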
Front-door UX flow fixes (a UX-flow audit found the journey still broke before a
new user reached the 5.2.1 backend fixes). See docs/audit/UX-FLOW-AUDIT.md. Also
syncs the version source-of-truth (prd_taskmaster/__init__.py), which 5.2.1 missed.
- UX-P0-1: the README's first command now resolves. The README led with `/atlas`, which a fresh `/plugin install prd` does not provide (plugin commands are namespaced `/prd:*`). Added a brand-name `atlas` entrypoint skill (→ `/prd:atlas`, a thin alias that dispatches to the `go` orchestrator) and updated the README first-run to `/prd:atlas` (or `/prd:go`, or natural language).
- UX-P0-2: phase gates no longer document a self-contradicting STOP. `setup` / `discover` / `generate` / `handoff` opened with "if the gate fails, stop" immediately followed by "it WILL fail on first entry, proceed past it (see morning brief)"; a compliant autonomous agent would halt. Rewritten to explain that `check_gate` is an EXIT gate (evidence to advance, not to enter), so a first-entry `false` is expected; the gate is enforced on advance. Removed leaked internal references ("morning brief", "Mum dogfood feedback").
- UX-P0-3: `token_economy` set via `/customise-workflow` is now honored. It writes `.atlas-ai/config/atlas.json`, but the engine read economy only from `.atlas-ai/fleet.json`. `load_fleet_config` now reads `token_economy` from `atlas.json` when `fleet.json` doesn't set one (`fleet.json` wins if it does); the config schema doc adds the key.
- `prd_taskmaster/__init__.py` version bumped (5.2.1 set `package.json` / `plugin.json` but missed the `__init__.py` source-of-truth the manifest tests check).
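
The UX-P0-3 precedence rule (`fleet.json` wins, `atlas.json` is the fallback) is easy to get backwards, so here is a minimal sketch of it. It is not the real `load_fleet_config`: the helper names and the `"balanced"` default are assumptions made for illustration; only the two file paths and the `token_economy` key come from the entry above.

```python
import json
from pathlib import Path


def _read_json(path):
    """Return the parsed JSON object, or {} if the file is missing or invalid."""
    try:
        data = json.loads(Path(path).read_text(encoding="utf-8"))
    except (OSError, ValueError):
        return {}
    return data if isinstance(data, dict) else {}


def pick_token_economy(fleet_cfg, atlas_cfg, default="balanced"):
    """fleet.json wins when it sets a value; otherwise fall back to atlas.json."""
    return fleet_cfg.get("token_economy") or atlas_cfg.get("token_economy") or default


def resolve_token_economy(project_root, default="balanced"):
    base = Path(project_root) / ".atlas-ai"
    fleet_cfg = _read_json(base / "fleet.json")
    atlas_cfg = _read_json(base / "config" / "atlas.json")
    return pick_token_economy(fleet_cfg, atlas_cfg, default)
```

Keeping the precedence in a pure function (`pick_token_economy`) separates the rule from file I/O, which makes the "fleet wins" behavior testable without touching disk.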
Pre-relaunch hardening — fixes the first-run failures a multi-agent audit found
on the documented zero-key path (and reproduced firsthand). See docs/audit/.
- P0-1: `configure-providers` now REPAIRS keyless stock defaults. It previously only filled empty model roles, so `task-master init`'s paid `anthropic` + `perplexity` defaults survived untouched and a keyless first run produced 0 tasks. It now migrates any stock default whose provider is unusable in the current environment to the available `claude` / `codex` CLI or the free local proxy (`KNOWN_STOCK_TASKMASTER_DEFAULTS` + a `_provider_usable` check). Genuine user configs and usable configs are preserved; the provider decision now dominates the tier decision.
- P0-2: the SETUP gate (`validate_setup`) is credential-aware. It reported `ready=True` whenever a model-id string was present, green-lighting the exact 0-task config. It now verifies the configured provider is actually reachable (key present / CLI on PATH).
- P0-3: `expand` degrades to a structural pass when the research provider is down. Both the parallel and serial paths hardcoded `--research`; a quota/auth outage hard-failed tasks to 0 subtasks. They now retry without `--research` (structural expand is always available) and mark the result `degraded`.
- P1-1: `parse_prd` reports `ok=False` on a 0-task parse (was a silent success).
- Nested-session spawn probe (gh #11/#12). When `main` is a CLI-spawning provider inside a nested Claude Code session, the gate now probes whether the spawn works rather than assuming, keeping the free path when it works and surfacing an actionable error (never bare `--claude-code`) when it genuinely fails.
- Stale-tag detection (gh #13). `preflight` now reports `current_tag_stale` + `suggested_fresh_tag` so a new PRD is not parsed into a polluted, fully-done tag.
- `package.json` now carries `author`, `homepage`, and `bugs` so the npm page is not anonymous.
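
The credential-aware check from P0-1/P0-2 ("key present / CLI on PATH") might look roughly like the sketch below. The `PROVIDER_REQUIREMENTS` table, its environment-variable names, and the `provider_usable` signature are assumptions for illustration; the real `_provider_usable` may differ.

```python
import os
import shutil

# Hypothetical table: what each provider needs in order to be reachable.
PROVIDER_REQUIREMENTS = {
    "anthropic": {"env_key": "ANTHROPIC_API_KEY"},
    "perplexity": {"env_key": "PERPLEXITY_API_KEY"},
    "claude": {"cli": "claude"},
    "codex": {"cli": "codex"},
}


def provider_usable(provider, env=None, which=shutil.which):
    """True if the provider's API key is set or its CLI is on PATH.

    Unknown providers are reported as unusable rather than assumed ready.
    `env` and `which` are injectable so the check can be tested offline.
    """
    env = os.environ if env is None else env
    spec = PROVIDER_REQUIREMENTS.get(provider)
    if spec is None:
        return False
    if "env_key" in spec:
        return bool(env.get(spec["env_key"]))
    return which(spec["cli"]) is not None
```

A setup gate built on this reports ready only when the check passes, so a config that merely names a model id no longer counts as usable.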
Progress visualization — the README/UX-SPEC panels are now real output.
- `status` command + `render_status` MCP tool that draw the boxed Atlas progress panels from real pipeline state: phase tracker, the GENERATE validation scorecard (grade bar + checks + line-located warnings + placeholders + task/subtask counts), the preflight capability panel, the handoff panel, the execute progress bar, and the ship-check gates panel. Flags: `--format boxed|ascii|json`, `--all`, `--phase`.
- `prd_taskmaster/render.py` (pure renderer, UX-SPEC §7 symbol grammar with ASCII fallbacks via `ATLAS_ASCII=1`; display-width-aware so borders align even with the 🔒 emoji) and `prd_taskmaster/status.py` (pure reader of pipeline/tasks/validation state).
- `validate-prd` now persists its result to `.atlas-ai/state/validation.json` so the scorecard renders without recomputation.
- Phase skills (setup, generate, handoff, execute-task) render the matching panel at each phase boundary.
- The grade bar floors rather than rounds (49/57 = 86% → 8/10 filled), matching the README mockup and never inflating a grade.
- UX-SPEC pricing mockups marked superseded by the private-pilot decision (no `$29`).
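
The floor-not-round behavior of the grade bar (49/57 fills 8 of 10 cells) comes down to integer division. A minimal sketch, assuming integer scores; the `grade_bar` name and the `#` / `-` glyphs are placeholders, not the renderer's actual symbols:

```python
def grade_bar(score, max_score, width=10, filled="#", empty="-"):
    """Render a fixed-width bar whose filled cell count is floored, never rounded."""
    if max_score <= 0:
        raise ValueError("max_score must be positive")
    score = max(0, min(score, max_score))  # clamp out-of-range scores
    cells = (score * width) // max_score   # floor: 490 // 57 == 8
    return filled * cells + empty * (width - cells)


print(grade_bar(49, 57))  # ########--
```

Rounding would give `round(49 * 10 / 57) == 9` cells, which overstates the grade; multiplying before the integer division also avoids any floating-point edge cases.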
Honesty + positioning + discoverability.
- Pre-alpha status is now stated up front (badge + Project status section): the newer systems (fleet, backend abstraction, token economy, Pro MCPs) are not yet battle-tested; expect breaking changes.
- PRD validation warnings (vague language + placeholders) now carry line numbers — the "quoted + located, not just counted" claim is now executable truth, not just rendering.
- `package.json` `keywords` for npm search discoverability (factual token-economy / cost-ledger / model-routing mechanism terms; no unproven savings claims).
- Atlas Pro reframed as a private pilot. Not generally available, not for sale; pricing removed. Access is granted at discretion during the pilot; "Get Atlas Pro" CTAs become "Request pilot access" pointing at GitHub Discussions (an `atlas-ai.au/pilot` signup page is in progress).
Audit-driven honesty release (dogfood cycle 6 — the engine ran its own pipeline on this work).
- Placeholder hard fail. Any placeholder (`{{...}}`, bracketed, or bare case-sensitive `TBD` / `TODO`) now floors the PRD grade to NEEDS_WORK, sets `hard_fail` in the result, and makes `validate-prd` exit non-zero. The README's rigor claim is now executable truth.
- Installer pins its clone to its own release tag (`--branch v$VERSION`, branch fallback).
- `atlas-ai.au/install` and `atlas-ai.au/pro` now resolve (Cloudflare single-redirects); the recommended install path 404'd at 5.1.0 launch.
- README discloses the audit-logged ship-check admin override, the npm `postinstall` pip step, and marks the local research proxy bring-your-own.
- Marketplace manifest description no longer reads as an internal testing artifact.
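
The placeholder hard fail can be sketched as a line-by-line scan. The patterns below are assumptions: the changelog names the three placeholder families but not their exact grammar, so the "bracketed" shape in particular (`[UPPERCASE WORDS]`) is a guess, and `find_placeholders` / `validate` are hypothetical names.

```python
import re

PLACEHOLDER_PATTERNS = [
    re.compile(r"\{\{.*?\}\}"),            # {{...}} template slots
    re.compile(r"\[[A-Z][A-Z0-9 _-]*\]"),  # [BRACKETED] slots (assumed shape)
    re.compile(r"\b(?:TBD|TODO)\b"),       # bare markers, case-sensitive
]


def find_placeholders(text):
    """Return (line_number, matched_text) for every placeholder found."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for pattern in PLACEHOLDER_PATTERNS:
            for match in pattern.finditer(line):
                hits.append((lineno, match.group(0)))
    return hits


def validate(text):
    """Any placeholder is a hard fail: grade floored, non-zero exit code."""
    hits = find_placeholders(text)
    return {
        "hard_fail": bool(hits),
        "grade_floor": "NEEDS_WORK" if hits else None,
        "placeholders": hits,
        "exit_code": 1 if hits else 0,
    }
```

Because the `TBD` / `TODO` pattern is case-sensitive and word-bounded, prose such as "todo later" or "TODOs" is not flagged; only the bare uppercase markers are.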
First npm publish (prd-taskmaster@5.1.0). This release consolidates the two development
lines into one artifact and re-launches the repo as an open-core product: a free MIT
engine plus a commercial Atlas Pro tier.
Lineage: the public v4.x line (this repo) absorbed the private v5.x plugin line
(prd-taskmaster-plugin, internal name "atlas-go", final state v5-final @ f140490) via
file-level imports marked with Imported-From: commit trailers — state machine, ship-check
gates, execute-task hardening, npm pack hygiene, and the granular validation test suite.
Versioning continues from the higher v5.x line so the consolidated artifact supersedes both.
- Plugin namespace renamed `prd-taskmaster` → `prd`: commands are now `/prd:go` etc.; MCP tool ids are `mcp__plugin_prd_go__*` (old prefixes kept as legacy allowed-tools aliases).
- Per-directory `.npmignore` files keep Python bytecode out of the npm tarball (npm 11 ignores the root `.npmignore` inside `files[]`-allowlisted directories).
- 21 granular PRD-validation tests ported from the plugin suite (`tests/core/test_validation.py`).
- Setup/execute-task skills resolve the customizations starter pack and `ship-check.py` from the packaged `${CLAUDE_PLUGIN_ROOT}/skel/` (previously referenced a developer-machine path).
The notes below document the v4.0.0 line that this release ships for the first time.
- Token economy (`token_economy: conservative|balanced|performance`): per-op-class start tiers, validator-gated escalation with per-mode ceilings, and economy-aware provider configuration. Verified priors and sources in `docs/product/MODEL-ECONOMY.md`.
- Parallel native TaskMaster expansion (`tm-parallel` / `tm_parallel_expand`): TaskMaster's model-agnostic `expand --research` runs concurrently in isolated workdirs (per-task economy-tier models), merged atomically; the Claude-subagent path becomes the documented fallback.
- Local cost telemetry (`.atlas-ai/telemetry.jsonl`) + `economy-report` summarizer.
- Unified deterministic core (`prd_taskmaster/`): a single stdlib-only Python package that is the one source of truth for PRD validation, task calculation, complexity enrichment, capability detection, the pipeline state machine, and the ship-check gate. Imported by both the zero-dependency skill (`script.py`) and the FastMCP plugin (`mcp-server/server.py`).
- 5-phase gated pipeline: `SETUP → DISCOVER → GENERATE → HANDOFF → EXECUTE` with atomic compare-and-swap transitions over a flock-guarded `pipeline.json`.
- CDD execute loop: `skills/execute-task` runs a 13-step contract-driven cycle with evidence cards; completion is gated by a deterministic `SHIP_CHECK_OK` token (a non-zero exit code in any evidence file blocks it).
- Parallel research fan-out (`prd_taskmaster/parallel.py`): `plan` / `apply` research packets let an agent expand tasks across parallel subagents and merge results atomically.
- `/atlas` command: the primary user-facing invocation (alias of the orchestrator skill).
- Product + UX specs (`docs/product/`): the living contract the dogfood ship-check verifies against.
- Atlas Pro teaser: the handoff surfaces Atlas Fleet (parallel multi-session execution) as a locked, clearly-priced upgrade. The free engine stays fully functional standalone.
- Repo keeps the name prd-taskmaster (and its 508★). The product/command brand is Atlas; the internal "atlas-go" name from the plugin line is retired.
- PRD validation is stricter than older internal versions (grade thresholds + placeholder attribution). PRDs that previously passed loosely may now score lower — by design.
- Two install paths from one repo: curl one-liner (zero-dependency skill) and Claude Code plugin / `npm install` (full FastMCP plugin).
- Ship-check tests aligned to the live `skel/ship-check.py` gate contract (3 stale-schema failures resolved).
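
The "atomic compare-and-swap transitions over a flock-guarded `pipeline.json`" listed in the v4.0.0 notes above can be sketched as follows. This is an illustration, not the package's state machine: the `phase` key, the sidecar `.lock` file, and the `advance_phase` name are assumptions, and `fcntl` limits the sketch to POSIX systems.

```python
import fcntl
import json
import os
import tempfile

PHASES = ["SETUP", "DISCOVER", "GENERATE", "HANDOFF", "EXECUTE"]


def advance_phase(path, expected):
    """Advance to the next phase only if the file still says `expected`.

    Returns the new phase, or None if another writer got there first
    (or `expected` is already the final phase).
    """
    with open(path + ".lock", "w") as lock:
        fcntl.flock(lock, fcntl.LOCK_EX)  # released when the lock file is closed
        with open(path, encoding="utf-8") as f:
            state = json.load(f)
        current = state.get("phase")
        # Compare: refuse if the state moved on or there is no next phase.
        if current != expected or current not in PHASES[:-1]:
            return None
        # Swap: write to a temp file, then atomically replace the original.
        state["phase"] = PHASES[PHASES.index(current) + 1]
        directory = os.path.dirname(os.path.abspath(path))
        fd, tmp = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(state, f)
        os.replace(tmp, path)
        return state["phase"]
```

The lock serializes writers, while the temp-file-plus-`os.replace` step means a reader never sees a half-written `pipeline.json`, even if the process dies mid-transition.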
v4.0.0 merges the prd-taskmaster-v2 skill line (internally versioned 2.x, never published) and the atlas-go plugin line (internally 5.x, never published) back into this repository. Those internal version numbers do not appear as tags here; the public lineage runs v3.0.0 → v4.0.0.
- Codified deterministic operations into `script.py`; curl installer with update notifications; template-based PRD and CLAUDE.md generation; community files. (Pre-merge single-skill product.)