You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Add a real AI review gate to /afk between the implementing sandbox and auto-merge. Today /afk auto-merges on green sandcastle CI with no code review (branch protection requires no PR review), so off-spec-but-CI-green code lands unreviewed (ADR-0007). Full design + rationale: .scratch/afk-review-gate/prd.md.
Re the reference: mattpocock's run.ts has a Reviewer step, but it is not a gate — review-prompt.md is an autonomous refactor/polish pass that commits cleanups (RALPH: Review -) to the branch and emits <promise>COMPLETE</promise>; run.ts never inspects its result, so the branch merges regardless. A pass/fail merge gate like the one below exists in neither repo; this issue builds it (stronger than the reference). v1 is a read-only verdict gate (below); reference-style cleanup-commit behavior is explicitly out of scope.
Blocked by #72 (lean execution layer) — overlaps reduce.ts/sandbox-runner.ts; build the reviewer adapter on #72's refactored high-level lifecycle. Relabel/work once #72 merges.
Goal (end-to-end value)
In afk mode, after the implementing sandbox opens a PR, a reviewer sandbox reviews the branch diff against the issue's acceptance criteria + repo conventions/ADRs and emits a structured verdict. Pass → enable auto-merge (merges on green CI, as today). Changes-requested → park for a human (no auto-merge), with the review posted to the PR. hitl is unchanged (the human already reviews, ADR-0009).
Acceptance criteria
Reducer gate (the real logic — see Testing): in reduce.ts, afk onSandboxFinished returns a new StartReview { pr } action instead of EnableAutoMerge. A new ReviewFinished { issue, pr, verdict } event maps pass → EnableAutoMerge, changes-requested → WaitForHuman. hitlonSandboxFinished still returns WaitForHuman.
Reviewer adapter: runs a sandcastle agent on the PR branch reading the diff + acceptance criteria + ADRs, and produces a schema-validated structured verdict via sandcastle.Output.object({ tag, schema }) + a runWithExtraction-style produce→extract pass (a working prompt.md plus a separate extraction.md whose only job is to emit the <output> block) — NOT a hand-rolled <tag>{…}</tag> regex. Schema at least { verdict: "pass" | "changes-requested", summary, comments[] }. Reviewer runs in noSandbox() (it only reads the branch + calls gh). Prior art: the reference's .sandcastle/agent-workflows/review/ (review.ts, prompt.md, extraction.md, shared/run-with-extraction.ts, shared/review-output.ts) — see [[sandcastle-reference-patterns]]. All of Output.object, resumeSession, and noSandbox ship in @ai-hero/sandcastle@0.10.0.
Validate then filter: structured output is never trusted raw — drop inline comments whose path/line aren't in the diff hunks (mirror the reference's filterInlineComments) before posting the PR review.
Read-only w.r.t. code: the reviewer reviews/comments only — it must not push to the branch.
Fail-safe: reviewer timeout / crash / missing-or-garbled verdict resolves to changes-requested → WaitForHuman, never to auto-merge.
afk-only:/hitl path untouched.
ADR: new ADR ("Review gate before auto-merge in /afk") amending ADR-0007 (CI is no longer the only afk gate) and relating to ADR-0009.
Testing (highest, single seam — logic in TypeScript)
Reuse the reducer seam — reduce.test.ts (node:test + node:assert/strict), where afk/hitl branching is already tested:
afk + SandboxFinished → StartReview (not EnableAutoMerge); ReviewFinished{pass} → EnableAutoMerge; ReviewFinished{changes-requested} → WaitForHuman; hitl + SandboxFinished → WaitForHuman (unchanged); onTick does not Stop while a PR is awaiting/under review.
The pure seam for the verdict is a small reviewVerdict(structuredOutput | extractionError) → "pass" | "changes-requested" mapper (schema validation itself is Output.object's job — don't re-test the library); unit-test that a valid pass-output → pass, a changes-requested output → changes-requested, and any extraction/validation error → changes-requested (fail-safe), mirroring parseSmeeEvent / parseBlockedBy.
Live reviewer-sandbox execution stays behind the gated integration.test.ts (SANDCASTLE_INTEGRATION=1); the default npm test and the sandcastle CI job are unaffected. Do not add a new seam.
Unchanged (do not touch)
/hitl (human is the reviewer). Downstream merge mechanics (EnableAutoMerge → GitHub auto-merge on green CI) are unchanged — this only inserts a gate before them. No branch-protection changes. Not integrating the billed, user-triggered /code-review ultra into the autonomous loop (it can't run hands-off) — note it only as a human-driven alternative.
Constraints (per repo workflow)
Implement with /tdd per criterion, run shell via /exec, stay scoped to the reducer + reviewer adapter + prompt template + ADR, no pushing to main, no new dependencies.
Add a real AI review gate to
/afkbetween the implementing sandbox and auto-merge. Today/afkauto-merges on greensandcastleCI with no code review (branch protection requires no PR review), so off-spec-but-CI-green code lands unreviewed (ADR-0007). Full design + rationale:.scratch/afk-review-gate/prd.md.Blocked by #72 (lean execution layer) — overlaps
reduce.ts/sandbox-runner.ts; build the reviewer adapter on #72's refactored high-level lifecycle. Relabel/work once #72 merges.Goal (end-to-end value)
In
afkmode, after the implementing sandbox opens a PR, a reviewer sandbox reviews the branch diff against the issue's acceptance criteria + repo conventions/ADRs and emits a structured verdict. Pass → enable auto-merge (merges on green CI, as today). Changes-requested → park for a human (no auto-merge), with the review posted to the PR.hitlis unchanged (the human already reviews, ADR-0009).Acceptance criteria
reduce.ts, afkonSandboxFinishedreturns a newStartReview { pr }action instead ofEnableAutoMerge. A newReviewFinished { issue, pr, verdict }event maps pass →EnableAutoMerge, changes-requested →WaitForHuman.hitlonSandboxFinishedstill returnsWaitForHuman.onTick's pending-work check counts a PR awaiting/under review as pending, so the loop doesn'tStopbefore the verdict; a conflicting PR is still excluded (Orchestrator reuses stale agent/issue-N branches, causing merge conflicts #23).sandcastle.Output.object({ tag, schema })+ arunWithExtraction-style produce→extract pass (a workingprompt.mdplus a separateextraction.mdwhose only job is to emit the<output>block) — NOT a hand-rolled<tag>{…}</tag>regex. Schema at least{ verdict: "pass" | "changes-requested", summary, comments[] }. Reviewer runs innoSandbox()(it only reads the branch + callsgh). Prior art: the reference's.sandcastle/agent-workflows/review/(review.ts,prompt.md,extraction.md,shared/run-with-extraction.ts,shared/review-output.ts) — see [[sandcastle-reference-patterns]]. All ofOutput.object,resumeSession, andnoSandboxship in@ai-hero/sandcastle@0.10.0.filterInlineComments) before posting the PR review.WaitForHuman, never to auto-merge./hitlpath untouched.Testing (highest, single seam — logic in TypeScript)
Reuse the reducer seam —
reduce.test.ts(node:test+node:assert/strict), where afk/hitl branching is already tested:SandboxFinished→StartReview(notEnableAutoMerge);ReviewFinished{pass}→EnableAutoMerge;ReviewFinished{changes-requested}→WaitForHuman; hitl +SandboxFinished→WaitForHuman(unchanged);onTickdoes notStopwhile a PR is awaiting/under review.reviewVerdict(structuredOutput | extractionError) → "pass" | "changes-requested"mapper (schema validation itself isOutput.object's job — don't re-test the library); unit-test that a valid pass-output →pass, a changes-requested output →changes-requested, and any extraction/validation error →changes-requested(fail-safe), mirroringparseSmeeEvent/parseBlockedBy.Live reviewer-sandbox execution stays behind the gated
integration.test.ts(SANDCASTLE_INTEGRATION=1); the defaultnpm testand thesandcastleCI job are unaffected. Do not add a new seam.Unchanged (do not touch)
/hitl(human is the reviewer). Downstream merge mechanics (EnableAutoMerge→ GitHub auto-merge on green CI) are unchanged — this only inserts a gate before them. No branch-protection changes. Not integrating the billed, user-triggered/code-review ultrainto the autonomous loop (it can't run hands-off) — note it only as a human-driven alternative.Constraints (per repo workflow)
Implement with
/tddper criterion, run shell via/exec, stay scoped to the reducer + reviewer adapter + prompt template + ADR, no pushing to main, no new dependencies.