Skip to content

Review gate before auto-merge in /afk (AI Reviewer step) #74

Description

@lsfera

Add a real AI review gate to /afk between the implementing sandbox and auto-merge. Today /afk auto-merges on green sandcastle CI with no code review (branch protection requires no PR review), so off-spec-but-CI-green code lands unreviewed (ADR-0007). Full design + rationale: .scratch/afk-review-gate/prd.md.

Re the reference: mattpocock's run.ts has a Reviewer step, but it is not a gate — review-prompt.md is an autonomous refactor/polish pass that commits cleanups (RALPH: Review -) to the branch and emits <promise>COMPLETE</promise>; run.ts never inspects its result, so the branch merges regardless. A pass/fail merge gate like the one below exists in neither repo; this issue builds it (stronger than the reference). v1 is a read-only verdict gate (below); reference-style cleanup-commit behavior is explicitly out of scope.

Blocked by #72 (lean execution layer) — overlaps reduce.ts/sandbox-runner.ts; build the reviewer adapter on #72's refactored high-level lifecycle. Relabel/work once #72 merges.

Goal (end-to-end value)

In afk mode, after the implementing sandbox opens a PR, a reviewer sandbox reviews the branch diff against the issue's acceptance criteria + repo conventions/ADRs and emits a structured verdict. Pass → enable auto-merge (merges on green CI, as today). Changes-requested → park for a human (no auto-merge), with the review posted to the PR. hitl is unchanged (the human already reviews, ADR-0009).

Acceptance criteria

  • Reducer gate (the real logic — see Testing): in reduce.ts, afk onSandboxFinished returns a new StartReview { pr } action instead of EnableAutoMerge. A new ReviewFinished { issue, pr, verdict } event maps pass → EnableAutoMerge, changes-requested → WaitForHuman. hitl onSandboxFinished still returns WaitForHuman.
  • Loop stays alive during review: onTick's pending-work check counts a PR awaiting/under review as pending, so the loop doesn't Stop before the verdict; a conflicting PR is still excluded (Orchestrator reuses stale agent/issue-N branches, causing merge conflicts #23).
  • Reviewer adapter: runs a sandcastle agent on the PR branch reading the diff + acceptance criteria + ADRs, and produces a schema-validated structured verdict via sandcastle.Output.object({ tag, schema }) + a runWithExtraction-style produce→extract pass (a working prompt.md plus a separate extraction.md whose only job is to emit the <output> block) — NOT a hand-rolled <tag>{…}</tag> regex. Schema at least { verdict: "pass" | "changes-requested", summary, comments[] }. Reviewer runs in noSandbox() (it only reads the branch + calls gh). Prior art: the reference's .sandcastle/agent-workflows/review/ (review.ts, prompt.md, extraction.md, shared/run-with-extraction.ts, shared/review-output.ts) — see [[sandcastle-reference-patterns]]. All of Output.object, resumeSession, and noSandbox ship in @ai-hero/sandcastle@0.10.0.
  • Validate then filter: structured output is never trusted raw — drop inline comments whose path/line aren't in the diff hunks (mirror the reference's filterInlineComments) before posting the PR review.
  • Read-only w.r.t. code: the reviewer reviews/comments only — it must not push to the branch.
  • Fail-safe: reviewer timeout / crash / missing-or-garbled verdict resolves to changes-requested → WaitForHuman, never to auto-merge.
  • afk-only: /hitl path untouched.
  • ADR: new ADR ("Review gate before auto-merge in /afk") amending ADR-0007 (CI is no longer the only afk gate) and relating to ADR-0009.

Testing (highest, single seam — logic in TypeScript)

Reuse the reducer seam — reduce.test.ts (node:test + node:assert/strict), where afk/hitl branching is already tested:

  • afk + SandboxFinishedStartReview (not EnableAutoMerge); ReviewFinished{pass}EnableAutoMerge; ReviewFinished{changes-requested}WaitForHuman; hitl + SandboxFinishedWaitForHuman (unchanged); onTick does not Stop while a PR is awaiting/under review.
  • The pure seam for the verdict is a small reviewVerdict(structuredOutput | extractionError) → "pass" | "changes-requested" mapper (schema validation itself is Output.object's job — don't re-test the library); unit-test that a valid pass-output → pass, a changes-requested output → changes-requested, and any extraction/validation error → changes-requested (fail-safe), mirroring parseSmeeEvent / parseBlockedBy.
    Live reviewer-sandbox execution stays behind the gated integration.test.ts (SANDCASTLE_INTEGRATION=1); the default npm test and the sandcastle CI job are unaffected. Do not add a new seam.

Unchanged (do not touch)

/hitl (human is the reviewer). Downstream merge mechanics (EnableAutoMerge → GitHub auto-merge on green CI) are unchanged — this only inserts a gate before them. No branch-protection changes. Not integrating the billed, user-triggered /code-review ultra into the autonomous loop (it can't run hands-off) — note it only as a human-driven alternative.

Constraints (per repo workflow)

Implement with /tdd per criterion, run shell via /exec, stay scoped to the reducer + reviewer adapter + prompt template + ADR, no pushing to main, no new dependencies.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions