Build a Duplicate row action on the policies page, and pin four scenarios across the surface#1124
Build a Duplicate row action on the policies page, and pin four scenarios across the surface#1124rajshah4 wants to merge 5 commits into
Conversation
… scenarios The existing policies.test.ts pins create and list only. The existing policies-ui.test.ts authors rules from the integration's tool tree, not from the policies page itself. Three gaps fill in here: - policies-lifecycle: the typed API's update (full and partial payloads) and remove, plus an assertion that create returns a non-empty fractional- indexing position (the field the page's sort and reorder math both lean on). - policies-landing: a fresh workspace renders the explainer empty state with a gated add form. The disabled attribute is read as its value so a regression that ungates the submit prints the actual state. - policies-round-trip: the page itself is an authoring surface, add a rule, flip the action via the row badge, remove via the overflow menu, watch the empty state return. Server-side list reads zero rows for the suffix after, so the UI remove isn't just optimistic-cache fiction. Each scenario carries a per-run pattern suffix and an Effect.ensuring finalizer that removes any row carrying it via the typed API, so a mid-test failure on selfhost (shared bootstrap admin) can't leak state.
The row's overflow menu had Move up / Move down / Remove. Cloning a rule
to a sibling pattern meant retyping the pattern and re-picking the action
from a still-loaded form. The new Duplicate item copies the source row's
pattern, action, and owner into the add form, focuses the pattern input
with its content selected, and lets the user save the clone in one
keystroke.
Mechanics:
- AddPolicyForm gains an optional `prefill` prop (pattern + action + nonce).
The nonce is a monotonic counter, not the source id, so duplicating the
SAME row twice still re-syncs the form, patterns commonly get tweaked
by one character between clicks.
- The focus + select call runs in a setTimeout(0) inside the effect, which
lands AFTER React's commit. The row's DropdownMenuContent also passes
`onCloseAutoFocus={(e) => e.preventDefault()}` so Radix's default
restore-focus-to-trigger doesn't yank focus out of the input the
Duplicate flow is sending it to. The trigger is opacity-0 until hover
anyway, so nothing visual is lost by skipping the restoration.
- A `policy_duplicated` analytics event is added to the catalog. It fires
on the prefill, BEFORE the user submits, Duplicate is purely a client-
side form-fill today, with the eventual server write going through the
existing `policies.create` path.
The scenario walks the surface end-to-end: add a Block rule through the
form, click Duplicate on its row, assert the form carries the pattern + the
Block action, assert the pattern input is the document's active element,
edit the pattern to a sibling, submit, see both rows. Server-side
policies.list confirms both rules persisted with the source's action.
Four findings from a separate review pass on the previous two commits, applied here: - Replace `Date.now()` with a monotonic counter for the prefill nonce. Two Duplicate clicks within the same millisecond produced identical nonces and the form's useEffect comparator saw no change, a silent no-op. The functional state updater reads the prior counter so the new value is always strictly greater than the last. - Trim the prefill useEffect's dependency array to the nonce alone. The prefill object identity always changes with the nonce, so depending on both was redundant. Adds an oxlint exhaustive-deps suppression with a comment explaining the closure-over-latest pattern. - Make `policies-duplicate`'s focus assertion value-shaped per e2e/AGENTS.md. Was `.toBe(true)` behind an evaluate returning boolean, a regression printed bare `false` instead of the actual focused element. Now reads `document.activeElement?.id` and asserts on the input's id (`policy-pattern`). - Align `policies-round-trip`'s overflow click with `policies-duplicate`'s hover-first pattern (and add an explicit `dropdown-menu-content` waitFor before targeting the menu item). Removes the last `force: true` click in the diff and matches how a real user reaches the menu. Follow-up flagged in the review (not in this commit): policies-duplicate exercises an org-owned source on an org-only host. A second scenario with a user-owned source on a multi-owner host would close the owner-propagation coverage gap. Verified 4/4 pass on cloud and 4/4 pass on selfhost. oxfmt, oxlint --deny-warnings, and bun run typecheck (e2e + packages/react) are all clean.
policies-duplicate and policies-round-trip both submit a rule through the
add form, then open a dropdown on the resulting row (the overflow menu in
duplicate, the badge select and overflow menu in round-trip). The optimistic
render attaches the row immediately, but Radix's pointer listeners on the
row's triggers only bind cleanly on the post-confirm re-render. A fast
hover+click between the optimistic mount and the confirm landed on a
stale, listener-less trigger and silently no-opped, so the menuitem wait
timed out 30 seconds later. The flake was rare on warm boots (it hit
roughly 1 in 6 batched runs) and never on a single-scenario re-run.
The e2e style guide already calls this out: "After an action that
navigates, wait for the URL/network to settle before opening menus."
The Add policy submit is structurally the same situation. Adds a
`page.waitForLoadState("networkidle")` in both scenarios immediately
after the submit, before the row's dropdown is touched.
Verified across 3 batched cycles on cloud + 3 batched cycles on selfhost
(24 scenario runs, all green), plus the four standalone runs the prior
commit already passed.
🟡 Review pass, Acceptable, MEDIUM riskIndependent re-read of commits Findings1 · 2 · 3 · 4 · 5 · Coverage gap (NOT addressed): owner propagation through Duplicate is only tested implicitly 6 · Pre-existing leak in What was checked
Verdict🟡 Acceptable, MEDIUM risk. The four code findings are real but mechanical; none required changing the product's contract or the test's intent. The coverage gap (5) is a follow-up. The pre-existing leak (6) is out of scope. Produced as a separate review pass on the author's commits, with a fresh reading of the style guide and the diff. Findings 1–4 land in commit |
✅ QA Report, PASSIndependent verification of the fixed code on commit Run summary
Cloud: 4/4 pass, 20.21 s total (9.31 s in tests, rest in dev-server boot + transform). Static gatesThe one pre-existing suggestion in Per-browser-scenario artifacts (cloud)Each browser scenario produces step screenshots, The What was verified
Verdict✅ PASS. Branch Produced from a fresh local run of the harness against the committed code in the same sandbox the author used (same shell, same |
Greptile SummaryThis PR adds a Duplicate row action to the policies page — copying a rule's pattern, action, and owner into the add form with focus + selection on the pattern input — and lands four new E2E scenarios (
Confidence Score: 5/5Safe to merge; the product code is correct and the previous keyboard-navigation concern has been addressed with closeReasonRef. The Duplicate feature implementation is well-structured: the nonce counter prevents same-ms collision, closeReasonRef correctly scopes focus-restoration suppression to Duplicate only, and the form's useEffect deps are clean primitives with no linter suppression. The four scenarios follow the e2e style guide. The one open issue — missing waitForLoadState after the copy submit in policies-duplicate.test.ts — is a test reliability concern on slow selfhost, not a product defect. e2e/scenarios/policies-duplicate.test.ts and e2e/scenarios/policies-round-trip.test.ts both have spots where a waitForLoadState("networkidle") would close a potential race before the server-side assertion or the overflow menu open. Important Files Changed
Flowchart%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[User clicks Duplicate on row] --> B[handleDuplicate called]
B --> C[setTargetOwner - sync owner]
B --> D["setPrefill - nonce++, pattern, action"]
B --> E[trackEvent policy_duplicated]
D --> F[AddPolicyForm re-renders with new prefill prop]
F --> G["useEffect fires - prefillNonce changed"]
G --> H[setPattern + setAction from prefill]
G --> I[setTimeout 0 - focus + select pattern input]
I --> J[Radix close fires setTimeout 0]
J --> K[closeReasonRef = duplicate - e.preventDefault in onCloseAutoFocus]
K --> L[Pattern input focused with content selected]
L --> M[User tweaks pattern, clicks Add]
M --> N[policies.create - new row appears]
%%{init: {'theme': 'base', 'themeVariables': {"darkMode": true, "background": "#0d1117", "primaryColor": "#21262d", "primaryTextColor": "#e6edf3", "primaryBorderColor": "#8b949e", "lineColor": "#8b949e", "textColor": "#e6edf3", "edgeLabelBackground": "#161b22", "actorBkg": "#21262d", "actorBorder": "#8b949e", "actorTextColor": "#e6edf3", "actorLineColor": "#8b949e", "signalColor": "#8b949e", "signalTextColor": "#e6edf3", "noteBkgColor": "#373320", "noteBorderColor": "#d4a72c", "noteTextColor": "#f0e6c0", "labelBoxBkgColor": "#21262d", "labelBoxBorderColor": "#8b949e", "labelTextColor": "#e6edf3", "loopTextColor": "#e6edf3", "activationBkgColor": "#30363d", "activationBorderColor": "#8b949e"}}}%%
flowchart TD
A[User clicks Duplicate on row] --> B[handleDuplicate called]
B --> C[setTargetOwner - sync owner]
B --> D["setPrefill - nonce++, pattern, action"]
B --> E[trackEvent policy_duplicated]
D --> F[AddPolicyForm re-renders with new prefill prop]
F --> G["useEffect fires - prefillNonce changed"]
G --> H[setPattern + setAction from prefill]
G --> I[setTimeout 0 - focus + select pattern input]
I --> J[Radix close fires setTimeout 0]
J --> K[closeReasonRef = duplicate - e.preventDefault in onCloseAutoFocus]
K --> L[Pattern input focused with content selected]
L --> M[User tweaks pattern, clicks Add]
M --> N[policies.create - new row appears]
Reviews (2): Last reviewed commit: "Scope onCloseAutoFocus to Duplicate so M..." | Re-trigger Greptile |
The row's dropdown menu had `onCloseAutoFocus={(e) => e.preventDefault()}`
applied to the shared `DropdownMenuContent`, which fires for every item.
That works for Duplicate (which sends focus to the form's pattern input)
and is benign for Remove (the row unmounts, the default refocus is moot
either way), but it drops focus to `<body>` after Move up or Move down,
where the trigger persists in the DOM and a keyboard user needs Radix's
default trigger-refocus to land somewhere useful.
A `closeReasonRef` set inside Duplicate's `onClick` and checked in
`onCloseAutoFocus` lets the suppression scope to the Duplicate case
only, restoring keyboard focus restoration for Move up / Move down.
Also rewrites the `useEffect` that copies the prefill into the form to
include `prefillPattern` and `prefillAction` in its dep array, dropping
the `react-hooks/exhaustive-deps` disable. The parent constructs a new
prefill object on every Duplicate click, so the three primitives always
change together, the effect fires the same number of times, and the new
deps array reads cleanly without the disable comment.
|
Thanks for the review. All three findings looked at, two applied in commit P1, P2, misleading P2, missing Reverting that one change took stability back to 6/6 batched cycles, three per target, with the other two findings still applied. The existing UI assertion that polls the badge Happy to revisit if you have a settle pattern that works on cloud's background traffic. |
This PR is an OpenHands QA-loop submission for @RhysSullivan's autonomous QA idea: the agent made a real product change, converted the behavior into repo E2E scenarios, ran the same scenarios against cloud and selfhost targets, and attached video and trace artifacts so the result can be reviewed from GitHub without cloning locally.
What this adds
A Duplicate row action on the policies page that copies a rule's pattern, action, and owner into the add form, focuses the pattern input with its content selected, and lets a tweaked clone ship in one keystroke. Plus four scenarios in
e2e/scenarios/: three covering existing policies behavior the suite did not yet pin, and one pinning the new Duplicate flow end to end.Watch the harness exercise the new feature
A 17-second recording (E2E_FILM=1 slow-mo, 400 ms per Playwright action) of the
policies-duplicatescenario walking the new Duplicate row action end to end, add a Block rule, click Duplicate on its row, see the form prefill with the source's pattern + action + focus, tweak the pattern to a sibling, see both rows in the list.If the inline player does not render in your client: direct download. The full Playwright trace is also attached, drag it into trace.playwright.dev for the time-travel DOM + network viewer.
How this PR maps to the six requirements
ghCLI, a code editor, the e2e harness's Chromium. Nothing else, no specialized plugins.48cc867badds a Duplicate row action to the policies page (the development). The same commit landspolicies-duplicate.test.ts(the test that pins it). The video above is the test exercising the feature against the running app.e2e/scenarios/and run on cloud + selfhost. Verified 8/8.cli-macos.config.ts/cli-linux/cli-windowsis the obvious follow-up. Out of scope for this PR; the loop's authoring shape is identical.pip install openhands-sdk openhands-tools, point at any repo, hand it that repo'sAGENTS.md, the same loop runs against any target.session.mp4(theE2E_FILM=1slow-mo pacee2e/src/surfaces/browser.tsalready supports), embedded above. The trace.zip carries every DOM mutation.And the meta-requirement, "verify the work without running anything locally, purely by looking at the e2e test and test output": every artifact you need is on this page, the commit diff is the feature, the test source is the spec, the video is the proof, the QA Report comment below is the run. No clone required.
The develop-then-test loop, as four commits
b4b6fa65, Cover the policies surface with lifecycle, landing, and UI round-trip scenarios. Three scenarios for existing behavior, filling gaps the suite did not yet pin:policies-lifecycle.test.ts, typed API CRUD round-trip. The existingpolicies.test.tscovers create + list; this addsupdate(both full-payload and the action-only partial the row badge'shandleUpdateactually sends) andremove, plus an assertion thatcreatereturns a non-emptyposition(the fractional-indexing key the page's sort and reorder math both depend on).policies-landing.test.ts, a fresh workspace renders the explainer empty state with a gated add form. Asserts on thedisabledattribute's value rather than.toBe(false), so a regression that ungates the submit prints the actual element state.policies-round-trip.test.ts, the page is an authoring surface: add a rule through the form, flip its action via the row badge select, remove it via the row's overflow menu, watch the empty state return. Server-sidepolicies.listreads zero rows for the suffix afterwards, so the UI remove isn't optimistic-cache fiction.48cc867b, Add a Duplicate row action to the policies page, pinned by its scenario. The development half. The row's overflow menu went fromMove up / Move down / RemovetoMove up / Move down / Duplicate / Remove. Duplicate copies the source row's pattern + action + owner into the add form, focuses the pattern input with its content selected, lets the user save a tweaked clone in one keystroke.policies-duplicate.test.tslands in the same commit and pins the surface end to end.Mechanics worth flagging:
setPrefill(prev => ...)updater. (The first cut usedDate.now(); the review pass below caught the same-ms collision and the fix lands in5673a592.)DropdownMenuContentpassesonCloseAutoFocus={(e) => e.preventDefault()}so Radix's default trigger-refocus doesn't yank focus out of the input the Duplicate flow is sending it to. The trigger is opacity-0 until hover, so nothing visual is lost.policy_duplicatedanalytics event lands in the catalog. It fires on the prefill (before the user submits), Duplicate is purely a client-side form-fill today, with the eventual server write going through the existingpolicies.createpath.5673a592, Address findings from a separate review pass. Four mechanical findings from a separate read of the previous two commits againste2e/AGENTS.md: the same-ms nonce collision (above); a redundantuseEffectdep; a boolean-shaped focus assertion inpolicies-duplicate(now value-shaped, reads the focused element's id); andpolicies-round-trip's overflow click aligned withpolicies-duplicate's hover-first pattern so the harness uses one mechanic for one UX surface. Full review markdown in the first comment below.4cf09900, Settle the network after Add policy before opening the row's dropdown. A flake found during a stability sweep: when a form-submit lands a new row via optimistic render, Radix's pointer listeners on the row's dropdown triggers only bind cleanly on the post-confirm re-render. A fast hover+click between the optimistic mount and the confirm landed on a stale, listener-less trigger and the dropdown silently no-opped (~1 in 6 batched runs on a cold selfhost). Thee2e/AGENTS.mdstyle guide already prescribes the fix ("settle the network before opening menus"):await page.waitForLoadState("networkidle")between the submit and the next dropdown interaction, in bothpolicies-duplicateandpolicies-round-trip. Verified across 6 batched cycles (24 scenario runs, all green).The develop-then-test loop
Three roles, author, reviewer, verifier, each its own conversation against the OpenHands SDK:
b4b6fa65,48cc867b), readse2e/AGENTS.md, two existing scenarios as style baseline, the API definitions, and the route component. Drafts the four scenarios, builds the Duplicate feature, verifies 8/8 (4 scenarios × 2 targets).code-reviewskill: fresh read of the author's diff against the style guide and the existing surfaces. Surfaces six findings; the four mechanical ones land in commit5673a592. The two non-mechanical ones (a coverage follow-up + a flag on a pre-existing leak inpolicies.test.ts) are noted in the review.qa-changesskill: exercises the fixed code on both targets, captures exit codes + per-scenario timing + the artifact inventory, returns a structured QA Report.OpenHands ships these as composable primitives, the SDK as runtime, skills as role prompts, Automations as cron and webhook triggers for the recurring half. This PR composes them for the proactive direction (agent develops a feature + writes its test in one loop) against the executor harness. The same primitives compose other directions too, a reactive direction (PR opens, agent generates tests) is its own pattern; neither is canonical, both are valid uses of the platform.
Verification
cd e2e bun run test:cloud scenarios/policies-lifecycle.test.ts scenarios/policies-landing.test.ts \ scenarios/policies-round-trip.test.ts scenarios/policies-duplicate.test.ts bun run test:selfhost scenarios/policies-lifecycle.test.ts scenarios/policies-landing.test.ts \ scenarios/policies-round-trip.test.ts scenarios/policies-duplicate.test.ts→ 4/4 pass on cloud, 4/4 pass on selfhost, across 3 cold-boot batched cycles per target.
bunx oxfmt --check,bunx oxlint -c .oxlintrc.jsonc --deny-warnings, andbun run typecheck(full root, 41 packages) are clean.Each browser scenario produces step screenshots,
session.mp4, and atrace.zipundere2e/runs/<target>/<scenario-slug>/.Toolchain
Built on the OpenHands SDK (open source) in a standard sandbox with bash,
gh, a code editor, and the e2e harness's Chromium. The sandbox was handed nothing but the repo URL ande2e/AGENTS.md. The reviewer and verifier roles loaded skills fromOpenHands/extensions(linked above).Open questions for review
b4b6fa65), that is the natural split point.Applies toselect is visible) would close the owner-propagation gap. Happy to add a second scenario in this PR or a follow-up, your call on shape.cli-macos.config.ts/cli-linux/cli-windowsis the obvious follow-up, the same authoring shape applies. Out of scope here unless you want it pulled in.e2e/scenarios/policies.test.tscreates a policy with the static patternpolicies-scn.*and never removes it (selfhost leak). Not in this diff; flagging in case you want it fixed separately.