Skip to content

feat(checkpatch): feed checkpatch.pl findings into review as opt-in ground truth#22

Open
nethum529 wants to merge 2 commits into
NVIDIA:mainfrom
nethum529:feat/issue-6-checkpatch
Open

feat(checkpatch): feed checkpatch.pl findings into review as opt-in ground truth#22
nethum529 wants to merge 2 commits into
NVIDIA:mainfrom
nethum529:feat/issue-6-checkpatch

Conversation

@nethum529

Copy link
Copy Markdown
Contributor

Intent

Implement NVIDIA/boro issue #6: feed scripts/checkpatch.pl output into the agentic review as opt-in, kernel-only ground truth. New src/checkpatch.rs stage runs checkpatch with --no-tree on each reviewed commit's patch and injects its ERROR/WARNING-level findings into the model's reference context as a new '# --- checkpatch ---' block in prompts::build_reference_context, deliberately mirroring the existing lei/lore upstream-followup handling. Config via CheckpatchConfig::from_env: BORO_CHECKPATCH_ENABLED (default OFF, opt-in, parsed case-insensitively) and BORO_CHECKPATCH_MAX_BYTES (default 16384). Deliberate design decisions a diff-only reviewer would not know: (1) checkpatch exits non-zero precisely when it has findings, so exit status is intentionally ignored for the findings path and stdout is captured regardless; run_checkpatch returns a CheckpatchOutcome enum (Findings/Clean/Failed) so a non-zero exit with no findings is surfaced as Failed and logged, letting an opted-in user tell a broken integration from a clean commit rather than silently no-op; (2) availability is a per-tree scripts/checkpatch.pl file check, not a PATH OnceLock, because it is a tree file; perl presence is the process-global OnceLock; (3) additive only, no pipeline restructuring, kernel-only for now (QEMU/sparse are future follow-ups per the issue); (4) findings injected as reference context, not as deterministic findings, exactly as the issue asked. Also fixed a latent missing 'used += block.len()' on the upstream-followup block so the new checkpatch budget check is accurate. The document pipeline step is intentionally skipped because README docs are already included in the commit and prior runs generated stray scratch commits. Already validated locally: 348 tests pass, cargo fmt --check clean, default clippy clean on new code, and it passed a rust review plus a 4-lens scrutinize plus a confirmation pass.

What Changed

  • Added a src/checkpatch.rs stage that runs the kernel tree's scripts/checkpatch.pl with --no-tree over each reviewed commit's format-patch mbox, gated behind BORO_CHECKPATCH_ENABLED (default off) and BORO_CHECKPATCH_MAX_BYTES (default 16384); run_checkpatch ignores exit status for the findings path and returns a CheckpatchOutcome enum (Findings/Clean/Failed) so a broken integration is surfaced rather than silently no-op'd.
  • Extended src/prompts.rs to inject filtered ERROR/WARNING-level checkpatch output into build_reference_context as a new # --- checkpatch --- reference block, mirroring the existing upstream-followup handling, and fixed a missing used += block.len() budget accounting for that block.
  • Added src/git.rs format-patch/mbox helpers and wired the stage into src/main.rs, with README documentation for the new opt-in configuration.

Risk Assessment

✅ Low: The change is a well-bounded, opt-in, additive fix that correctly routes a format-patch mbox into checkpatch with full graceful-degradation and strong test coverage; it properly addresses the prior round's finding with no new defects.

Testing

Ran the checkpatch unit tests and the full 349-test suite (all green), then demonstrated the feature end-to-end by running the actual boro pipeline (format_patch -> run_checkpatch -> build_reference_context) with the real upstream checkpatch.pl on a real kernel commit carrying planted style violations; the resulting reference context contained the # --- checkpatch --- block with the ERROR/WARNING findings and the authoritative-ground-truth instruction wrapper, CHECK-level and summary lines correctly filtered out. This is the model-facing surface the issue asked for. No product-level UI is involved (this is a CLI/agentic-review data-injection change), so the evidence is a captured reference-context transcript rather than a screenshot. No failures or flakiness found; worktree left clean.

Evidence: Injected reference-context block from real checkpatch.pl (E2E)

# --- checkpatch --- scripts/checkpatch.pl reported the following error- and warning-level findings on this commit. Treat them as authoritative style/correctness ground truth rather than re-deriving them from the diff: /tmp/.tmpWDwIVN:21: WARNING: Missing or malformed SPDX-License-Identifier tag in line 1 /tmp/.tmpWDwIVN:25: ERROR: spaces required around that '=' (ctx:VxV) /tmp/.tmpWDwIVN:26: WARNING: Missing a blank line after declarations /tmp/.tmpWDwIVN:26: ERROR: space required before the open parenthesis '(' /tmp/.tmpWDwIVN:28: WARNING: line length of 107 exceeds 100 columns

E2E_CFG enabled=true max_bytes=16384
E2E_REFERENCE_BLOCK_BEGIN
# --- checkpatch ---

`scripts/checkpatch.pl` reported the following error- and warning-level findings on this commit. Treat them as authoritative style/correctness ground truth rather than re-deriving them from the diff:

/tmp/.tmpWDwIVN:16: WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
/tmp/.tmpWDwIVN:21: WARNING: Missing or malformed SPDX-License-Identifier tag in line 1
/tmp/.tmpWDwIVN:25: ERROR: spaces required around that '=' (ctx:VxV)
/tmp/.tmpWDwIVN:26: WARNING: Missing a blank line after declarations
/tmp/.tmpWDwIVN:26: ERROR: space required before the open parenthesis '('
/tmp/.tmpWDwIVN:28: WARNING: line length of 107 exceeds 100 columns

E2E_REFERENCE_BLOCK_END
Evidence: E2E reproduction script (kernel-style tree + violating commit)
#!/usr/bin/env bash
set -e
EV=/tmp/no-mistakes-evidence/01KWG0423A81FDKV4B9G4920E7
TREE=$EV/kernel-tree
rm -rf "$TREE"
mkdir -p "$TREE/scripts" "$TREE/drivers/foo"
cp "$EV/checkpatch.pl" "$TREE/scripts/checkpatch.pl"
: > "$TREE/scripts/spelling.txt"

git -C "$TREE" init -q
git -C "$TREE" config user.email test@example.com
git -C "$TREE" config user.name Test
git -C "$TREE" config commit.gpgsign false
git -C "$TREE" commit -q --allow-empty -m base

printf '#include <linux/module.h>\n\nint foo_init(void)\n{\n\tint x=1;\n\tif(x)\n\t\treturn x;\n\treturn this_is_a_deliberately_extremely_long_line_that_exceeds_the_eighty_column_limit_for_sure(x);\n}\n' > "$TREE/drivers/foo/foo.c"
git -C "$TREE" add drivers/foo/foo.c
git -C "$TREE" commit -q -m "drivers/foo: add foo_init with style issues

This commit intentionally introduces checkpatch violations so the
reviewer checkpatch ground-truth stage has something to report.

Signed-off-by: Test <test@example.com>"

echo "=== SHA ==="
git -C "$TREE" rev-parse --short HEAD
echo "=== format-patch mbox (head) ==="
git -C "$TREE" format-patch -1 --stdout HEAD | head -20

Pipeline

Updates from git push no-mistakes

✅ **intent** - passed

✅ No issues found.

✅ **Rebase** - passed

✅ No issues found.

🔧 **Review** - 1 issue found → auto-fixed ✅
  • ⚠️ src/main.rs:3183 - run_checkpatch is fed git::show_patch output (git show --pretty=medium), which begins with commit &lt;sha&gt;/Author:/Date: headers and indents the commit message by 4 spaces. checkpatch.pl expects git format-patch / mbox input. Diff-level findings stay correct (line numbers come from the @@ hunk headers), but commit-log-level checks run over the git-show header lines and the +4-space-indented body, which can produce spurious ERROR/WARNING findings (e.g. 'Possible unwrapped commit description') or silently skip Subject:-dependent checks. Because render_summary labels the output as 'authoritative ground truth', inaccurate commit-log findings can mislead the reviewer model. Consider feeding format-patch/mbox output instead.

🔧 Fix: feed checkpatch a format-patch mbox instead of git show
✅ Re-checked - no issues remain.

✅ **Test** - passed

✅ No issues found.

  • cargo test checkpatch (20 checkpatch/prompts unit tests: parse_enabled case-insensitivity, max_bytes parsing, is_active gating, Findings/Clean/Failed outcomes, truncation-with-marker, filter keeps only ERROR/WARNING)
  • cargo test full suite: 349 passed, 0 failed
  • E2E: built a kernel-style tree with the real upstream scripts/checkpatch.pl, committed drivers/foo/foo.c with deliberate style violations, then drove git::format_patch -> checkpatch::run_checkpatch -> prompts::build_reference_context via a temporary in-crate test and captured the rendered # --- checkpatch --- reference block (temp test reverted afterward; git diff HEAD clean)
⏭️ **Document** - skipped

Step was skipped.

✅ **Lint** - passed

✅ No issues found.

✅ **Push** - passed

✅ No issues found.

nethum529 added 2 commits July 1, 2026 18:23
The reviewer inferred style and obvious-correctness issues from the diff
text alone, guessing at findings that scripts/checkpatch.pl already
produces deterministically. That ground truth never reached the model.

Add an opt-in, kernel-only stage that runs scripts/checkpatch.pl on each
reviewed commit and injects its error- and warning-level findings into the
reviewer's reference context as a "# --- checkpatch ---" block, the same
way lore follow-up data is appended. checkpatch is run with --no-tree on
the commit patch, so its output does not depend on the worktree's
checked-out state, and its non-zero exit on findings is captured rather
than treated as failure.

The stage mirrors the lei/lore handling: it is configured from
BORO_CHECKPATCH_ENABLED (default off) and BORO_CHECKPATCH_MAX_BYTES, and
degrades gracefully to a no-op when the script or perl is absent, so every
other stage still runs. It is additive; no pipeline restructuring.

Closes NVIDIA#6

Signed-off-by: nethum529 <nethumweerasinghe.nw@gmail.com>
…it show

Signed-off-by: nethum529 <nethumweerasinghe.nw@gmail.com>
@nethum529 nethum529 force-pushed the feat/issue-6-checkpatch branch from 2a235dc to b4d520a Compare July 3, 2026 20:33
@arighi

arighi commented Jul 3, 2026

Copy link
Copy Markdown
Collaborator

Can you provide a better description for commit no-mistakes(review): feed checkpatch a format-patch mbox instead of git show and drop the "no-mistakes"? Thanks!

@arighi

arighi commented Jul 3, 2026

Copy link
Copy Markdown
Collaborator

I also see a potential issue with this approach: if we execute ./scripts/checkpatch.pl from the worktree, a malicious commit could modify checkpatch.pl to run arbitrary code. I'm not sure exactly how to address this... should we always run checkpatch.pl from a known branch (latest Linus git)?

@nethum529

Copy link
Copy Markdown
Contributor Author

I also see a potential issue with this approach: if we execute ./scripts/checkpatch.pl from the worktree, a malicious commit could modify checkpatch.pl to run arbitrary code. I'm not sure exactly how to address this... should we always run checkpatch.pl from a known branch (latest Linus git)?

So with checkpatch on, review checks the commit out and runs that tree's own
scripts/checkpatch.pl under perl. This means a bad commit ships its own checkpatch.pl
and runs code as the boro user, just from being reviewed.

git worktree add --detach <sha> -> perl <that tree>/scripts/checkpatch.pl

Pinning checkpatch is needed but not enough: it still runs with cwd in the
untrusted tree, so the attacker controls spelling.txt and .checkpatch.conf.
The minimum fix would be something like this: pin/bundle checkpatch AND run from a neutral cwd with --no-conf +
absolute patch path. Sandbox the perl on top for the safe version.

Let me know how you'd like me to proceed

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants