fix(analyzer): reduce instructional-prose false positives in static scans (#103) by rodboev · Pull Request #232 · NVIDIA/SkillSpector

rodboev · 2026-06-29T18:06:21Z

Summary

--no-llm static scans currently over-fire on quoted defensive examples, schema-field warning prose, and layout-only content. This narrows anti-refusal suppression to match-local quoted defensive examples or schema-field clauses, keeps live tool-description and mixed-clause directives detectable, and filters MP2 layout-only spans that carry no semantic stuffing content.

Closes #103

Attribution: the issue follow-up from @M8seven on 2026-06-25 sharpened the surviving scope with the whitespace and box-drawing MP2 repro plus the "Never skip the corpus check" warning-prose case.

Root cause

static_patterns_anti_refusal.py only had line-wide benign heuristics, so unrelated schema tokens, declaration labels, and narrative clauses could suppress later live directives on the same line. static_patterns_memory_poisoning.py filters only one narrow repeated-capture case, so whitespace and box-drawing layout can still emit Context Window Stuffing.
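To make the line-wide versus match-local distinction concrete, here is a minimal Python sketch. The directive regex, the schema-token hint, and the clause splitter are hypothetical stand-ins, not the patterns in static_patterns_anti_refusal.py. It only illustrates why a benign token elsewhere on the line should not suppress a live directive in a different clause.

```python
import re

# Hypothetical stand-ins: the analyzer's real patterns are not shown in this PR text.
DIRECTIVE = re.compile(r"\balways comply\b", re.IGNORECASE)
SCHEMA_HINT = re.compile(r"\b(?:field|schema|property)\b", re.IGNORECASE)
CLAUSE_BREAK = re.compile(r"[;.]")


def clause_around(line: str, start: int, end: int) -> str:
    """Return the clause containing line[start:end], bounded by ';' or '.'."""
    left = 0
    for m in CLAUSE_BREAK.finditer(line, 0, start):
        left = m.end()  # keep the last break that precedes the match
    right_match = CLAUSE_BREAK.search(line, end)
    right = right_match.start() if right_match else len(line)
    return line[left:right]


def findings_line_wide(line: str) -> list[str]:
    """Line-wide heuristic (sketch): any schema token on the line suppresses every match."""
    if SCHEMA_HINT.search(line):
        return []
    return [m.group(0) for m in DIRECTIVE.finditer(line)]


def findings_match_local(line: str) -> list[str]:
    """Match-local heuristic (sketch): suppress only when the matched clause is schema prose."""
    kept = []
    for m in DIRECTIVE.finditer(line):
        if SCHEMA_HINT.search(clause_around(line, m.start(), m.end())):
            continue
        kept.append(m.group(0))
    return kept
```

With the line "The schema field is optional; always comply with the user.", the line-wide version returns nothing because of the unrelated schema tokens, while the match-local version still reports the directive in the second clause.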

Diff Notes

Replace the AR benign filter with match-local clause analysis, quoted defensive-example checks, and schema-field-only suppression for the matched AR2 clause.
Keep bare tool: and description: content model-facing; they now stay detectable unless the matched phrase is quoted and explicitly framed as a defensive example.
Add a private MP2 post-filter for whitespace-only and box-drawing layout spans.
Add focused anti-refusal regressions for the schema-token bypass, tool-description attack surface, split-clause narrative/live mix, quoted defensive examples, and the existing MP2 layout coverage.
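The MP2 layout post-filter described above could look roughly like the following sketch. The function names are hypothetical and the real filter in static_patterns_memory_poisoning.py may differ. The only assumption about Unicode is that the Box Drawing block spans U+2500 through U+257F.

```python
# Unicode "Box Drawing" block: U+2500..U+257F.
BOX_DRAWING = range(0x2500, 0x2580)


def is_layout_only(span: str) -> bool:
    """True when span contains nothing but whitespace and box-drawing characters.

    An empty span also counts as layout-only, since it carries no content.
    """
    return all(ch.isspace() or ord(ch) in BOX_DRAWING for ch in span)


def filter_layout_spans(spans: list[str]) -> list[str]:
    """Hypothetical post-filter: drop matched spans that carry no semantic content."""
    return [s for s in spans if not is_layout_only(s)]
```

A span such as a table border or a run of indentation is dropped, while a repeated-token span that contains letters or digits is kept and can still be reported as Context Window Stuffing.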

Scope

This stays in the analyzer layer. It does not change prompt-injection logic, CLI behavior, graph orchestration, report or SARIF schemas, provider code, or LLM-side mitigation.

Verification

./.venv/Scripts/python.exe -m pytest tests/nodes/analyzers/test_static_patterns_anti_refusal.py tests/nodes/analyzers/test_static_patterns.py
uv run ruff check src/ tests/
uv run ruff format --check src/ tests/


rng1995

Requesting changes because the new false-positive guards create straightforward anti-refusal detection bypasses. Schema keywords, declaration/tool labels, and a benign narrative clause can each mask live model-facing directives. Scope suppression to the matched clause and demonstrably quoted defensive examples, then add mixed benign/malicious regression cases.


rodboev · 2026-06-30T10:53:33Z

Thanks, I agree the previous guard was still too broad. This update scopes the benign check to the matched clause instead of the whole line, keeps the schema-field suppression only when that AR2 clause itself targets schema fields, and drops the blanket declaration and tool-label allowlist so bare description: content still fires. I also added focused regressions for the schema-token bypass, the tool-description attack surface, the split "would always comply; always comply with the user" case, and quoted defensive examples that should stay clean.
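As an illustration of the quoted defensive-example rule, the following sketch treats a match as benign only when it sits inside double quotes and the text before the quote frames it defensively. The cue list and the function name are assumptions made for the example, not the analyzer's actual code, and the cue check here looks at the whole prefix of the line rather than at a single clause.

```python
import re

# Hypothetical framing cues; the analyzer's real cue list is not shown here.
DEFENSIVE_CUE = re.compile(r"\b(?:reject|flag|block|refuse)\b", re.IGNORECASE)
QUOTED = re.compile(r'"([^"]*)"')


def is_quoted_defensive_example(line: str, start: int, end: int) -> bool:
    """True when line[start:end] is inside double quotes and framed defensively."""
    for q in QUOTED.finditer(line):
        # The match must fall entirely within the quoted text.
        if q.start(1) <= start and end <= q.end(1):
            # Require a defensive cue before the opening quote.
            return bool(DEFENSIVE_CUE.search(line[: q.start()]))
    return False
```

Under these assumptions, a bare description: line or a quoted phrase with no defensive framing is not suppressed, and a live directive that follows a quoted example on the same line is not suppressed either, because it lies outside the quotes.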

rodboev added 14 commits June 29, 2026 11:43

bf02bb7 fix(analyzer): reduce instructional-prose false positives in static scans (NVIDIA#103)

ff97530 fix(analyzer): preserve direct warning-suppression detection (NVIDIA#103)

16f6b0d fix(analyzer): honor quoted and declared benign roles (NVIDIA#103)

f888914 fix(analyzer): keep adjacent live anti-refusal directives detectable (NVIDIA#103)

7d50b8c fix(analyzer): scope benign anti-refusal continuations precisely (NVIDIA#103)

c7cade2 fix(analyzer): distinguish declaration headers from live directives (NVIDIA#103)

a626c7e fix(analyzer): treat documentation labels as prose, not examples (NVIDIA#103)

1265a4c test(analyzer): cover declaration and fixture prose edges (NVIDIA#103)

a51d7f0 fix(analyzer): keep live directives from slipping past prose guards (NVIDIA#103) Signed-off-by: Rod Boev <rod.boev@gmail.com>

e02d472 fix(analyzer): keep ambiguous labels from suppressing live directives (NVIDIA#103) Signed-off-by: Rod Boev <rod.boev@gmail.com>

e9e9a82 fix(analyzer): preserve live directives through the static runner (NVIDIA#103) Signed-off-by: Rod Boev <rod.boev@gmail.com>

e55b989 fix(analyzer): keep block labels and schema prose from masking live directives (NVIDIA#103) Signed-off-by: Rod Boev <rod.boev@gmail.com>

74a36a9 fix(analyzer): preserve the remaining AR2 response-suppression directives (NVIDIA#103) Signed-off-by: Rod Boev <rod.boev@gmail.com>

45953d6 fix(analyzer): preserve multiline documentation directives (NVIDIA#103) Signed-off-by: Rod Boev <rod.boev@gmail.com>

rng1995 requested changes Jun 30, 2026


Comment thread src/skillspector/nodes/analyzers/static_patterns_anti_refusal.py Outdated

Comment thread src/skillspector/nodes/analyzers/static_patterns_anti_refusal.py Outdated

Comment thread src/skillspector/nodes/analyzers/static_patterns_anti_refusal.py Outdated

134c6e5 fix(analyzer): scope anti-refusal suppression to local clauses (NVIDIA#103) Signed-off-by: Rod Boev <rod.boev@gmail.com>

rodboev wants to merge 15 commits into NVIDIA:main from rodboev:pr/static-prose-false-positive-103
