fix(analyzer): reduce instructional-prose false positives in static scans (#103)#232
fix(analyzer): reduce instructional-prose false positives in static scans (#103)#232rodboev wants to merge 15 commits into
Conversation
…VIDIA#103) Signed-off-by: Rod Boev <rod.boev@gmail.com>
…NVIDIA#103) Signed-off-by: Rod Boev <rod.boev@gmail.com>
…IDIA#103) Signed-off-by: Rod Boev <rod.boev@gmail.com>
…irectives (NVIDIA#103) Signed-off-by: Rod Boev <rod.boev@gmail.com>
…ives (NVIDIA#103) Signed-off-by: Rod Boev <rod.boev@gmail.com>
Signed-off-by: Rod Boev <rod.boev@gmail.com>
rng1995
left a comment
There was a problem hiding this comment.
Requesting changes because the new false-positive guards create straightforward anti-refusal detection bypasses. Schema keywords, declaration/tool labels, and a benign narrative clause can each mask live model-facing directives. Scope suppression to the matched clause and demonstrably quoted defensive examples, then add mixed benign/malicious regression cases.
…A#103) Signed-off-by: Rod Boev <rod.boev@gmail.com>
|
Thanks, I agree the previous guard was still too broad. This update scopes the benign check to the matched clause instead of the whole line, keeps the schema-field suppression only when that AR2 clause itself targets schema fields, and drops the blanket declaration and tool-label allowlist so bare |
Summary
--no-llmstatic scans currently over-fire on quoted defensive examples, schema-field warning prose, and layout-only content. This narrows anti-refusal suppression to match-local quoted defensive examples or schema-field clauses, keeps live tool-description and mixed-clause directives detectable, and filters MP2 layout-only spans that carry no semantic stuffing content.Closes #103
Attribution: issue follow-up from @M8seven on 2026-06-25 sharpened the surviving scope with the whitespace and box-drawing MP2 repro plus the
Never skip the corpus check warningprose case.Root cause
static_patterns_anti_refusal.pyonly had line-wide benign heuristics, so unrelated schema tokens, declaration labels, and narrative clauses could suppress later live directives on the same line.static_patterns_memory_poisoning.pyfilters only one narrow repeated-capture case, so whitespace and box-drawing layout can still emitContext Window Stuffing.Diff Notes
tool:anddescription:content model-facing; they now stay detectable unless the matched phrase is quoted and explicitly framed as a defensive example.Scope
This stays in the analyzer layer. It does not change prompt-injection logic, CLI behavior, graph orchestration, report or SARIF schemas, provider code, or LLM-side mitigation.
Verification
./.venv/Scripts/python.exe -m pytest tests/nodes/analyzers/test_static_patterns_anti_refusal.py tests/nodes/analyzers/test_static_patterns.pyuv run ruff check src/ tests/uv run ruff format --check src/ tests/