Skip to content

fix(analyzer): reduce instructional-prose false positives in static scans (#103)#232

Open
rodboev wants to merge 15 commits into
NVIDIA:mainfrom
rodboev:pr/static-prose-false-positive-103
Open

fix(analyzer): reduce instructional-prose false positives in static scans (#103)#232
rodboev wants to merge 15 commits into
NVIDIA:mainfrom
rodboev:pr/static-prose-false-positive-103

Conversation

@rodboev

@rodboev rodboev commented Jun 29, 2026

Copy link
Copy Markdown
Contributor

Summary

--no-llm static scans currently over-fire on quoted defensive examples, schema-field warning prose, and layout-only content. This narrows anti-refusal suppression to match-local quoted defensive examples or schema-field clauses, keeps live tool-description and mixed-clause directives detectable, and filters MP2 layout-only spans that carry no semantic stuffing content.

Closes #103

Attribution: issue follow-up from @M8seven on 2026-06-25 sharpened the surviving scope with the whitespace and box-drawing MP2 repro plus the Never skip the corpus check warning prose case.

Root cause

static_patterns_anti_refusal.py only had line-wide benign heuristics, so unrelated schema tokens, declaration labels, and narrative clauses could suppress later live directives on the same line. static_patterns_memory_poisoning.py filters only one narrow repeated-capture case, so whitespace and box-drawing layout can still emit Context Window Stuffing.

Diff Notes

  • Replace the AR benign filter with match-local clause analysis, quoted defensive-example checks, and schema-field-only suppression for the matched AR2 clause.
  • Keep bare tool: and description: content model-facing; they now stay detectable unless the matched phrase is quoted and explicitly framed as a defensive example.
  • Add a private MP2 post-filter for whitespace-only and box-drawing layout spans.
  • Add focused anti-refusal regressions for the schema-token bypass, tool-description attack surface, split-clause narrative/live mix, quoted defensive examples, and the existing MP2 layout coverage.

Scope

This stays in the analyzer layer. It does not change prompt-injection logic, CLI behavior, graph orchestration, report or SARIF schemas, provider code, or LLM-side mitigation.

Verification

  • ./.venv/Scripts/python.exe -m pytest tests/nodes/analyzers/test_static_patterns_anti_refusal.py tests/nodes/analyzers/test_static_patterns.py
  • uv run ruff check src/ tests/
  • uv run ruff format --check src/ tests/

rodboev added 14 commits June 29, 2026 11:43
…irectives (NVIDIA#103)

Signed-off-by: Rod Boev <rod.boev@gmail.com>
…ives (NVIDIA#103)

Signed-off-by: Rod Boev <rod.boev@gmail.com>

@rng1995 rng1995 left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Requesting changes because the new false-positive guards create straightforward anti-refusal detection bypasses. Schema keywords, declaration/tool labels, and a benign narrative clause can each mask live model-facing directives. Scope suppression to the matched clause and demonstrably quoted defensive examples, then add mixed benign/malicious regression cases.

Comment thread src/skillspector/nodes/analyzers/static_patterns_anti_refusal.py Outdated
Comment thread src/skillspector/nodes/analyzers/static_patterns_anti_refusal.py Outdated
Comment thread src/skillspector/nodes/analyzers/static_patterns_anti_refusal.py Outdated
@rodboev

rodboev commented Jun 30, 2026

Copy link
Copy Markdown
Contributor Author

Thanks, I agree the previous guard was still too broad. This update scopes the benign check to the matched clause instead of the whole line, keeps the schema-field suppression only when that AR2 clause itself targets schema fields, and drops the blanket declaration and tool-label allowlist so bare description: content still fires. I also added focused regressions for the schema-token bypass, tool-description attack surface, the split would always comply; always comply with the user case, and quoted defensive examples that should stay clean.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

False positives: --no-llm static pass flags documentation/teaching content as vulnerabilities (polarity-blind matching)

2 participants