feat(behavioral): canonical dangerous-callable resolver + shared evasion corpus (implements #181)#182
Conversation
| return None | ||
|
|
||
|
|
||
| def _strip_builtins_prefix(name: str) -> str: |
There was a problem hiding this comment.
[Automated SkillSpector Review]
This helper (and the other common.py changes plus the duplicated test_behavioral_ast.py / test_behavioral_taint_tracking.py additions in this PR) come from #180, which this branch is stacked on. Once #180 merges, please rebase onto main so this PR's diff contains only the new canonical-sink layer. No change needed to the code itself.
rng1995
left a comment
There was a problem hiding this comment.
[Automated SkillSpector Review]
Requesting changes — scope/rebase, not the implementation.
To be clear up front: the canonical-sink layer itself is excellent and I have no correctness or security objection to canonical_sink.py. The single-chokepoint design (resolve_to_canonical_sink covering bare/alias/from/as, builtins.*, dynamic import, reflective getattr, subscript __builtins__["exec"]/os.__dict__[...], and importlib/runpy/code sibling machinery with receiver-gated instance methods), the AST9-parity for reflective subprocess, the _covered_by_ast9 anti-double-fire guard, and the shared evasion-corpus fitness gate are all well-reasoned and thoroughly tested.
The blocker is scope/mergeability: this PR is stacked on #180 and its diff currently includes #180's commit 8bdc3bc (the common.py helpers and the test_behavioral_ast.py / test_behavioral_taint_tracking.py additions are duplicated from #180). As written, this can't be merged independently without pulling in / conflicting with #180.
Requested change (matches your own stated plan): land #180 first, then rebase this onto main so the diff contains only the canonical-sink layer (canonical_sink.py, the behavioral_ast/behavioral_taint_tracking wiring, and tests/evasion_corpus/). I'll re-review and approve once the diff is isolated to that.
…ness gate
Introduce resolve_to_canonical_sink, a single chokepoint reducing every spelling
of a dangerous callable -- bare/alias/from/as, builtins.*,
importlib.import_module, reflective getattr("lit"), subscript __builtins__["exec"]
and <mod>.__dict__["lit"], plus importlib/runpy/code sibling machinery -- to one
canonical sink id. Wire it into behavioral_ast and behavioral_taint_tracking so
the per-idiom ladders stay simple, with a shared parametrized evasion corpus
asserting the invariant as a fitness gate: a future missed spelling fails a
shared test instead of shipping as a silent blind spot.
Reflective subprocess invocations grade AST9-HIGH for parity with reflective
os.system; direct subprocess.* keeps baseline AST4-MEDIUM; AST9 and the canonical
fallback stay complementary (the _covered_by_ast9 guard prevents double-firing).
Benign neighbours never canonicalize to a sink.
Stacked on the builtins/importlib sink-evasion fix (reuses
resolve_dynamic_import_call); to be rebased onto main once that lands.
108 evasion-corpus tests pass; full suite +108 with zero regressions
(ruff 0.15.2 + mypy clean).
Signed-off-by: Zied Jlassi <6190550+zied-jlassi@users.noreply.github.com>
ae6c9f1 to
4163d57
Compare
|
Hi @rng1995 — thanks for merging #180 into As requested, I've rebased this branch onto Re-validated in a clean |
rng1995
left a comment
There was a problem hiding this comment.
The previous stacked-PR scope issue is resolved; this is now a clean, focused diff. Two resolver blockers remain: arbitrary .loader.exec_module(...) calls are falsely classified as dynamic imports, while aliased code.InteractiveInterpreter calls evade detection. Please require importlib receiver provenance, normalize aliases for code-interpreter receivers, and add production-wiring plus corpus coverage for both cases.
| """ | ||
| if isinstance(receiver, ast.Attribute): | ||
| # ``<x>.loader`` — the ModuleSpec.loader protocol attribute. | ||
| if receiver.attr == "loader": |
There was a problem hiding this comment.
Blocking false positive: any attribute receiver ending in .loader is accepted without proving it is an importlib ModuleSpec loader. plugin.loader.exec_module(mod) therefore canonicalizes to import and emits AST3 for an arbitrary user object. Require importlib-backed provenance and add this shape to the negative corpus.
| """ | ||
| inferred = None | ||
| if isinstance(receiver, ast.Call): | ||
| inferred = resolve_dotted_name(receiver.func) |
There was a problem hiding this comment.
Blocking evasion: direct constructor names are compared before alias normalization. 'import code as c; c.InteractiveInterpreter().runsource(x)' resolves to None, so a dangerous code-exec sink is missed. Thread aliases through sibling-method resolution or apply import-alias normalization, and test module/from-import aliases.
…alize code interpreter Addresses the NVIDIA#182 re-review: two resolver precision bugs. exec_module false positive: the `<x>.loader` branch matched ANY object exposing a `.loader` attribute, so an unrelated `self.widget.loader.exec_module(...)` was wrongly classified as a dynamic import. The base must now have importlib provenance -- a spec from importlib.util.module_from_spec / spec_from_loader carried through the type map, or an importlib-rooted dotted name -- via the new `_expr_is_importlib_rooted` helper. The legitimate `spec.loader.exec_module(mod)` chain still fires. code-interpreter false negative: `_receiver_is_code_interpreter` never alias-normalized the inferred constructor type, so `import code as c; c.InteractiveInterpreter().runsource(...)` evaded detection. The inferred name now runs through `apply_import_aliases`, which also covers `from code import InteractiveInterpreter`. Production wiring: `behavioral_ast` now threads import aliases into `canonical_sibling_method` (the taint analyzer already did via `resolve_to_canonical_sink`), so both detection paths apply the gate. Corpus: +2 positive rows (aliased code interpreter, aliased importlib spec loader) and +1 false-positive neighbour (arbitrary `.loader.exec_module`), all exercised end-to-end through the real analyzers as a red->green oracle. Signed-off-by: Zied Jlassi <6190550+zied-jlassi@users.noreply.github.com>
|
Thanks @rng1995 — both resolver blockers are addressed in 1. 2. Production wiring. Corpus coverage (run end-to-end through the real analyzers, not just the resolver):
Verified red→green: on the pre-fix code those three rows fail (the FP neighbour produces the AST3 finding you flagged; the aliased interpreter is missed); on Validation (clean |
Implements the proposal in #181: a single canonicalization chokepoint that collapses every spelling of a dangerous callable to one canonical sink id, paired with a shared evasion corpus wired as a fitness gate.
I've opened this as a concrete PR (rather than only waiting on a yes/no in #181) so the approach can be judged on real, tested code — happy to adjust the direction or close if you'd prefer.
This branch is stacked on #180 (
fix: detect builtins.* and importlib.import_module sink evasions): the canonical resolver reusesresolve_dynamic_import_call/_strip_builtins_prefixintroduced there. Becausemaindoes not yet contain #180, this PR's diff currently includes #180's commit8bdc3bc(thecommon.pyhelpers + its behavioral tests). Once #180 lands I'll rebase ontomainso only the canonical layer remains. Suggested merge order: #180 first, then this.What this PR adds on top of #180
canonical_sink.py—resolve_to_canonical_sink(node, aliases, type_map): one chokepoint reducing bare / alias /from/as,builtins.*,importlib.import_module, reflectivegetattr("lit"), subscript__builtins__["exec"]/<mod>.__dict__["lit"], and importlib / runpy / code sibling machinery to a single canonical sink id, reusing the existing alias / type-map / dynamic-import machinery.behavioral_ast+behavioral_taint_trackingwired to call it. Reflectivesubprocessinvocations grade AST9-HIGH (parity with reflectiveos.system); a directsubprocess.*call keeps baseline AST4-MEDIUM; a_covered_by_ast9guard keeps AST9 and the canonical fallback complementary so they never double-fire. Benign neighbours never canonicalize to a sink.tests/evasion_corpus/— shared parametrized corpus as a fitness gate: every spelling → its canonical id and a real production sink rule; FP-neighbours → no sink. Adding a sink becomes "add corpus rows", not "add a detector branch + hope it's complete".Scope / non-goals
In scope: statically-resolvable spellings. Out of scope (documented, left to the generic reflective-access AST7-LOW rule): genuinely dynamic idioms — non-literal
getattr(os, name), variable subscript key, computed attribute string, module name from a variable, local-variable indirection (f = os.system; f(x)). These have no static canonical id and would need def-use / taint tracking; kept explicitly as non-goals so the layer doesn't oversell completeness.Validation (real analyzers + repo toolchain, isolated container)
test_static_patterns::TestRunStaticPatternsAgentSnooping::test_no_same_line_duplicatefailure is unrelated and present on the base tooruff==0.15.2check +ruff format+mypyall cleanAI disclosure: prepared with AI assistance; every line was reviewed and validated by me, and the results above come from running the repo's real analyzers + test toolchain.