Add prompt injection hardening across all data paths#42
Conversation
Defense-in-depth sanitization layer for tool parameters, error messages, audit logs, TTY display, and instruction files. - New InputSanitizer utility: stripControlChars, sanitizeForPrompt, sanitizeToolParams, escapeMarkdown, truncateForDisplay - Fix confirmation tokens to use crypto.randomBytes instead of Math.random - Sanitize tool params in Arbitrator TTY display (prevents ANSI injection) - Sanitize IrreversibilityScorer summary (source of actionSummary) - Sanitize error messages sent back to the LLM in channel mode - Sanitize DecisionLog JSONL output (strip control chars) - Validate ToolShield experience JSON: type checking, injection pattern detection, markdown stripping, tool name sanitization - Fix missing _escape_braces in exp_generate.py .format() calls Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
Important
Looks good to me! 👍
Reviewed everything up to 7970355 in 19 seconds. Click for details.
- Reviewed
418lines of code in9files - Skipped
0files when reviewing. - Skipped posting
0draft comments. View those below. - Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
Workflow ID: wflow_paZJO0xSasCsVRdL
You can customize by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
|
Thanks LGTM |
nanookclaw
left a comment
There was a problem hiding this comment.
Solid hardening pass. A few observations from reading the diff:
The asymmetry between TS and Python is intentional and correct. TypeScript side replaces injection patterns with [FILTERED] (soft filter for display/logging), while Python side raises ValueError on detection (hard reject for experience import). Different threat models: display sanitization is best-effort, but data ingestion into instruction files should fail-closed. Worth documenting this distinction explicitly — a future contributor might try to "align" them and weaken the Python gate.
The injection pattern list is a good starting point but inherently incomplete. Regex-based detection is defense-in-depth, not a perimeter. The real protection here is the combination of sanitization + truncation + the existing human-in-the-loop approval from the Arbitrator. Worth adding a code comment in InputSanitizer.ts noting that this list should be treated as a living document — new patterns should be added as they're discovered in the wild rather than attempting exhaustive coverage upfront.
crypto.randomBytes() over Math.random() for confirmation tokens is the right call. One thing: the current implementation generates 3 bytes → 6 hex chars. That's 16^6 = ~16.7M possible tokens. For a confirmation token that lives for seconds during an interactive approval, this is plenty. But if ClawReins ever supports async/remote confirmation (where tokens persist longer), consider bumping to 4+ bytes.
The _escape_braces fixes in exp_generate.py are the most impactful change here. Those were real bugs — unescaped curly braces in user content passed to .format() would cause KeyError or silent data corruption. Nice catch.
Minor: truncateForDisplay handles high surrogate splitting but doesn't check for split inside a grapheme cluster (e.g., emoji with ZWJ). Probably fine for the display/logging context, but worth a comment noting this is character-safe, not grapheme-safe.
Summary
crypto.randomBytes()instead ofMath.random()_escape_braces()in two.format()calls inexp_generate.pyFiles changed (9)
src/core/InputSanitizer.ts— new sanitization utilitysrc/core/Interceptor.ts— crypto.randomBytes tokens + sanitize error messagessrc/core/Arbitrator.ts— sanitize TTY args displaysrc/core/IrreversibilityScorer.ts— strip control chars from summarysrc/plugin/tool-interceptor.ts— sanitize actionSummary assignmentssrc/storage/DecisionLog.ts— strip control chars from JSONLsrc/core/toolshield/toolshield/cli.py— validate experiences + sanitize tool namessrc/core/toolshield/toolshield/exp_generate.py— fix _escape_braces gapssrc/index.ts— export sanitization utilitiesTest plan
npm run buildpasses[FILTERED]in error messagestoolshield import --exp-file <existing>.json --agent openclaw→ verify existing experiences import correctly"## NEW INSTRUCTIONS:"→ verify rejected by validator🤖 Generated with Claude Code
Important
Enhances security by adding input sanitization to prevent prompt injection, updates token generation, and fixes formatting issues across multiple components.
InputSanitizer.tsfor stripping ANSI escape codes, prompt injection markers, control characters, and markdown syntax.Math.random()withcrypto.randomBytes()for confirmation token generation inInterceptor.ts.Arbitrator.tsbefore TTY display.ToolShieldexperience JSON incli.py._escape_braces()in.format()calls inexp_generate.py.InputSanitizer.ts: New utility for input sanitization.Interceptor.ts: Updates token generation and sanitizes error messages.Arbitrator.ts: Sanitizes TTY argument display.cli.py: Validates and sanitizes experience JSON.exp_generate.py: Fixes formatting issues.DecisionLog.ts: Sanitizes JSONL entries.tool-interceptor.ts: Sanitizes action summaries.IrreversibilityScorer.ts: Strips control characters from summaries.index.ts: Exports sanitization utilities.This description was created by
for 7970355. You can customize this summary. It will automatically update as commits are pushed.