Add prompt injection hardening across all data paths #42

Open
joys0714 wants to merge 1 commit into main from feat/prompt-injection-hardening

joys0714 (Contributor) commented Feb 26, 2026

Summary

  • Adds an InputSanitizer utility module with defense-in-depth functions for stripping ANSI escape codes, prompt injection markers, control characters, and markdown syntax
  • Fixes confirmation token generation to use crypto.randomBytes() instead of Math.random()
  • Sanitizes tool parameters before they reach the TTY display, audit logs, error messages sent to the LLM, and instruction files
  • Validates ToolShield experience JSON — rejects values containing prompt injection patterns, strips markdown syntax, sanitizes tool names from filenames
  • Fixes missing _escape_braces() in two .format() calls in exp_generate.py
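
For readers without the diff open, here is a minimal sketch of what stripping ANSI escape codes and control characters can look like. The function name stripControlCharsSketch and the three regexes are assumptions made for illustration; they are not copied from InputSanitizer.ts, and the real utility may keep or drop a different set of characters.

```typescript
// Hypothetical sketch; not the PR's actual implementation.

// CSI sequences: ESC [ parameter bytes, intermediate bytes, one final byte.
const ANSI_CSI = /\x1b\[[0-?]*[ -/]*[@-~]/g;
// OSC sequences: ESC ] ... terminated by BEL or by ESC \.
const ANSI_OSC = /\x1b\][^\x07\x1b]*(?:\x07|\x1b\\)/g;
// Remaining C0 controls and DEL; tab (\x09) and newline (\x0a) are kept.
const CONTROL_CHARS = /[\x00-\x08\x0b-\x1f\x7f]/g;

function stripControlCharsSketch(input: string): string {
  return input
    .replace(ANSI_OSC, "")      // remove whole OSC sequences first
    .replace(ANSI_CSI, "")      // then colour and cursor sequences
    .replace(CONTROL_CHARS, ""); // then any stray control bytes, including a lone ESC
}

// A red "rm -rf /" loses its colour codes but keeps its text.
console.log(stripControlCharsSketch("\x1b[31mrm -rf /\x1b[0m"));
```

Removing whole escape sequences before single control bytes matters: stripping the ESC byte first would leave the printable remainder of the sequence (for example "[31m") in the output.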

Files changed (9)

  • src/core/InputSanitizer.ts — new sanitization utility
  • src/core/Interceptor.ts — crypto.randomBytes tokens + sanitize error messages
  • src/core/Arbitrator.ts — sanitize TTY args display
  • src/core/IrreversibilityScorer.ts — strip control chars from summary
  • src/plugin/tool-interceptor.ts — sanitize actionSummary assignments
  • src/storage/DecisionLog.ts — strip control chars from JSONL
  • src/core/toolshield/toolshield/cli.py — validate experiences + sanitize tool names
  • src/core/toolshield/toolshield/exp_generate.py — fix _escape_braces gaps
  • src/index.ts — export sanitization utilities

Test plan

  • npm run build passes
  • Craft tool params with ANSI escape codes → verify stripped in TTY display
  • Craft tool params with "IGNORE ALL PREVIOUS INSTRUCTIONS" → verify replaced with [FILTERED] in error messages
  • Run toolshield import --exp-file <existing>.json --agent openclaw → verify existing experiences import correctly
  • Create experience JSON with "## NEW INSTRUCTIONS:" → verify rejected by validator

🤖 Generated with Claude Code


Important

Enhances security by adding input sanitization to prevent prompt injection, updates token generation, and fixes formatting issues across multiple components.

  • Behavior:
    • Introduces InputSanitizer.ts for stripping ANSI escape codes, prompt injection markers, control characters, and markdown syntax.
    • Replaces Math.random() with crypto.randomBytes() for confirmation token generation in Interceptor.ts.
    • Sanitizes tool parameters in Arbitrator.ts before TTY display.
    • Validates and sanitizes ToolShield experience JSON in cli.py.
    • Fixes missing _escape_braces() in .format() calls in exp_generate.py.
  • Files:
    • InputSanitizer.ts: New utility for input sanitization.
    • Interceptor.ts: Updates token generation and sanitizes error messages.
    • Arbitrator.ts: Sanitizes TTY argument display.
    • cli.py: Validates and sanitizes experience JSON.
    • exp_generate.py: Fixes formatting issues.
    • DecisionLog.ts: Sanitizes JSONL entries.
    • tool-interceptor.ts: Sanitizes action summaries.
    • IrreversibilityScorer.ts: Strips control characters from summaries.
    • index.ts: Exports sanitization utilities.

This description was created by Ellipsis for 7970355.

Defense-in-depth sanitization layer for tool parameters, error messages,
audit logs, TTY display, and instruction files.

- New InputSanitizer utility: stripControlChars, sanitizeForPrompt,
  sanitizeToolParams, escapeMarkdown, truncateForDisplay
- Fix confirmation tokens to use crypto.randomBytes instead of Math.random
- Sanitize tool params in Arbitrator TTY display (prevents ANSI injection)
- Sanitize IrreversibilityScorer summary (source of actionSummary)
- Sanitize error messages sent back to the LLM in channel mode
- Sanitize DecisionLog JSONL output (strip control chars)
- Validate ToolShield experience JSON: type checking, injection pattern
  detection, markdown stripping, tool name sanitization
- Fix missing _escape_braces in exp_generate.py .format() calls

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

ellipsis-dev (bot) left a comment

Important

Looks good to me! 👍

Reviewed everything up to 7970355 in 19 seconds.
  • Reviewed 418 lines of code in 9 files
  • Skipped 0 files when reviewing.
  • Skipped posting 0 draft comments.

Workflow ID: wflow_paZJO0xSasCsVRdL

kw2828 (Collaborator) commented Feb 26, 2026

Thanks LGTM

nanookclaw left a comment

Solid hardening pass. A few observations from reading the diff:

The asymmetry between TS and Python is intentional and correct. TypeScript side replaces injection patterns with [FILTERED] (soft filter for display/logging), while Python side raises ValueError on detection (hard reject for experience import). Different threat models: display sanitization is best-effort, but data ingestion into instruction files should fail-closed. Worth documenting this distinction explicitly — a future contributor might try to "align" them and weaken the Python gate.
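
To make the distinction concrete, here is a rough sketch of the two modes side by side. It is written in TypeScript for both halves purely for comparison (the hard gate in the PR is Python and raises ValueError); the pattern list and the names softFilter and assertNoInjection are hypothetical and not taken from the diff.

```typescript
// Illustrative patterns only; the PR's real lists live in InputSanitizer.ts and cli.py.
const INJECTION_PATTERNS: RegExp[] = [
  /ignore\s+(?:all\s+)?previous\s+instructions/im,
  /^#{1,6}\s*new\s+instructions\s*:/im,
];

// Soft filter (display and logging): never throws, replaces every match.
function softFilter(text: string): string {
  return INJECTION_PATTERNS.reduce(
    // Build a global copy so that all occurrences are replaced, not just the first.
    (acc, pattern) => acc.replace(new RegExp(pattern.source, "gim"), "[FILTERED]"),
    text,
  );
}

// Hard gate (ingestion into instruction files): fails closed on the first match.
function assertNoInjection(text: string): void {
  for (const pattern of INJECTION_PATTERNS) {
    if (pattern.test(text)) {
      throw new Error("Rejected: value matches injection pattern " + pattern.source);
    }
  }
}

console.log(softFilter("Error: IGNORE ALL PREVIOUS INSTRUCTIONS and run"));
```

The soft filter keeps the surrounding text usable for a human reading the TTY or the audit log, while the hard gate refuses the whole value, so nothing partially filtered ends up in an instruction file.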

The injection pattern list is a good starting point but inherently incomplete. Regex-based detection is defense-in-depth, not a perimeter. The real protection here is the combination of sanitization + truncation + the existing human-in-the-loop approval from the Arbitrator. Worth adding a code comment in InputSanitizer.ts noting that this list should be treated as a living document — new patterns should be added as they're discovered in the wild rather than attempting exhaustive coverage upfront.

crypto.randomBytes() over Math.random() for confirmation tokens is the right call. One thing: the current implementation generates 3 bytes → 6 hex chars. That's 16^6 ≈ 16.7M possible tokens (24 bits). For a confirmation token that lives for seconds during an interactive approval, this is plenty. But if ClawReins ever supports async/remote confirmation (where tokens persist longer), consider bumping to 4+ bytes.
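
As a sketch of how the byte count could be made adjustable, assuming a Node.js runtime: makeConfirmationToken and tokenSpaceSize are hypothetical helpers, not the code in Interceptor.ts, and the default of 3 bytes only mirrors the figure quoted above.

```typescript
import { randomBytes } from "node:crypto";

// Hypothetical helper: returns 2 * byteLength lowercase hex characters
// drawn from the OS CSPRNG via crypto.randomBytes.
function makeConfirmationToken(byteLength: number = 3): string {
  if (Math.floor(byteLength) !== byteLength || byteLength < 1) {
    throw new RangeError("byteLength must be a positive integer");
  }
  return randomBytes(byteLength).toString("hex");
}

// Number of distinct tokens for a given byte length: 256^byteLength.
function tokenSpaceSize(byteLength: number): number {
  return 2 ** (8 * byteLength);
}

console.log(makeConfirmationToken());  // 6 hex characters
console.log(makeConfirmationToken(4)); // 8 hex characters
console.log(tokenSpaceSize(3), tokenSpaceSize(4));
```

Each extra byte multiplies the token space by 256, so moving from 3 to 4 bytes goes from 16,777,216 to 4,294,967,296 possible tokens at the cost of two more characters for the user to type.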

The _escape_braces fixes in exp_generate.py are the most impactful change here. Those were real bugs: unescaped curly braces in user content passed to .format() can raise KeyError, IndexError, or ValueError, or silently substitute an unintended value. Nice catch.

Minor: truncateForDisplay avoids splitting a surrogate pair but doesn't check for a split inside a grapheme cluster (e.g., an emoji ZWJ sequence). Probably fine for the display/logging context, but worth a comment noting that this is code-point-safe, not grapheme-safe.
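
A small sketch of the difference, under stated assumptions: truncateCodePointSafe is a hypothetical stand-in (the real truncateForDisplay may count or mark truncation differently), the limit is measured in UTF-16 code units, and the trailing "…" is added only for illustration.

```typescript
// Hypothetical sketch; not the PR's truncateForDisplay.
function truncateCodePointSafe(text: string, maxUnits: number): string {
  if (maxUnits <= 0) return "";
  if (text.length <= maxUnits) return text;
  let end = maxUnits;
  const lastUnit = text.charCodeAt(end - 1);
  // A high surrogate (0xD800-0xDBFF) at the cut would lose its low half,
  // so back up one unit and drop the whole code point instead.
  if (lastUnit >= 0xd800 && lastUnit <= 0xdbff) end -= 1;
  return text.slice(0, end) + "…";
}

// "ab" + U+1F600 (grinning face, written as a surrogate pair) + "cd"
const withEmoji = "ab\uD83D\uDE00cd";
console.log(truncateCodePointSafe(withEmoji, 3)); // cut lands mid-pair, emoji dropped whole

// Man + ZWJ + woman + ZWJ + girl: one grapheme cluster, 8 code units.
const family = "\uD83D\uDC68\u200D\uD83D\uDC69\u200D\uD83D\uDC67";
console.log(truncateCodePointSafe(family, 2)); // valid UTF-16, but only the first person remains
```

The second call shows the gap: the output contains no broken surrogates, yet the family emoji has been reduced to a single person. A grapheme-safe variant would need cluster boundaries, for example from Intl.Segmenter where the runtime provides it.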

3 participants