Add prompt injection hardening across all data paths by joys0714 · Pull Request #42 · pegasi-ai/reins

joys0714 · 2026-02-26T21:47:56Z

Summary

Adds an InputSanitizer utility module with defense-in-depth functions for stripping ANSI escape codes, prompt injection markers, control characters, and markdown syntax
Fixes confirmation token generation to use crypto.randomBytes() instead of Math.random()
Sanitizes tool parameters before TTY display, audit logging, error messages to the LLM, and instruction files
Validates ToolShield experience JSON — rejects values containing prompt injection patterns, strips markdown syntax, sanitizes tool names from filenames
Fixes missing _escape_braces() in two .format() calls in exp_generate.py

Files changed (9)

src/core/InputSanitizer.ts — new sanitization utility
src/core/Interceptor.ts — crypto.randomBytes tokens + sanitize error messages
src/core/Arbitrator.ts — sanitize TTY args display
src/core/IrreversibilityScorer.ts — strip control chars from summary
src/plugin/tool-interceptor.ts — sanitize actionSummary assignments
src/storage/DecisionLog.ts — strip control chars from JSONL
src/core/toolshield/toolshield/cli.py — validate experiences + sanitize tool names
src/core/toolshield/toolshield/exp_generate.py — fix _escape_braces gaps
src/index.ts — export sanitization utilities

Test plan

npm run build passes
Craft tool params with ANSI escape codes → verify stripped in TTY display
Craft tool params with "IGNORE ALL PREVIOUS INSTRUCTIONS" → verify replaced with [FILTERED] in error messages
Run toolshield import --exp-file <existing>.json --agent openclaw → verify existing experiences import correctly
Create experience JSON with "## NEW INSTRUCTIONS:" → verify rejected by validator

🤖 Generated with Claude Code

Important

Enhances security by adding input sanitization to prevent prompt injection, updates token generation, and fixes formatting issues across multiple components.

Behavior:
- Introduces InputSanitizer.ts for stripping ANSI escape codes, prompt injection markers, control characters, and markdown syntax.
- Replaces Math.random() with crypto.randomBytes() for confirmation token generation in Interceptor.ts.
- Sanitizes tool parameters in Arbitrator.ts before TTY display.
- Validates and sanitizes ToolShield experience JSON in cli.py.
- Fixes missing _escape_braces() in .format() calls in exp_generate.py.
Files:
- InputSanitizer.ts: New utility for input sanitization.
- Interceptor.ts: Updates token generation and sanitizes error messages.
- Arbitrator.ts: Sanitizes TTY argument display.
- cli.py: Validates and sanitizes experience JSON.
- exp_generate.py: Fixes formatting issues.
- DecisionLog.ts: Sanitizes JSONL entries.
- tool-interceptor.ts: Sanitizes action summaries.
- IrreversibilityScorer.ts: Strips control characters from summaries.
- index.ts: Exports sanitization utilities.

^{This description was created by}^{for 7970355. You can customize this summary. It will automatically update as commits are pushed.}

Defense-in-depth sanitization layer for tool parameters, error messages, audit logs, TTY display, and instruction files. - New InputSanitizer utility: stripControlChars, sanitizeForPrompt, sanitizeToolParams, escapeMarkdown, truncateForDisplay - Fix confirmation tokens to use crypto.randomBytes instead of Math.random - Sanitize tool params in Arbitrator TTY display (prevents ANSI injection) - Sanitize IrreversibilityScorer summary (source of actionSummary) - Sanitize error messages sent back to the LLM in channel mode - Sanitize DecisionLog JSONL output (strip control chars) - Validate ToolShield experience JSON: type checking, injection pattern detection, markdown stripping, tool name sanitization - Fix missing _escape_braces in exp_generate.py .format() calls Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

ellipsis-dev

Important

Looks good to me! 👍

Reviewed everything up to 7970355 in 19 seconds. Click for details.

Reviewed 418 lines of code in 9 files
Skipped 0 files when reviewing.
Skipped posting 0 draft comments. View those below.
Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.

Workflow ID: wflow_paZJO0xSasCsVRdL

^{You can customize}^{by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.}

kw2828 · 2026-02-26T21:52:55Z

Thanks LGTM

nanookclaw

Solid hardening pass. A few observations from reading the diff:

The asymmetry between TS and Python is intentional and correct. TypeScript side replaces injection patterns with [FILTERED] (soft filter for display/logging), while Python side raises ValueError on detection (hard reject for experience import). Different threat models: display sanitization is best-effort, but data ingestion into instruction files should fail-closed. Worth documenting this distinction explicitly — a future contributor might try to "align" them and weaken the Python gate.

The injection pattern list is a good starting point but inherently incomplete. Regex-based detection is defense-in-depth, not a perimeter. The real protection here is the combination of sanitization + truncation + the existing human-in-the-loop approval from the Arbitrator. Worth adding a code comment in InputSanitizer.ts noting that this list should be treated as a living document — new patterns should be added as they're discovered in the wild rather than attempting exhaustive coverage upfront.

crypto.randomBytes() over Math.random() for confirmation tokens is the right call. One thing: the current implementation generates 3 bytes → 6 hex chars. That's 16^6 = ~16.7M possible tokens. For a confirmation token that lives for seconds during an interactive approval, this is plenty. But if ClawReins ever supports async/remote confirmation (where tokens persist longer), consider bumping to 4+ bytes.

The _escape_braces fixes in exp_generate.py are the most impactful change here. Those were real bugs — unescaped curly braces in user content passed to .format() would cause KeyError or silent data corruption. Nice catch.

Minor: truncateForDisplay handles high surrogate splitting but doesn't check for split inside a grapheme cluster (e.g., emoji with ZWJ). Probably fine for the display/logging context, but worth a comment noting this is character-safe, not grapheme-safe.

ellipsis-dev Bot reviewed Feb 26, 2026

View reviewed changes

nanookclaw reviewed Feb 27, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add prompt injection hardening across all data paths#42

Add prompt injection hardening across all data paths#42
joys0714 wants to merge 1 commit into
mainfrom
feat/prompt-injection-hardening

joys0714 commented Feb 26, 2026 •

edited by ellipsis-dev Bot

Loading

Uh oh!

ellipsis-dev Bot left a comment

Uh oh!

kw2828 commented Feb 26, 2026

Uh oh!

nanookclaw left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

joys0714 commented Feb 26, 2026 • edited by ellipsis-dev Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Files changed (9)

Test plan

Uh oh!

ellipsis-dev Bot left a comment

Choose a reason for hiding this comment

Uh oh!

kw2828 commented Feb 26, 2026

Uh oh!

nanookclaw left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

joys0714 commented Feb 26, 2026 •

edited by ellipsis-dev Bot

Loading