LCORE-1831: Implement Redaction Safety Capability in Pydantic AI#1915
LCORE-1831: Implement Redaction Safety Capability in Pydantic AI#1915arin-deloatch wants to merge 8 commits into
Conversation
WalkthroughAdds a configurable PII redaction capability to the Pydantic AI Lightspeed stack. Introduces ChangesPII Redaction Capability Implementation
Sequence Diagram(s)sequenceDiagram
participant PydanticAI
participant PiiRedactionCapability
participant _redact_messages
participant _redact_response
participant redact_text
PydanticAI->>PiiRedactionCapability: before_model_request(ctx, request_context)
PiiRedactionCapability->>_redact_messages: messages, compiled_patterns
_redact_messages->>redact_text: each UserPromptPart content string
redact_text-->>_redact_messages: RedactionResult (content, redacted, count)
_redact_messages-->>PiiRedactionCapability: new message list (or original if unchanged)
PiiRedactionCapability-->>PydanticAI: updated request_context
PydanticAI->>PiiRedactionCapability: after_model_request(ctx, request_context, response)
PiiRedactionCapability->>_redact_response: ModelResponse, compiled_patterns
_redact_response->>redact_text: each TextPart.content string
redact_text-->>_redact_response: RedactionResult
_redact_response-->>PiiRedactionCapability: new ModelResponse (or original if unchanged)
PiiRedactionCapability-->>PydanticAI: ModelResponse
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Suggested reviewers
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
✨ Simplify code
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
anik120
left a comment
There was a problem hiding this comment.
I know this is not part of the scope of this PR, but is src/pydantic_ai... leaking implementation detail again @jrobertboos?
|
|
||
|
|
||
| def _redact_message_parts( | ||
| parts: Sequence[Any], compiled_patterns: CompiledPatterns |
There was a problem hiding this comment.
Try to avoid Any in the whole module where possible.
|
|
||
| def _redact_string_content( | ||
| text: str, compiled_patterns: CompiledPatterns | ||
| ) -> str | None: |
There was a problem hiding this comment.
Prefer using Optional in the whole module.
f5586a7 to
d5a97ab
Compare
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/pydantic_ai_lightspeed/capabilities/redaction/capability.py`:
- Line 1: Add module-level logging support by importing get_logger from log.py
and creating a logger instance at the top of the file after the module docstring
using logger = get_logger(__name__). This logger should then be used throughout
the capability module to audit PII redaction events, such as logging at debug
level when redaction rules match specific patterns and at info level for
redaction statistics and summaries. This approach aligns with coding guidelines
and provides valuable audit trails for the security-sensitive PII redaction
functionality.
- Line 5: Update all type annotations in the module to use modern pipe syntax
instead of Optional. Replace Optional[Type] with Type | None for all return type
annotations in the functions including _redact_text_content,
_redact_content_list, _redact_message_parts, and _redact_model_request. After
updating all function return types throughout the module, remove Optional from
the import statement at the top of the file since it will no longer be needed.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 7e09e4f4-9b9b-4dcf-a8b9-cf61a5e8e1e4
📒 Files selected for processing (10)
src/pydantic_ai_lightspeed/capabilities/__init__.pysrc/pydantic_ai_lightspeed/capabilities/redaction/__init__.pysrc/pydantic_ai_lightspeed/capabilities/redaction/capability.pysrc/pydantic_ai_lightspeed/capabilities/redaction/config.pysrc/pydantic_ai_lightspeed/capabilities/redaction/core.pytests/unit/pydantic_ai_lightspeed/capabilities/__init__.pytests/unit/pydantic_ai_lightspeed/capabilities/redaction/__init__.pytests/unit/pydantic_ai_lightspeed/capabilities/redaction/test_capability.pytests/unit/pydantic_ai_lightspeed/capabilities/redaction/test_config.pytests/unit/pydantic_ai_lightspeed/capabilities/redaction/test_core.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: E2E: server mode / ci / group 1
- GitHub Check: E2E: library mode / ci / group 2
- GitHub Check: E2E: library mode / ci / group 1
- GitHub Check: E2E: server mode / ci / group 2
- GitHub Check: E2E: server mode / ci / group 3
- GitHub Check: E2E: library mode / ci / group 3
🧰 Additional context used
📓 Path-based instructions (3)
tests/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
tests/**/*.py: Use pytest for all unit and integration tests; do not use unittest
Usepytest.mark.asynciomarker for async tests
Files:
tests/unit/pydantic_ai_lightspeed/capabilities/__init__.pytests/unit/pydantic_ai_lightspeed/capabilities/redaction/__init__.pytests/unit/pydantic_ai_lightspeed/capabilities/redaction/test_core.pytests/unit/pydantic_ai_lightspeed/capabilities/redaction/test_capability.pytests/unit/pydantic_ai_lightspeed/capabilities/redaction/test_config.py
src/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
src/**/*.py: Use absolute imports for internal modules:from authentication import get_auth_dependency
Llama Stack imports: Usefrom llama_stack_client import AsyncLlamaStackClient
Checkconstants.pyfor shared constants before defining new ones
All modules must start with descriptive docstrings explaining purpose
Uselogger = get_logger(__name__)fromlog.pyfor module logging
All functions must have complete type annotations for parameters and return types, use modern syntax (str | int), and include descriptive docstrings
Use snake_case with descriptive, action-oriented names for functions (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns; return new data structures instead of modifying function parameters
Useasync deffor I/O operations and external API calls
Use standard log levels with clear purposes:debug()for diagnostic info,info()for program execution,warning()for unexpected events,error()for serious problems
All classes must have descriptive docstrings explaining purpose and use PascalCase with standard suffixes:Configuration,Error/Exception,Resolver,Interface
Abstract classes must use ABC with@abstractmethoddecorators
Follow Google Python docstring conventions with required sections: Parameters, Returns, Raises, and Attributes for classes
Files:
src/pydantic_ai_lightspeed/capabilities/__init__.pysrc/pydantic_ai_lightspeed/capabilities/redaction/__init__.pysrc/pydantic_ai_lightspeed/capabilities/redaction/core.pysrc/pydantic_ai_lightspeed/capabilities/redaction/config.pysrc/pydantic_ai_lightspeed/capabilities/redaction/capability.py
src/**/__init__.py
📄 CodeRabbit inference engine (AGENTS.md)
Package
__init__.pyfiles must contain brief package descriptions
Files:
src/pydantic_ai_lightspeed/capabilities/__init__.pysrc/pydantic_ai_lightspeed/capabilities/redaction/__init__.py
🔇 Additional comments (6)
src/pydantic_ai_lightspeed/capabilities/redaction/config.py (1)
14-107: LGTM!tests/unit/pydantic_ai_lightspeed/capabilities/redaction/test_config.py (1)
1-157: LGTM!src/pydantic_ai_lightspeed/capabilities/redaction/capability.py (2)
31-258: LGTM!
261-325: LGTM!tests/unit/pydantic_ai_lightspeed/capabilities/redaction/test_capability.py (1)
1-316: LGTM!src/pydantic_ai_lightspeed/capabilities/redaction/__init__.py (1)
1-21: LGTM!
|
/ok-to-test |
asimurka
left a comment
There was a problem hiding this comment.
I personally like this decomposition to capability, core and config modules.
|
|
||
|
|
||
| @dataclass | ||
| class PiiRedactionCapability(AbstractCapability[Any]): |
There was a problem hiding this comment.
Use None for type argument as this corresponds to current implementation of lightspeed agent (with no dependencies). Use None as type annotation where possible.
|
/ok-to-test |
| @@ -0,0 +1,107 @@ | |||
| """Configuration models for PII redaction rules.""" | |||
There was a problem hiding this comment.
IMHO this config (or rather config classes/models) should be added into src/models/config.py. At least to avoid circular dependencies, to allow us to generate documentation etc.
There was a problem hiding this comment.
Understood, it was my initial understanding that we were to keep everything pydantic_ai related separate to avoid any sorts of conflicts with the core source code.
There was a problem hiding this comment.
However, I am more than happy to make the changes necessary! Just to be thorough, it is expected that RedactionRule,RedactionConfig and RedactionResult are being moved to src/models/config.py, correct?
There was a problem hiding this comment.
Correct. This will be later part of LCORE config so it will be technically implementation agnostic.
d5a97ab to
271c1f2
Compare
| @@ -0,0 +1,12 @@ | |||
| """Configuration models for PII redaction rules. | |||
There was a problem hiding this comment.
Remove this module entirely
| return list(self._compiled_patterns) | ||
|
|
||
|
|
||
| class RedactionResult(BaseModel): |
There was a problem hiding this comment.
This is not a configuration model so return it back to core.py
…capability. Use Optional for nullable returns and substitute Any with UserContent, ModelRequestPart, and ModelResponsePart for type-safe message handling.
271c1f2 to
64a9445
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (1)
src/pydantic_ai_lightspeed/capabilities/redaction/capability.py (1)
5-5: 🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick winNormalize typing to repo conventions (
| None) and dependencyless capability generics.Line 5 and related return annotations still use
Optional[...], and Lines 260/279 keepAnyinAbstractCapability/RunContextfor a capability with no dependencies. Please align these annotations to keep type contracts consistent.As per coding guidelines, all functions must have complete type annotations for parameters and return types and use modern syntax (
str | int).Suggested refactor
-from typing import Any, Optional +from typing import Any @@ def _redact_string_content( text: str, compiled_patterns: CompiledPatterns -) -> Optional[str]: +) -> str | None: @@ def _redact_text_content( item: TextContent, compiled_patterns: CompiledPatterns -) -> Optional[TextContent]: +) -> TextContent | None: @@ def _redact_content_list( content: Sequence[UserContent], compiled_patterns: CompiledPatterns -) -> Optional[list[UserContent]]: +) -> list[UserContent] | None: @@ def _redact_message_parts( parts: Sequence[ModelRequestPart], compiled_patterns: CompiledPatterns -) -> Optional[list[ModelRequestPart]]: +) -> list[ModelRequestPart] | None: @@ def _redact_model_request( message: ModelRequest, compiled_patterns: CompiledPatterns -) -> Optional[ModelRequest]: +) -> ModelRequest | None: @@ -class PiiRedactionCapability(AbstractCapability[Any]): +class PiiRedactionCapability(AbstractCapability[None]): @@ - ctx: RunContext[Any], + ctx: RunContext[None], @@ - ctx: RunContext[Any], + ctx: RunContext[None],#!/bin/bash # Verify local typing consistency for capability generics and Optional usage. rg -nP 'AbstractCapability\[[^\]]+\]' src/pydantic_ai_lightspeed/capabilities -C2 rg -nP 'RunContext\[[^\]]+\]' src/pydantic_ai_lightspeed/capabilities -C2 rg -nP '\bOptional\[' src/pydantic_ai_lightspeed/capabilities/redaction/capability.py -C1Also applies to: 29-31, 47-50, 92-95, 145-148, 173-176, 259-261, 277-303
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/pydantic_ai_lightspeed/capabilities/redaction/capability.py` at line 5, Update all type annotations in the redaction capability module to use modern Python typing syntax and remove unnecessary Any generics. Replace all instances of Optional[...] with the pipe union syntax (X | None) throughout the file on lines 5, 29-31, 47-50, 92-95, 145-148, 173-176. Additionally, update the generic type parameters for AbstractCapability and RunContext on lines 259-261 and 277-303 to remove Any and use empty brackets or appropriate type parameters based on whether the capability has dependencies, ensuring all function signatures and class definitions maintain consistent type contracts without relying on Any for capabilities without dependencies.Source: Coding guidelines
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/models/config.py`:
- Around line 2222-2235: The RedactionRule model allows runtime mutation of the
mutable rules list and case_sensitive field, but the compiled regex patterns in
_compiled_patterns are only generated once and never invalidated when these
source fields change. To fix this, make the rules field immutable by changing
its type from list[RedactionRule] to tuple[RedactionRule, ...] or adding
frozen=True to prevent accidental mutations, and add a model validator that
recompiles _compiled_patterns whenever either rules or case_sensitive are
modified, ensuring the compiled patterns remain synchronized with the current
configuration state.
---
Duplicate comments:
In `@src/pydantic_ai_lightspeed/capabilities/redaction/capability.py`:
- Line 5: Update all type annotations in the redaction capability module to use
modern Python typing syntax and remove unnecessary Any generics. Replace all
instances of Optional[...] with the pipe union syntax (X | None) throughout the
file on lines 5, 29-31, 47-50, 92-95, 145-148, 173-176. Additionally, update the
generic type parameters for AbstractCapability and RunContext on lines 259-261
and 277-303 to remove Any and use empty brackets or appropriate type parameters
based on whether the capability has dependencies, ensuring all function
signatures and class definitions maintain consistent type contracts without
relying on Any for capabilities without dependencies.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 6de4a328-dee3-40af-b542-60de7aa7e82d
📒 Files selected for processing (10)
src/models/config.pysrc/pydantic_ai_lightspeed/capabilities/__init__.pysrc/pydantic_ai_lightspeed/capabilities/redaction/__init__.pysrc/pydantic_ai_lightspeed/capabilities/redaction/capability.pysrc/pydantic_ai_lightspeed/capabilities/redaction/core.pytests/unit/pydantic_ai_lightspeed/capabilities/__init__.pytests/unit/pydantic_ai_lightspeed/capabilities/redaction/__init__.pytests/unit/pydantic_ai_lightspeed/capabilities/redaction/test_capability.pytests/unit/pydantic_ai_lightspeed/capabilities/redaction/test_config.pytests/unit/pydantic_ai_lightspeed/capabilities/redaction/test_core.py
📜 Review details
⏰ Context from checks skipped due to timeout. (7)
- GitHub Check: E2E Tests for Lightspeed Evaluation job
- GitHub Check: E2E: library mode / ci / group 1
- GitHub Check: E2E: server mode / ci / group 2
- GitHub Check: E2E: library mode / ci / group 3
- GitHub Check: E2E: library mode / ci / group 2
- GitHub Check: E2E: server mode / ci / group 3
- GitHub Check: E2E: server mode / ci / group 1
🧰 Additional context used
📓 Path-based instructions (4)
tests/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
tests/**/*.py: Use pytest for all unit and integration tests; do not use unittest
Usepytest.mark.asynciomarker for async tests
Files:
tests/unit/pydantic_ai_lightspeed/capabilities/__init__.pytests/unit/pydantic_ai_lightspeed/capabilities/redaction/__init__.pytests/unit/pydantic_ai_lightspeed/capabilities/redaction/test_core.pytests/unit/pydantic_ai_lightspeed/capabilities/redaction/test_config.pytests/unit/pydantic_ai_lightspeed/capabilities/redaction/test_capability.py
src/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
src/**/*.py: Use absolute imports for internal modules:from authentication import get_auth_dependency
Llama Stack imports: Usefrom llama_stack_client import AsyncLlamaStackClient
Checkconstants.pyfor shared constants before defining new ones
All modules must start with descriptive docstrings explaining purpose
Uselogger = get_logger(__name__)fromlog.pyfor module logging
All functions must have complete type annotations for parameters and return types, use modern syntax (str | int), and include descriptive docstrings
Use snake_case with descriptive, action-oriented names for functions (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns; return new data structures instead of modifying function parameters
Useasync deffor I/O operations and external API calls
Use standard log levels with clear purposes:debug()for diagnostic info,info()for program execution,warning()for unexpected events,error()for serious problems
All classes must have descriptive docstrings explaining purpose and use PascalCase with standard suffixes:Configuration,Error/Exception,Resolver,Interface
Abstract classes must use ABC with@abstractmethoddecorators
Follow Google Python docstring conventions with required sections: Parameters, Returns, Raises, and Attributes for classes
Files:
src/pydantic_ai_lightspeed/capabilities/redaction/__init__.pysrc/pydantic_ai_lightspeed/capabilities/__init__.pysrc/pydantic_ai_lightspeed/capabilities/redaction/core.pysrc/models/config.pysrc/pydantic_ai_lightspeed/capabilities/redaction/capability.py
src/**/__init__.py
📄 CodeRabbit inference engine (AGENTS.md)
Package
__init__.pyfiles must contain brief package descriptions
Files:
src/pydantic_ai_lightspeed/capabilities/redaction/__init__.pysrc/pydantic_ai_lightspeed/capabilities/__init__.py
src/models/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
Pydantic models must use
@model_validatorand@field_validatorfor validation and complete type annotations for all attributes, avoidingAnytype
Files:
src/models/config.py
🧠 Learnings (2)
📚 Learning: 2026-01-12T10:58:40.230Z
Learnt from: blublinsky
Repo: lightspeed-core/lightspeed-stack PR: 972
File: src/models/config.py:459-513
Timestamp: 2026-01-12T10:58:40.230Z
Learning: In lightspeed-core/lightspeed-stack, for Python files under src/models, when a user claims a fix is done but the issue persists, verify the current code state before accepting the fix. Steps: review the diff, fetch the latest changes, run relevant tests, reproduce the issue, search the codebase for lingering references to the original problem, confirm the fix is applied and not undone by subsequent commits, and validate with local checks to ensure the issue is resolved.
Applied to files:
src/models/config.py
📚 Learning: 2026-02-25T07:46:33.545Z
Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 1211
File: src/models/responses.py:8-16
Timestamp: 2026-02-25T07:46:33.545Z
Learning: In the Python codebase, requests.py should use OpenAIResponseInputTool as Tool while responses.py uses OpenAIResponseTool as Tool. This difference is intentional due to differing schemas for input vs output tools in llama-stack-api. Apply this distinction consistently to other models under src/models (e.g., ensure request-related tools use the InputTool variant and response-related tools use the ResponseTool variant). If adding new tools, choose the corresponding InputTool or Tool class based on whether the tool represents input or output, and document the rationale in code comments.
Applied to files:
src/models/config.py
🔇 Additional comments (9)
src/pydantic_ai_lightspeed/capabilities/__init__.py (1)
1-2: LGTM!src/pydantic_ai_lightspeed/capabilities/redaction/__init__.py (2)
15-21: LGTM!
3-13: No action needed. The import paths are correct and consistent with the codebase conventions.modelsis a sibling package at thesrc/level, sofrom models.config importis the correct absolute form per the coding guidelines. The difference in style (short absolute form for external sibling packages vs. fully qualified form for same-package imports) is consistent throughout the codebase and does not indicate an error.> Likely an incorrect or invalid review comment.tests/unit/pydantic_ai_lightspeed/capabilities/__init__.py (1)
1-2: LGTM!tests/unit/pydantic_ai_lightspeed/capabilities/redaction/__init__.py (1)
1-2: LGTM!tests/unit/pydantic_ai_lightspeed/capabilities/redaction/test_config.py (1)
1-158: LGTM!src/pydantic_ai_lightspeed/capabilities/redaction/core.py (1)
1-56: LGTM!tests/unit/pydantic_ai_lightspeed/capabilities/redaction/test_core.py (1)
1-154: LGTM!tests/unit/pydantic_ai_lightspeed/capabilities/redaction/test_capability.py (1)
1-317: LGTM!
| rules: list[RedactionRule] = Field( | ||
| default_factory=list, | ||
| title="Redaction rules", | ||
| description="Ordered list of PII redaction rules", | ||
| ) | ||
| case_sensitive: bool = Field( | ||
| False, | ||
| title="Case sensitive", | ||
| description=("When False, patterns are compiled with re.IGNORECASE"), | ||
| ) | ||
|
|
||
| _compiled_patterns: list[tuple[Pattern[str], str]] = PrivateAttr( | ||
| default_factory=list, | ||
| ) |
There was a problem hiding this comment.
Prevent stale compiled regex state after runtime config mutation.
Line 2222 keeps rules mutable, but Lines 2237-2264 compile patterns only once into _compiled_patterns. If rules/case_sensitive are changed later, redaction keeps using stale compiled entries and can miss newly intended PII masking.
Suggested hardening
class RedactionConfig(ConfigurationBase):
@@
- rules: list[RedactionRule] = Field(
- default_factory=list,
+ rules: tuple[RedactionRule, ...] = Field(
+ default_factory=tuple,
@@
- _compiled_patterns: list[tuple[Pattern[str], str]] = PrivateAttr(
- default_factory=list,
+ _compiled_patterns: tuple[tuple[Pattern[str], str], ...] = PrivateAttr(
+ default_factory=tuple,
@@
- compiled: list[tuple[Pattern[str], str]] = []
+ compiled: list[tuple[Pattern[str], str]] = []
@@
- self._compiled_patterns = compiled
+ self._compiled_patterns = tuple(compiled)
return selfAlso applies to: 2237-2264, 2267-2272
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/models/config.py` around lines 2222 - 2235, The RedactionRule model
allows runtime mutation of the mutable rules list and case_sensitive field, but
the compiled regex patterns in _compiled_patterns are only generated once and
never invalidated when these source fields change. To fix this, make the rules
field immutable by changing its type from list[RedactionRule] to
tuple[RedactionRule, ...] or adding frozen=True to prevent accidental mutations,
and add a model validator that recompiles _compiled_patterns whenever either
rules or case_sensitive are modified, ensuring the compiled patterns remain
synchronized with the current configuration state.
Description
Add a regex-based PII redaction capability for pydantic-ai agents. This introduces:
core.py):redact_text()function and immutableRedactionResultmodel forsequential regex-based text substitution
config.py):RedactionRuleandRedactionConfigPydantic models withcompile-time pattern validation and per-rule/global case sensitivity controls
capability.py):PiiRedactionCapabilityintegrating with pydantic-ai'sAbstractCapabilityto redact user prompts before model requests and model response text beforereturning to the caller
Type of change
Tools used to create PR
Identify any AI code assistants used in this PR (for transparency and review context)
Related Tickets & Documents
Checklist before requesting a review
Testing
uv run make format— passes, no reformatsuv run make verify— all linters pass (black, pylint 10/10, pyright 0 errors, ruff, pydocstyle,mypy, lint-openapi)
uv run pytest tests/unit/pydantic_ai_lightspeed/capabilities/ -v— 51/51 tests passcapability.py)Summary by CodeRabbit
New Features
Tests