Skip to content

feat(orchestration): add content filtering and prompt shield module#174

Open
lenin-ribeiro wants to merge 4 commits into
mainfrom
feat/orchestration-filtering
Open

feat(orchestration): add content filtering and prompt shield module#174
lenin-ribeiro wants to merge 4 commits into
mainfrom
feat/orchestration-filtering

Conversation

@lenin-ribeiro

Copy link
Copy Markdown

Description

Adds a new sap_cloud_sdk.orchestration module that activates Azure Content Safety filtering and prompt attack detection (prompt shield) automatically for all SAP AI Core model calls made through LiteLLM.

Filtering is enabled by default when set_aicore_config() is called — no code change is required by the developer. The default policy applies threshold 4 (block medium+ severity content) and prompt shield on input for all sap/* model calls.

How it works

set_aicore_config() now calls _activate_filtering() at the end, which patches litellm.GenAIHubOrchestrationConfig with a subclass (FilteringOrchestrationConfig) that:

  • Injects modules.filtering (Azure Content Safety config) into every v2 completion request body via transform_request
  • Detects filter rejections in responses and raises ContentFilteredError via transform_response
  • Unwraps filter rejections embedded in LiteLLM APIConnectionError exceptions via extract_filter_blocked()

LiteLLM still makes the HTTP call and Traceloop/OTel instrumentation is fully preserved.

Developer experience

Zero code change for the common case — existing agent code is unchanged:

from sap_cloud_sdk.aicore import set_aicore_config
set_aicore_config()
# ← filtering is now active at threshold 4/4/4/4 + prompt_shield=True

Thresholds configurable via env vars (set before set_aicore_config()):

ORCH_FILTER_SELF_HARM=0    # strict — block any detected self-harm content
ORCH_FILTER_ENABLED=false  # disable filtering entirely

Programmatic override at runtime:

from sap_cloud_sdk.orchestration import set_filtering
set_filtering(self_harm=0, violence=0)

Handling blocked requests:

from sap_cloud_sdk.orchestration import ContentFilteredError
from sap_cloud_sdk.orchestration._litellm_patch import extract_filter_blocked

try:
    result = await llm.ainvoke(messages)
except ContentFilteredError as e:
    return "Your request was blocked by content safety policy."
except Exception as e:
    if blocked := extract_filter_blocked(e):
        return "Your request was blocked by content safety policy."
    raise

Related Issue

N/A — new feature proposed and implemented by the App Foundation agent team.

Type of Change

  • Bug fix (non-breaking change that fixes an issue)
  • New feature (non-breaking change that adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation update
  • Code refactoring
  • Dependency update

Breaking Change Detail

set_aicore_config() now activates content filtering as a side effect. Agents upgrading to 0.28.0 will have filtering applied to their sap/* model calls.

What breaks: Any agent relying on unfiltered LLM output (e.g. testing with deliberately harmful prompts, or using a deployment without Azure Content Safety provisioned) will see different behaviour.

Migration path: Set ORCH_FILTER_ENABLED=false in the environment before calling set_aicore_config() to preserve previous unfiltered behaviour.

How to Test

Run the orchestration unit tests:

uv run pytest tests/orchestration/ -v
# Expected: 41 tests pass

Verify auto-activation:

import litellm
from sap_cloud_sdk.aicore import set_aicore_config
from sap_cloud_sdk.orchestration._litellm_patch import FilteringOrchestrationConfig

set_aicore_config()
assert litellm.GenAIHubOrchestrationConfig is FilteringOrchestrationConfig
print("Filtering activated ✓")

Verify wire format:

from sap_cloud_sdk.orchestration._models import FilteringModuleConfig

d = FilteringModuleConfig().to_dict()
assert "input" in d and "output" in d
assert d["input"]["filters"][0]["type"] == "azure_content_safety"
assert d["input"]["filters"][0]["config"]["prompt_shield"] is True
print("Wire format correct ✓")

Checklist

  • I have read the Contributing Guidelines
  • I have verified that my changes solve the issue
  • I have added/updated automated tests to cover my changes (41 unit tests)
  • All tests pass locally
  • I have verified that my code follows the Code Guidelines
  • I have updated documentation (README.md, aicore/user-guide.md, new orchestration/user-guide.md)
  • I have added type hints for all public APIs
  • My code does not contain sensitive information (credentials, tokens, etc.)
  • I have followed Conventional Commits for commit messages

Additional Notes

New files

File Purpose
src/sap_cloud_sdk/orchestration/__init__.py Public API: set_filtering(), ContentFilteredError, config classes
src/sap_cloud_sdk/orchestration/_models.py ContentFilterConfig, PromptShieldConfig, FilteringModuleConfig with from_env() and to_dict()
src/sap_cloud_sdk/orchestration/_litellm_patch.py FilteringOrchestrationConfig subclass, _install(), extract_filter_blocked()
src/sap_cloud_sdk/orchestration/exceptions.py ContentFilteredError (direction, details, request_id), OrchestrationError
src/sap_cloud_sdk/orchestration/user-guide.md Full developer guide with env vars, examples, migration notes
tests/orchestration/unit/test_models.py 19 serialisation + from_env() parsing tests
tests/orchestration/unit/test_patch.py 13 LiteLLM patch + response detection tests
tests/orchestration/unit/test_set_filtering.py 7 set_filtering() behaviour tests

Env vars reference

Variable Default Description
ORCH_FILTER_ENABLED true Set false to disable filtering entirely
ORCH_FILTER_DIRECTIONS input,output Which sides to filter
ORCH_FILTER_HATE 4 Azure severity 0/2/4/6
ORCH_FILTER_VIOLENCE 4 Azure severity 0/2/4/6
ORCH_FILTER_SEXUAL 4 Azure severity 0/2/4/6
ORCH_FILTER_SELF_HARM 4 Azure severity 0/2/4/6
ORCH_FILTER_PROMPT_SHIELD true Input-only jailbreak detection

Activates Azure Content Safety filtering and prompt attack detection
automatically for all SAP AI Core model calls. Filtering is enabled
by default when set_aicore_config() is called — no code change required
by the developer.

- New module sap_cloud_sdk.orchestration with:
  - FilteringModuleConfig: configures input/output filtering thresholds
    and prompt shield via ORCH_FILTER_* env vars (defaults: threshold 4,
    prompt_shield=True on input)
  - set_filtering(): programmatic override for thresholds at runtime
  - ContentFilteredError: raised when input or output is rejected by
    the content filter
  - extract_filter_blocked(): unwraps filter rejections embedded in
    LiteLLM APIConnectionError exceptions
- set_aicore_config() now calls _activate_filtering() at the end,
  applying FilteringModuleConfig.from_env() to LiteLLM's SAP provider
- Observability preserved: LiteLLM still makes the HTTP call;
  Traceloop/OTel instrumentation is unaffected
- 41 unit tests covering serialisation, env parsing, LiteLLM patch,
  response detection, and set_filtering() behaviour
- User guides updated in aicore/ and orchestration/; README breaking
  change notice added
- Version bump 0.27.1 → 0.28.0
@lenin-ribeiro lenin-ribeiro self-assigned this Jun 19, 2026
@lenin-ribeiro lenin-ribeiro requested a review from a team as a code owner June 19, 2026 14:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant