Skip to content

feat(logs): redact PII from workflow logs via configurable rules#5136

Open
TheodoreSpeaks wants to merge 4 commits into
stagingfrom
feat/redact-pii-workflow-log
Open

feat(logs): redact PII from workflow logs via configurable rules#5136
TheodoreSpeaks wants to merge 4 commits into
stagingfrom
feat/redact-pii-workflow-log

Conversation

@TheodoreSpeaks

Copy link
Copy Markdown
Collaborator

Summary

  • Adds enterprise PII redaction for workflow execution logs, configured under Data Retention as org-scoped rules — each rule selects which PII entity types to mask and which workspaces it applies to (empty workspaces = all; empty entity types = redact nothing).
  • Reuses the guardrails Microsoft Presidio engine in mask mode, applied at the single log-persist choke point (completeWorkflowExecution), covering both the inline and externalized write paths.
  • Redaction is byte-budgeted/chunked (one batched subprocess per persist, not per block), structure-preserving (masks string leaves, rebuilds via structuredClone), and fail-safe (scrubs rather than leaks on error).
  • Adds a check-digit-validated VIN recognizer so it won't false-match arbitrary 17-char codes.
  • Adds per-workspace data-retention-hours overrides (resolved workspace ?? org ?? plan default) via a new nullable workspace.data_retention_settings column.
  • UI mirrors Access Control: a rules list + ChipModal (grouped checkbox grid for entity types, workspace multiselect), persisting on save with an unsaved-changes guard.

Type of Change

  • New feature

Testing

  • Unit tests for the redaction transform (recursion, batched substitution, structure preservation, fail-safe scrub, oversized-string skip) — passing.
  • Verified Presidio masking end-to-end locally (email/phone/credit-card/VIN), incl. VIN check-digit precision (valid VINs masked, random 17-char tokens left alone).
  • bun run lint, bun run check:api-validation:strict, and bun run check:migrations origin/staging all pass. Migration is a single additive nullable column (expand-phase, backward-compatible).

Checklist

  • Code follows project style guidelines
  • Self-reviewed my changes
  • Tests added/updated and passing
  • No new warnings introduced
  • I confirm that I have read and agree to the terms outlined in the Contributor License Agreement (CLA)

Enterprise PII redaction for workflow execution logs, configured under
Data Retention as org-scoped rules (each rule picks entity types + which
workspaces it applies to). Reuses the guardrails Presidio engine in mask
mode at the log-persist choke point, with a check-digit-validated VIN
recognizer. Also adds per-workspace data-retention-hours overrides.
@vercel

vercel Bot commented Jun 19, 2026

Copy link
Copy Markdown

The latest updates on your projects. Learn more about Vercel for GitHub.

1 Skipped Deployment
Project Deployment Actions Updated (UTC)
docs Skipped Skipped Jun 19, 2026 3:05am

Request Review

@cursor

cursor Bot commented Jun 19, 2026

Copy link
Copy Markdown

PR Summary

High Risk
Changes how execution logs are stored (PII masking) and how retention is resolved for cleanup; failures scrub content but Presidio/timeouts could affect log completion latency or content usability.

Overview
Adds enterprise PII redaction for workflow execution logs: org-level rules (Presidio entity types + workspace scope) are stored on piiRedaction, exposed via org data-retention GET/PUT, and applied at completeWorkflowExecution before inline or externalized storage. Masking is batched (maskPIIBatch / Python mask_batch), structure-preserving, and fail-safe ([REDACTION_FAILED] on errors or size limits).

Extends data retention with per-workspace hour overrides (workspace.data_retention_settings migration), a new /api/workspaces/[id]/data-retention API, shared resolveEffectiveRetentionHours, and cleanup dispatch that resolves workspace ?? org ?? plan instead of org-only.

Data Retention UI gains a PII rules section (list + modal with searchable entity grid and workspace targeting); retention hours and rules save separately. Guardrails gain a client-safe pii-entities catalog, a check-digit VIN recognizer, and refactored Python/TS subprocess helpers for batch masking.

Reviewed by Cursor Bugbot for commit 1635ce1. Bugbot is set up for automated code reviews on this repo. Configure here.

@TheodoreSpeaks

Copy link
Copy Markdown
Collaborator Author

@greptile review

Comment thread apps/sim/lib/logs/execution/logger.ts
Comment thread apps/sim/lib/logs/execution/pii-redaction.ts
…rkflow-log

# Conflicts:
#	packages/db/migrations/meta/0241_snapshot.json
#	packages/db/migrations/meta/_journal.json
#	scripts/check-api-validation-contracts.ts
): number | null {
const { workspaceSettings, orgSettings, key, fallback } = params
return workspaceSettings?.[key] ?? orgSettings?.[key] ?? fallback
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Null retention treated as unset

High Severity

resolveEffectiveRetentionHours chains values with ??, so an explicit null retention hour (Forever in the UI) is skipped and replaced by the org value or enterprise plan fallback. Enterprise cleanup then schedules deletion when it previously skipped, and workspace overrides to Forever never win over org numeric defaults.

Additional Locations (2)
Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit d9a4dd4. Configure here.

// Settings are only writable by enterprise orgs, but re-verify at read time
// (e.g. a plan downgrade) before doing the work.
if (!(await isWorkspaceOnEnterprisePlan(workspaceId))) return payload

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Enterprise check skips active redaction

Medium Severity

When org PII redaction rules are active, applyPiiRedaction still returns the raw payload if isWorkspaceOnEnterprisePlan is false. That helper returns false on lookup errors, so a transient billing/workspace failure bypasses masking and can persist unredacted PII despite enabled rules.

Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit d9a4dd4. Configure here.

@greptile-apps

greptile-apps Bot commented Jun 19, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR adds enterprise PII redaction for workflow execution logs, applying Microsoft Presidio in mask mode at the completeWorkflowExecution persist choke point, plus per-workspace retention-hours overrides and a settings UI mirroring the existing Access Control panel.

  • PII redaction engine: two-pass (collect → batch-mask) transform in pii-redaction.ts with a fail-safe scrub on Presidio failure; the Python mask_batch endpoint reuses one spaCy engine pair per subprocess invocation and adds a check-digit-validated VIN recognizer.
  • Retention resolution: resolveEffectivePiiRedaction unions entity types across applicable org-level rules per workspace; resolveEffectiveRetentionHours applies a workspace ?? org ?? plan cascade for hours overrides.
  • Schema & migration: additive nullable json column workspace.data_retention_settings in a single expand-phase migration; consistent with the existing organization.data_retention_settings type.

Confidence Score: 4/5

Safe to merge with one fix: applyPiiRedaction needs a try-catch so a transient DB failure during PII settings lookup cannot abort log persistence.

applyPiiRedaction performs a db.select() with no error handling. A transient DB failure there propagates through completeWorkflowExecution — which has no local catch around the call — and would prevent the entire execution log from being written. The rest of the change (transform logic, fail-safe scrub in redactPIIFromExecution, migration, UI) is well-structured.

apps/sim/lib/logs/execution/logger.ts — the applyPiiRedaction method needs defensive error handling around its DB query.

Important Files Changed

Filename Overview
apps/sim/lib/logs/execution/logger.ts Adds applyPiiRedaction on the hot completion path; missing try-catch around the DB query means a transient DB failure can cause log persistence to fail entirely.
apps/sim/lib/logs/execution/pii-redaction.ts New two-pass (collect + substitute) PII redaction transform with fail-safe scrub on Presidio error; oversized-string handling and marker reuse have been flagged in prior review threads.
apps/sim/lib/billing/retention.ts Introduces resolveEffectivePiiRedaction and resolveEffectiveRetentionHours; semantics of entityTypes: [] discussed in prior threads are correctly documented inline here (empty = contributes nothing).
apps/sim/lib/guardrails/validate_pii.py Adds check-digit-validated VIN recognizer and batch mask endpoint; ISO 3779 algorithm is correctly implemented with zero-weighted check-digit position.
apps/sim/lib/guardrails/validate_pii.ts Extracts runPythonScript helper and adds maskPIIBatch; executePythonPIIDetection is not refactored to use the new helper, leaving duplicated subprocess-spawn boilerplate.
apps/sim/app/api/workspaces/[id]/data-retention/route.ts New per-workspace retention-hours override endpoint; correctly gates writes behind enterprise check and correctly excludes PII rules (org-only concern) from workspace settings.
apps/sim/ee/data-retention/components/data-retention-settings.tsx Adds PII rules UI with unsaved-changes guard; hours and rules are saved independently, which correctly avoids unintentional field clearing.
packages/db/migrations/0243_data_retention_pii_redaction_and_workspace_override.sql Single additive nullable json column on workspace, consistent with the existing organization.data_retention_settings type; backward-compatible expand-phase migration.
apps/sim/lib/logs/execution/pii-redaction.test.ts Unit tests cover recursion, batched substitution, structure preservation, fail-safe scrub, and oversized-string skip; applyPiiRedaction (DB path) is not tested.

Reviews (3): Last reviewed commit: "fix(logs): widen PII entity visibleValue..." | Re-trigger Greptile

Comment on lines +53 to +66
rule?.appliesToAllWorkspaces === true ||
(Array.isArray(rule?.workspaceIds) && rule.workspaceIds.includes(params.workspaceId))
)

const union = new Set<string>()
for (const rule of applicable) {
if (!Array.isArray(rule.entityTypes)) continue
for (const t of rule.entityTypes) {
if (typeof t === 'string') union.add(t)
}
}
if (union.size === 0) return DEFAULT_PII_REDACTION
return { enabled: true, entityTypes: [...union] }
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 entityTypes: [] silently means "redact nothing", contradicting the schema JSDoc

resolveEffectivePiiRedaction contributes nothing to the union when a rule has entityTypes: [], so if all applicable rules have empty entity-type lists the union stays empty and the function returns enabled: false — no redaction. However, every site that documents entityTypes carries the JSDoc "Empty = redact all detected PII" (schema.ts PiiRedactionRule, primitives.ts piiRedactionRuleSchema, pii-redaction.ts PiiRedactionOptions). An enterprise operator who creates a catch-all rule with no entity types selected, expecting Presidio to mask every recognised PII type, will instead get zero redaction — a silent privacy leak. The Zod schema accepts entityTypes: [] without error, so the bug is reachable via the API. Either the resolution logic should treat an empty entity-type list as "mask all" (pass [] through, which Presidio interprets as all types), or the JSDoc needs to be corrected to "Empty = skip rule (redact nothing)".

Comment on lines +38 to +44
return value.length > 0 && Buffer.byteLength(value, 'utf8') <= PII_MAX_STRING_BYTES
}

/**
* Rebuild `value` replacing every eligible string leaf with `handle(leaf)`.
* Used for both collection (handle records and returns the input) and
* substitution (handle returns the masked value), so traversal order and

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Oversized strings are never replaced with REDACTION_FAILED_MARKER on the scrub path

When the total payload exceeds PII_MAX_TOTAL_BYTES, all eligible strings are replaced with REDACTION_FAILED_MARKER. But strings larger than PII_MAX_STRING_BYTES (128 KB) are excluded from collected entirely, so they pass through the substitution walk unchanged — even in the scrub/fail-safe branch. A block that emits a 200 KB string containing an email address or SSN will never be masked, regardless of whether the overall ceiling is hit or Presidio throws. The comment acknowledges this ("almost always base64 blobs") but does not log a warning when an oversized string is skipped, making it invisible to operators.

Comment on lines 592 to 626
}
}

/**
* Mask PII from log content before persistence when the execution's workspace
* (via workspace override or org default) has enterprise PII redaction enabled.
* Resolved at persist time so both the inline and externalized write paths are
* covered. Returns the payload unchanged when disabled or non-enterprise.
*/
private async applyPiiRedaction(
workspaceId: string | null,
payload: RedactablePayload
): Promise<RedactablePayload> {
if (!workspaceId) return payload

const [row] = await db
.select({ orgSettings: organization.dataRetentionSettings })
.from(workspace)
.leftJoin(organization, eq(organization.id, workspace.organizationId))
.where(eq(workspace.id, workspaceId))
.limit(1)
if (!row) return payload

const config = resolveEffectivePiiRedaction({ orgSettings: row.orgSettings, workspaceId })
if (!config.enabled) return payload

// Settings are only writable by enterprise orgs, but re-verify at read time
// (e.g. a plan downgrade) before doing the work.
if (!(await isWorkspaceOnEnterprisePlan(workspaceId))) return payload

return redactPIIFromExecution(payload, { entityTypes: config.entityTypes })
}

async completeWorkflowExecution(params: {
executionId: string

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Unconditional DB join on every workflow completion

applyPiiRedaction always fires a workspace → LEFT JOIN organization query, plus a second isWorkspaceOnEnterprisePlan call when redaction is configured, regardless of whether the org has any PII rules or is even on an enterprise plan. For deployments where the majority of executions are personal/non-enterprise workspaces this adds two round-trips to the hot path of every workflow. A lightweight guard (e.g. caching the org's enterprise status or checking a feature-flag ahead of the query) would let the common case skip both calls entirely.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

@greptile-apps

greptile-apps Bot commented Jun 19, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR adds enterprise-grade PII redaction for workflow execution logs, a new per-workspace data-retention-hours override, and a check-digit-validated VIN recognizer. The core redaction pipeline is well-engineered: a deterministic two-pass collect/substitute approach batches all strings from a single execution into one chunked Presidio call, and a fail-safe ensures PII is never persisted when masking fails.

  • PII redaction: hooked at completeWorkflowExecution after filterForDisplay and redactApiKeys; org-scoped rules are resolved at persist time covering both inline and externalized write paths; oversized payloads (>16 MB) are scrubbed rather than leaked.
  • Workspace retention overrides: new nullable workspace.data_retention_settings column with a clean workspace ?? org ?? plan-default resolution chain; a new /api/workspaces/[id]/data-retention endpoint mirrors the org-level API with proper admin + enterprise gating.
  • Three copies of the comment \"Empty = redact all detected PII\" on entityTypes (in schema.ts, primitives.ts, and pii-redaction.ts) directly contradict retention.ts, which treats an empty entity-type list as "contribute nothing, rule is inactive." The PR description confirms the correct semantic is "empty = redact nothing", so those three comments need updating to avoid operators mistakenly believing empty rules provide full coverage.

Confidence Score: 3/5

The redaction pipeline and workspace-override logic are structurally sound, but three copies of a wrong comment on a security-sensitive field create a genuine risk of operators deploying empty-entity-type rules under the false impression that they cover all PII.

The core machinery — two-pass collect/substitute, byte-chunked Presidio calls, fail-safe scrubbing, and the workspace ?? org ?? plan-default retention resolution — is implemented correctly. Three conflicting entityTypes comments in schema.ts, primitives.ts, and pii-redaction.ts say the opposite of what retention.ts implements. An enterprise admin who follows those comments and creates a rule with no entity types selected will have no PII masked at all — a silent failure in a privacy-protection feature.

packages/db/schema.ts, apps/sim/lib/api/contracts/primitives.ts, and apps/sim/lib/logs/execution/pii-redaction.ts all carry the wrong entityTypes comment that contradicts retention.ts.

Important Files Changed

Filename Overview
apps/sim/lib/logs/execution/pii-redaction.ts New PII redaction transform: deterministic two-pass collect/substitute approach is sound; fail-safe scrubbing on error is correct; misleading entityTypes comment and reuse of REDACTION_FAILED_MARKER for the byte-ceiling case need fixing.
apps/sim/lib/logs/execution/logger.ts PII redaction correctly hooked at the single persist choke point after filterForDisplay + redactApiKeys; adds 2 extra DB queries per execution on the hot path that could be consolidated into the existing log fetch.
apps/sim/lib/billing/retention.ts New file with clean resolution logic for both retention hours (workspace ?? org ?? fallback) and PII redaction (union of applicable rule entity types). Correctly returns disabled when union is empty, though this contradicts comments elsewhere.
packages/db/schema.ts Adds DataRetentionSettings interface and workspace.dataRetentionSettings column; PiiRedactionRule.entityTypes comment says Empty = redact all detected PII which contradicts the actual retention.ts behavior (empty = redact nothing).
apps/sim/lib/api/contracts/primitives.ts Adds well-validated piiRedactionRuleSchema and piiRedactionSettingsSchema; same misleading entityTypes comment as schema.ts.
apps/sim/lib/guardrails/validate_pii.ts New maskPIIBatch function correctly chunks texts by byte budget (256KB), rejects on subprocess failure so callers can apply fail-safe; existing validatePII single-text path unchanged.
apps/sim/lib/guardrails/validate_pii.py Adds mask_batch function reusing a single AnalyzerEngine + AnonymizerEngine across all texts in a chunk; VinRecognizer with ISO 3779 check-digit validation is well-implemented.
apps/sim/lib/billing/cleanup-dispatcher.ts Adds workspace-level settings to the cleanup scope query; fallback is null for all enterprise jobs so behavior is identical to the previous code for unconfigured orgs.
apps/sim/app/api/workspaces/[id]/data-retention/route.ts New workspace-level data retention GET/PUT endpoints; auth gating (session, admin permission, enterprise plan) is correct.
apps/sim/lib/logs/execution/pii-redaction.test.ts Good coverage: structure preservation, immutability, fail-safe scrub on error, oversized-string skip, and no-op when no strings collected.

Sequence Diagram

%%{init: {'theme': 'neutral'}}%%
sequenceDiagram
    participant E as Executor
    participant L as ExecutionLogger
    participant R as retention.ts
    participant P as pii-redaction.ts
    participant V as validate_pii.ts
    participant Py as validate_pii.py
    participant DB as Postgres

    E->>L: completeWorkflowExecution
    L->>L: filterForDisplay + redactApiKeys
    L->>DB: SELECT org.dataRetentionSettings
    DB-->>L: orgSettings
    L->>R: resolveEffectivePiiRedaction
    R-->>L: enabled + entityTypes
    alt PII enabled
        L->>DB: isWorkspaceOnEnterprisePlan
        DB-->>L: true
        L->>P: redactPIIFromExecution
        P->>P: collect string leaves
        loop Per 256KB chunk
            P->>V: maskPIIBatch
            V->>Py: spawn subprocess
            Py-->>V: masked strings
            V-->>P: masked strings
        end
        P-->>L: redacted payload
    end
    L->>DB: UPDATE workflowExecutionLogs
    L-->>E: WorkflowExecutionLog
Loading
%%{init: {'theme': 'base', 'themeVariables': {"darkMode": true, "background": "#0d1117", "primaryColor": "#21262d", "primaryTextColor": "#e6edf3", "primaryBorderColor": "#8b949e", "lineColor": "#8b949e", "textColor": "#e6edf3", "edgeLabelBackground": "#161b22", "actorBkg": "#21262d", "actorBorder": "#8b949e", "actorTextColor": "#e6edf3", "actorLineColor": "#8b949e", "signalColor": "#8b949e", "signalTextColor": "#e6edf3", "noteBkgColor": "#373320", "noteBorderColor": "#d4a72c", "noteTextColor": "#f0e6c0", "labelBoxBkgColor": "#21262d", "labelBoxBorderColor": "#8b949e", "labelTextColor": "#e6edf3", "loopTextColor": "#e6edf3", "activationBkgColor": "#30363d", "activationBorderColor": "#8b949e"}}}%%
sequenceDiagram
    participant E as Executor
    participant L as ExecutionLogger
    participant R as retention.ts
    participant P as pii-redaction.ts
    participant V as validate_pii.ts
    participant Py as validate_pii.py
    participant DB as Postgres

    E->>L: completeWorkflowExecution
    L->>L: filterForDisplay + redactApiKeys
    L->>DB: SELECT org.dataRetentionSettings
    DB-->>L: orgSettings
    L->>R: resolveEffectivePiiRedaction
    R-->>L: enabled + entityTypes
    alt PII enabled
        L->>DB: isWorkspaceOnEnterprisePlan
        DB-->>L: true
        L->>P: redactPIIFromExecution
        P->>P: collect string leaves
        loop Per 256KB chunk
            P->>V: maskPIIBatch
            V->>Py: spawn subprocess
            Py-->>V: masked strings
            V-->>P: masked strings
        end
        P-->>L: redacted payload
    end
    L->>DB: UPDATE workflowExecutionLogs
    L-->>E: WorkflowExecutionLog
Loading

Reviews (2): Last reviewed commit: "Merge remote-tracking branch 'origin/sta..." | Re-trigger Greptile

Comment on lines +22 to +26
export interface PiiRedactionOptions {
/** Presidio entity types to mask. Empty = redact all detected PII. */
entityTypes: string[]
language?: string
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Contradictory entityTypes: [] semantics across the codebase

This comment says "Empty = redact all detected PII", but resolveEffectivePiiRedaction in retention.ts explicitly documents and implements the opposite: an empty union returns DEFAULT_PII_REDACTION (disabled), so empty entityTypes means redact nothing. The same misleading comment also appears on PiiRedactionRule.entityTypes in schema.ts and primitives.ts.

In practice, if an admin creates a rule with no entity types selected (believing the schema comment), resolveEffectivePiiRedaction returns enabled: false and applyPiiRedaction returns the payload unchanged — no PII is masked. The inconsistency is security-relevant: an operator who reads the schema or this interface and acts on it would mistakenly believe their logs are protected.

Comment on lines +7 to +8
/** Replaces text we could not safely mask, so PII is never persisted on failure. */
export const REDACTION_FAILED_MARKER = '[REDACTION_FAILED]'

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 The REDACTION_FAILED_MARKER label is reused for a deliberate performance-safety scrub (payload exceeds the 16 MB ceiling), which is not a failure of the Presidio service. An operator who sees [REDACTION_FAILED] in a log will reasonably assume the masking subprocess crashed, triggering unnecessary incident investigation. A distinct constant makes the two cases distinguishable without any logic change.

Suggested change
/** Replaces text we could not safely mask, so PII is never persisted on failure. */
export const REDACTION_FAILED_MARKER = '[REDACTION_FAILED]'
/** Replaces text we could not safely mask, so PII is never persisted on failure. */
export const REDACTION_FAILED_MARKER = '[REDACTION_FAILED]'
/**
* Replaces text when the payload exceeds {@link PII_MAX_TOTAL_BYTES} and NER
* was deliberately skipped. Distinct from {@link REDACTION_FAILED_MARKER} so
* operators can tell a size-based scrub from a Presidio service failure.
*/
export const REDACTION_OVERSIZED_MARKER = '[REDACTED_OVERSIZED]'

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Comment on lines +601 to +623
private async applyPiiRedaction(
workspaceId: string | null,
payload: RedactablePayload
): Promise<RedactablePayload> {
if (!workspaceId) return payload

const [row] = await db
.select({ orgSettings: organization.dataRetentionSettings })
.from(workspace)
.leftJoin(organization, eq(organization.id, workspace.organizationId))
.where(eq(workspace.id, workspaceId))
.limit(1)
if (!row) return payload

const config = resolveEffectivePiiRedaction({ orgSettings: row.orgSettings, workspaceId })
if (!config.enabled) return payload

// Settings are only writable by enterprise orgs, but re-verify at read time
// (e.g. a plan downgrade) before doing the work.
if (!(await isWorkspaceOnEnterprisePlan(workspaceId))) return payload

return redactPIIFromExecution(payload, { entityTypes: config.entityTypes })
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Two extra DB round-trips per execution on the hot completion path

applyPiiRedaction always issues a JOIN query for org settings, then — if rules are active — calls isWorkspaceOnEnterprisePlan (another query). For the vast majority of executions (no PII rules configured, or non-enterprise workspace), this is wasted work on the critical log-persist path. The existing existingLog query at the top of completeWorkflowExecution already loads the workspace row; adding organization.dataRetentionSettings to that initial SELECT would remove the need for this second query entirely.

Comment thread packages/db/schema.ts
Comment on lines +1075 to +1076
/** Presidio entity types to mask. Empty = redact all detected PII. */
entityTypes: string[]

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 This comment contradicts how resolveEffectivePiiRedaction in retention.ts actually behaves. A rule with an empty entityTypes array contributes nothing to the applicable union, which makes the effective policy enabled: false — meaning nothing is redacted. The intended semantics (from retention.ts's own doc and the PR description) are "empty = redact nothing (rule contributes no types)".

Suggested change
/** Presidio entity types to mask. Empty = redact all detected PII. */
entityTypes: string[]
/** Presidio entity types to mask. Empty array = rule contributes no types (redact nothing). */
entityTypes: string[]

Comment on lines +92 to +93
/** Presidio entity types to mask. Empty = redact all detected PII. */
entityTypes: z.array(z.string().min(1, 'Entity type cannot be empty')).max(100),

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Same misleading comment as on PiiRedactionRule in schema.ts. resolveEffectivePiiRedaction treats an empty entityTypes array on a rule as contributing nothing to the union, resulting in enabled: false (no redaction). The comment should reflect that.

Suggested change
/** Presidio entity types to mask. Empty = redact all detected PII. */
entityTypes: z.array(z.string().min(1, 'Entity type cannot be empty')).max(100),
/** Presidio entity types to mask. Empty array = rule contributes no types (redact nothing). */
entityTypes: z.array(z.string().min(1, 'Entity type cannot be empty')).max(100),

@TheodoreSpeaks

Copy link
Copy Markdown
Collaborator Author

@greptile review

Comment on lines +601 to +623
private async applyPiiRedaction(
workspaceId: string | null,
payload: RedactablePayload
): Promise<RedactablePayload> {
if (!workspaceId) return payload

const [row] = await db
.select({ orgSettings: organization.dataRetentionSettings })
.from(workspace)
.leftJoin(organization, eq(organization.id, workspace.organizationId))
.where(eq(workspace.id, workspaceId))
.limit(1)
if (!row) return payload

const config = resolveEffectivePiiRedaction({ orgSettings: row.orgSettings, workspaceId })
if (!config.enabled) return payload

// Settings are only writable by enterprise orgs, but re-verify at read time
// (e.g. a plan downgrade) before doing the work.
if (!(await isWorkspaceOnEnterprisePlan(workspaceId))) return payload

return redactPIIFromExecution(payload, { entityTypes: config.entityTypes })
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 applyPiiRedaction DB failure propagates to log persistence

The db.select() inside applyPiiRedaction has no error handling. A transient DB error (connection timeout, pool exhaustion) while looking up org PII settings will throw from this function, propagate through completeWorkflowExecution — which has no local catch around the applyPiiRedaction call — and cause the entire workflow execution log to fail to persist. The PR's own fail-safe principle ("scrubs rather than leaks on error") in redactPIIFromExecution is not applied here to the settings-lookup path. A try/catch around the entire method body returning payload on any error would prevent PII configuration lookup failures from destroying execution history.

…t lazy

- Extend PII redaction to span error/errorMessage/toolCalls and top-level
  error/completionFailure/trigger/executionState (Bugbot: PII in execution
  metadata). executionState is safe to redact — resume reads from the separate
  pausedExecutions table, not the log copy.
- Lazy-import validate_pii in pii-redaction so the Python/child_process
  guardrails module stays out of the static middleware/RSC graph.
- Type the org retention mutation to the contract body (optional, non-null).

@cursor cursor Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

There are 3 total unresolved issues (including 2 from previous reviews).

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit 1635ce1. Configure here.

})
}

if (collected.length === 0) return payload

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oversized-only logs skip redaction

High Severity

When PII redaction is active, redactPIIFromExecution returns the original payload unchanged if every string leaf is skipped (e.g. over 128KB). Those fields are never masked or scrubbed, so sensitive text in large log values can still be persisted despite an enabled policy.

Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit 1635ce1. Configure here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant