fix: correct NOT STARTS WITH projection for truncated partitions #3528

Open
GayathriSrividya wants to merge 1 commit into
apache:main from
GayathriSrividya:fix/issue-3493-truncate-not-starts-with

@GayathriSrividya

Contributor

closes #3493

Summary

Fixes incorrect projection of NOT STARTS WITH predicates for truncated string/binary partition fields. The current implementation unsafely truncates the filter literal without checking its length relative to the truncate width.

Root Cause

The TruncateTransform.project method calls _truncate_array, which truncates the literal unconditionally for both STARTS WITH and NOT STARTS WITH predicates:

elif isinstance(pred, BoundNotStartsWith):
    return NotStartsWith(Reference(name), _transform_literal(transform, boundary))

For NOT STARTS WITH "hello" with truncate[2], this produces:

  • Current (unsafe): NOT STARTS WITH "he"
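  • Problem: The partition value "he" holds every row whose value starts with "he" ("hello", "heat", "hear", etc.). Rows such as "heat" satisfy NOT STARTS WITH "hello", but the projected predicate prunes the whole "he" partition, so those matching rows are silently dropped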
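
The failure can be reproduced without pyiceberg. The following is a minimal sketch in plain Python, assuming a hypothetical `truncate` helper that mimics truncate[2] on strings and using `str.startswith` in place of the real predicate classes. It shows which rows would be lost if partitions were pruned with the truncated literal:

```python
def truncate(value: str, width: int) -> str:
    # Hypothetical stand-in for the truncate[width] transform on strings.
    return value[:width]


rows = ["hello", "heat", "world"]
width = 2
literal = "hello"

# Rows that satisfy the original predicate: NOT STARTS WITH "hello".
matching_rows = [v for v in rows if not v.startswith(literal)]

# Unsafe projection: NOT STARTS WITH truncate("hello") == "he",
# evaluated against the partition values instead of the row values.
unsafe_literal = truncate(literal, width)
kept_partitions = {
    truncate(v, width)
    for v in rows
    if not truncate(v, width).startswith(unsafe_literal)
}

# Matching rows whose partition was pruned by the unsafe projection.
lost_rows = [v for v in matching_rows if truncate(v, width) not in kept_partitions]
print(lost_rows)
```

Here "heat" matches the original predicate but lives in the pruned "he" partition, which is exactly the case the fix has to avoid.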

Solution

Add special handling for BoundNotStartsWith in the project method following the Java/Go reference behavior:

  • prefix_length < truncate_width: Keep original NOT STARTS WITH literal (safe)
  • prefix_length == truncate_width: Project to != instead (safe, because the truncated value equals the literal exactly when the row starts with it)
  • prefix_length > truncate_width: Return None (no inclusive projection possible)
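
The three rules above can be sketched as a standalone function. This is an illustration only, not the code in pyiceberg/transforms.py: the name `project_not_starts_with`, the tuple return value, and the `partition_may_match` helper are assumptions made for the example, and lengths are counted in Python characters.

```python
from typing import Optional, Tuple

Projection = Optional[Tuple[str, str]]


def project_not_starts_with(literal: str, width: int) -> Projection:
    """Inclusive projection of NOT STARTS WITH onto a truncate[width] partition."""
    if len(literal) < width:
        # The whole prefix survives truncation, so the predicate is unchanged.
        return ("not_starts_with", literal)
    if len(literal) == width:
        # The partition value equals the literal exactly when the row starts with it.
        return ("not_eq", literal)
    # The literal is longer than the partition value: nothing can be pruned.
    return None


def partition_may_match(projection: Projection, partition_value: str) -> bool:
    """Return True if the partition must be scanned under the projection."""
    if projection is None:
        return True
    op, literal = projection
    if op == "not_starts_with":
        return not partition_value.startswith(literal)
    return partition_value != literal


print(project_not_starts_with("h", 2))
print(project_not_starts_with("he", 2))
print(project_not_starts_with("hello", 2))
```

A projection is inclusive when every row that matches the original predicate sits in a partition that the projection keeps, which is the property `partition_may_match` lets you check against sample data.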

pyiceberg/transforms.py

  • Add explicit NOT STARTS WITH handling before calling _truncate_array
  • Check literal length vs truncate width and apply correct projection rules

tests/test_transforms.py

  • Update test_projection_truncate_string_not_starts_with to expect None (prefix_length > width is unsafe)
  • Add test_projection_truncate_string_not_starts_with_shorter_literal (prefix_length == width → !=)
  • Add test_projection_truncate_string_not_starts_with_original_literal (prefix_length < width → original)

Validation

  • make lint ✓ (all pre-commit hooks pass)
  • pytest tests/test_transforms.py → 280 passed ✓
  • All 13 string truncate projection tests pass

Development

Successfully merging this pull request may close these issues.

Incorrect NOT STARTS WITH projection for truncated partitions
