Bug Fix: Cast UINT32 to INT32 to ensure compatibility with other engines #3529
JeroenSchmidt wants to merge 1 commit into
Rationale for this change
This PR updates ArrowProjectionVisitor._cast_if_needed to handle unsigned-to-signed conversions at the same bit width, specifically uint32 -> int32.

Context:
This is a follow-up to #2799 (which fixed #2791), where uint8/uint16 casting was addressed. That fix only covered widening conversions (source_width < target_width), so it missed the uint32 case: uint32 and int32 are both 32 bits wide. Without this cast, Parquet files are written with the column annotated as UINT_32 while the Iceberg metadata declares INT_32, which causes Spark to fail on read.

Changes / Are these changes tested?
- pyiceberg/io/pyarrow.py: Extended the cast condition to also trigger when the source is an unsigned integer with the same (or smaller) bit width as the signed target.
- tests/io/test_pyarrow.py: Added a (pa.uint32(), IntegerType(), pa.int32()) test case.

Notes
The cast uses safe=True, so values exceeding INT32_MAX (2^31 - 1) raise an error instead of being silently corrupted.