Bug Fix: Cast UINT32 to INT32 to ensure compatibility with other engines #3529

Open
JeroenSchmidt wants to merge 1 commit into apache:main from bookingcom:fix-integer-type-write-compatibility-uint32-to-int32

Conversation

@JeroenSchmidt JeroenSchmidt commented Jun 18, 2026

Rationale for this change

  • PyIceberg was preserving original Arrow types in Parquet files, causing Spark to fail with Unsupported logical type: UINT_32
  • Extends the integer casting logic in ArrowProjectionVisitor._cast_if_needed to handle unsigned-to-signed conversions at the same bit width, specifically uint32 -> int32.

Context:

This is a follow-up to #2799 (which fixed #2791), where uint8/uint16 casting was addressed. That fix only covered widening conversions (source_width < target_width), so it missed the uint32 case: uint32 and int32 are both 32 bits wide. Without this cast, Parquet files are written with the UINT_32 logical type annotation while the Iceberg schema declares a signed 32-bit int, causing Spark to fail on read.

Changes / Are these changes tested?

  • pyiceberg/io/pyarrow.py: Extended the cast condition to also trigger when the source is an unsigned integer with the same (or smaller) bit width as the signed target
  • tests/io/test_pyarrow.py: Added (pa.uint32(), IntegerType(), pa.int32()) test case

Notes

  • The cast uses PyArrow's default safe=True, so values exceeding INT32_MAX (2^31-1) will raise pyarrow.ArrowInvalid rather than silently overflow
  • Existing behavior for all other integer type combinations is unchanged

@JeroenSchmidt JeroenSchmidt changed the title from "fix and tests" to "Bug Fix: Cast UINT32 to INT32 to ensure compatibility with other engines" Jun 18, 2026
Development

Successfully merging this pull request may close these issues.

attempting to write smallint/tinyint into int column results in incompatibility with other iceberg APIs

1 participant