fix(writer): validate equality delete field ids#2723
Conversation
d0f2659 to
fd920af
Compare
viirya
left a comment
There was a problem hiding this comment.
Confirmed the gap: EqualityDeleteWriterConfig::new on main goes straight to schema_to_arrow_schema + RecordBatchProjector::new with no validation of equality_ids, so an empty / duplicated / non-existent id was silently accepted. Adding these checks is a real improvement.
The comment fix is accurate against the spec — equality-delete fields do follow the identifier-field type restrictions except that optional columns (and columns under optional structs) are allowed. I checked the spec wording and it matches what you wrote.
One thing worth flagging for the record: the duplicate-id and "id must exist in schema" checks go a bit beyond the Java reference — Java core guards against null/empty (GenericAppenderFactory, FileWriterBuilderImpl) but does not check duplicates or schema membership. I think adding them is the right call (a duplicate equality id is a clear user error, and an id absent from the schema would produce a meaningless delete file), just noting it's stricter than Java so it's a deliberate choice rather than an oversight.
Pulled the branch: all 11 equality_delete tests pass (including the 5 new ones), clippy and rustfmt clean. LGTM.
fd920af to
4729bc5
Compare
|
@viirya Thanks for the thorough and rigorous review — really appreciate you cross-checking the spec wording and the Java reference implementation. You're right that the duplicate-id and schema-membership checks are deliberate stricter guards. A duplicate equality id is almost certainly a user/configuration error, and an id missing from the schema would lead to a confusing failure later (or a meaningless delete file). Failing fast here with a clear error message feels like the right behavior for the Rust writer. I'll also open a corresponding PR against the Java SDK to add the same duplicate / missing-id validation there, so the two implementations stay aligned on this stricter behavior. I'll link it back to this PR once it's up. Thanks again for the LGTM! |
6d93c61 to
59d3afb
Compare
Which issue does this PR close?
What changes are included in this PR?
RecordBatchProjectorbehavior for unsupported field types and map/list reachability.Are these changes tested?
cargo test -p iceberg test_equality_delete_rejects_empty_equality_ids --libcargo test -p iceberg test_equality_delete_rejects_duplicate_equality_ids --libcargo test -p iceberg test_equality_delete_rejects_missing_equality_id --libcargo test -p iceberg test_equality_delete_unreachable_column --libcargo test -p iceberg test_equality_delete_with_nullable_field --libcargo test -p iceberg equality_delete --libcargo fmt --checkcargo test -p iceberg --libcargo clippy -p iceberg --all-features --lib --tests -- -D warningsmake checkmake unit-testdocker compose -f dev/docker-compose.yaml up -d --build --waitmake nextest(1832 passed, 0 skipped)