Skip to content

fix(writer): validate equality delete field ids#2723

Open
u70b3 wants to merge 1 commit into
apache:mainfrom
u70b3:fix/equality-delete-id-validation
Open

fix(writer): validate equality delete field ids#2723
u70b3 wants to merge 1 commit into
apache:mainfrom
u70b3:fix/equality-delete-id-validation

Conversation

@u70b3

@u70b3 u70b3 commented Jun 27, 2026

Copy link
Copy Markdown
Contributor

Which issue does this PR close?

What changes are included in this PR?

  • Add writer config validation for equality delete field IDs.
  • Reject empty equality ID lists, duplicate equality field IDs, and field IDs that are missing from the table schema.
  • Keep the existing RecordBatchProjector behavior for unsupported field types and map/list reachability.
  • Fix a stale comment: equality delete fields may be optional, unlike identifier fields.

Are these changes tested?

  • cargo test -p iceberg test_equality_delete_rejects_empty_equality_ids --lib
  • cargo test -p iceberg test_equality_delete_rejects_duplicate_equality_ids --lib
  • cargo test -p iceberg test_equality_delete_rejects_missing_equality_id --lib
  • cargo test -p iceberg test_equality_delete_unreachable_column --lib
  • cargo test -p iceberg test_equality_delete_with_nullable_field --lib
  • cargo test -p iceberg equality_delete --lib
  • cargo fmt --check
  • cargo test -p iceberg --lib
  • cargo clippy -p iceberg --all-features --lib --tests -- -D warnings
  • make check
  • make unit-test
  • Docker-backed integration path:
    • docker compose -f dev/docker-compose.yaml up -d --build --wait
    • make nextest (1832 passed, 0 skipped)

@u70b3 u70b3 force-pushed the fix/equality-delete-id-validation branch 3 times, most recently from d0f2659 to fd920af Compare June 30, 2026 07:30

@viirya viirya left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Confirmed the gap: EqualityDeleteWriterConfig::new on main goes straight to schema_to_arrow_schema + RecordBatchProjector::new with no validation of equality_ids, so an empty / duplicated / non-existent id was silently accepted. Adding these checks is a real improvement.

The comment fix is accurate against the spec — equality-delete fields do follow the identifier-field type restrictions except that optional columns (and columns under optional structs) are allowed. I checked the spec wording and it matches what you wrote.

One thing worth flagging for the record: the duplicate-id and "id must exist in schema" checks go a bit beyond the Java reference — Java core guards against null/empty (GenericAppenderFactory, FileWriterBuilderImpl) but does not check duplicates or schema membership. I think adding them is the right call (a duplicate equality id is a clear user error, and an id absent from the schema would produce a meaningless delete file), just noting it's stricter than Java so it's a deliberate choice rather than an oversight.

Pulled the branch: all 11 equality_delete tests pass (including the 5 new ones), clippy and rustfmt clean. LGTM.

@u70b3 u70b3 force-pushed the fix/equality-delete-id-validation branch from fd920af to 4729bc5 Compare July 1, 2026 02:41
@u70b3

u70b3 commented Jul 1, 2026

Copy link
Copy Markdown
Contributor Author

@viirya Thanks for the thorough and rigorous review — really appreciate you cross-checking the spec wording and the Java reference implementation.

You're right that the duplicate-id and schema-membership checks are deliberate stricter guards. A duplicate equality id is almost certainly a user/configuration error, and an id missing from the schema would lead to a confusing failure later (or a meaningless delete file). Failing fast here with a clear error message feels like the right behavior for the Rust writer.

I'll also open a corresponding PR against the Java SDK to add the same duplicate / missing-id validation there, so the two implementations stay aligned on this stricter behavior. I'll link it back to this PR once it's up.

Thanks again for the LGTM!

@u70b3 u70b3 force-pushed the fix/equality-delete-id-validation branch from 6d93c61 to 59d3afb Compare July 1, 2026 03:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Equality delete writer should reject invalid equality field IDs

2 participants