fix: handle upsert after schema evolution with added columns#3524

Open
GayathriSrividya wants to merge 1 commit into
apache:mainfrom
GayathriSrividya:fix/issue-3105-upsert-schema-mismatch

@GayathriSrividya GayathriSrividya commented Jun 18, 2026

Contributor

closes #3105

Summary

  • Fix upsert row matching when the source schema has newly added non-key columns after schema evolution.
  • Avoid casting the entire source table to the target schema during upsert row comparison.
  • Add a regression test for upsert after add_column evolution.

Root cause

get_rows_to_update cast the full source table to the schema of the matched target batch before joining. When the source contains a newly added column, target scan batches may not include that column, so the cast fails with a schema-name mismatch error.

Changes

  • In pyiceberg/table/upsert_util.py:
    • Cast only join columns to target join-column schema.
    • Keep join key order stable by using join_cols directly.
    • Treat missing non-key columns in target rows as None during comparison.
  • In tests/table/test_upsert.py:
    • Add test_upsert_after_schema_add_column regression test.

Validation

  • make lint
  • /path/.venv/bin/python -m build
  • /path/.venv/bin/python -m pytest tests/table/test_upsert.py -xvs

Development

Successfully merging this pull request may close these issues.

Upsert fails after update_schema().union_by_name() due to schema mismatch
