
[AURON #2375] Fix Iceberg changelog scan field-id projection #2376

Open
lyne7-sc wants to merge 3 commits into apache:master from lyne7-sc:fix/iceberg_changelogscan_fieldid



lyne7-sc commented Jul 2, 2026

Contributor

Which issue does this PR close?

Closes #2375

Rationale for this change

The regular native Iceberg scan path already passes Iceberg field IDs to the native reader, which makes top-level schema evolution such as column rename and drop-then-add safe for Parquet files.

The newer insert-only Iceberg changelog scan path also reads the underlying Parquet data files through the native reader, but it does not pass the same field-id mapping into the native scan plan yet. As a result, native Parquet schema matching falls back to column names on the changelog path.

This can return wrong results after Iceberg schema evolution. For example, after RENAME COLUMN, the renamed column may read as null from files written before the rename; after a DROP followed by an ADD of the same name, the newly added column may read data that belongs to the old dropped column.
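To illustrate the two failure modes, here is a minimal, self-contained sketch in plain Python. It is not Auron or Iceberg code: `FileColumn`, `TableField`, `read_by_name`, and `read_by_field_id` are hypothetical names, and a data file is modeled as a simple list of columns. It only assumes that each column in an old data file keeps the field ID and name it was written with.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileColumn:
    """A column as stored in an old data file (hypothetical model)."""
    field_id: int
    name: str
    values: tuple


@dataclass(frozen=True)
class TableField:
    """A column in the current table schema (hypothetical model)."""
    field_id: int
    name: str


def read_by_name(file_columns, table_fields, num_rows):
    # Name-based matching: what the changelog path falls back to
    # when no field-id mapping is passed to the reader.
    by_name = {c.name: c for c in file_columns}
    return {
        f.name: list(by_name[f.name].values) if f.name in by_name else [None] * num_rows
        for f in table_fields
    }


def read_by_field_id(file_columns, table_fields, num_rows):
    # Field-id matching: the column name in the file is ignored.
    by_id = {c.field_id: c for c in file_columns}
    return {
        f.name: list(by_id[f.field_id].values) if f.field_id in by_id else [None] * num_rows
        for f in table_fields
    }


# Case 1: RENAME COLUMN name -> full_name (field ID 2 is kept).
old_file = [FileColumn(1, "id", (1, 2)), FileColumn(2, "name", ("a", "b"))]
renamed = [TableField(1, "id"), TableField(2, "full_name")]
rename_by_name = read_by_name(old_file, renamed, 2)    # full_name is all None
rename_by_id = read_by_field_id(old_file, renamed, 2)  # full_name is ["a", "b"]

# Case 2: DROP score, then ADD score (the new column gets a new field ID 3).
old_file2 = [FileColumn(1, "id", (1, 2)), FileColumn(2, "score", (10, 20))]
readded = [TableField(1, "id"), TableField(3, "score")]
readd_by_name = read_by_name(old_file2, readded, 2)    # score wrongly reuses [10, 20]
readd_by_id = read_by_field_id(old_file2, readded, 2)  # score is all None
```

In both cases only the field-id lookup gives the result Iceberg semantics call for: the renamed column keeps its old data, and the re-added column is null for files written before it existed.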

What changes are included in this PR?

  • Extract field IDs from SparkChangelogScan's expected Iceberg schema.
  • Reuse the existing Iceberg rename/drop detection for changelog scans.
  • Pass changelog field IDs into IcebergScanPlan instead of Map.empty.
  • Keep nested rename/drop unsupported and make ORC changelog scans fall back after top-level rename/drop, consistent with the regular Iceberg scan path.
  • Add changelog scan integration tests for:
    • renamed columns resolved by field-id;
    • drop-then-add columns with the same name not reusing the dropped field-id.
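The decision described in the list above can be sketched roughly as follows. This is an illustrative Python model, not the PR's Scala code: `Decision`, `top_level_field_ids`, and `plan_changelog_scan` are hypothetical names, the rename/drop detection is reduced to two boolean inputs, and passing field IDs for every native scan (including unchanged ORC) is an assumption made for the sketch.

```python
from enum import Enum


class Decision(Enum):
    NATIVE = "native"
    FALLBACK = "fallback to Spark"


def top_level_field_ids(expected_schema):
    """Map each top-level column name to its field ID.

    expected_schema is a list of (field_id, name) pairs (hypothetical shape).
    """
    return {name: field_id for field_id, name in expected_schema}


def plan_changelog_scan(file_format, expected_schema, top_level_changed, nested_changed):
    # Nested rename/drop stays unsupported, so the scan is not converted.
    if nested_changed:
        return Decision.FALLBACK, {}
    # ORC changelog scans fall back after a top-level rename/drop.
    if file_format == "orc" and top_level_changed:
        return Decision.FALLBACK, {}
    # Otherwise run natively and pass field IDs instead of an empty map
    # (assumption: the map is passed for every native scan).
    return Decision.NATIVE, top_level_field_ids(expected_schema)


schema = [(1, "id"), (2, "full_name")]
parquet_plan = plan_changelog_scan("parquet", schema, True, False)
orc_plan = plan_changelog_scan("orc", schema, True, False)
nested_plan = plan_changelog_scan("parquet", schema, False, True)
```

Here `parquet_plan` stays native and carries the name-to-ID map, while `orc_plan` and `nested_plan` both fall back.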

Are there any user-facing changes?

Yes. Insert-only Iceberg changelog scans on renamed or drop-then-added Parquet columns now return correct results under the native scan. Unsupported cases continue to fall back to Spark. No API change.

How was this patch tested?

Added cases to AuronIcebergIntegrationSuite.


Development

Successfully merging this pull request may close these issues.

Iceberg changelog scan returns wrong data after column rename / drop-then-add
