feat: Native Parquet Iceberg Data File Writes In Comet by jordepic · Pull Request #4487 · apache/datafusion-comet

jordepic · 2026-05-28T01:50:42Z

Which issue does this PR close?

Closes #4322.

Rationale for this change

Comet, up until this point, has mainly been focused on accelerating reads from iceberg tables. However, significant resources are being spent across various companies to rewrite iceberg data. Large tables need to be compacted to maintain their sort/Z order, and general data pipelines may write significant amounts of iceberg data. Having to do a large transpose between columnar and row-wise data is inefficient, and we'd much prefer to go directly from arrow-based column batches to parquet on disk.

What changes are included in this PR?

This change is split into three parts.

Splitting the existing iceberg V2 spark write command into two: a "writer" spark operator and a "committer" spark operator.

This allows the iceberg data file write to be treated identically to a spark V1 parquet writing command, so similar code can handle the operator
Without it, both the data file writing and the committing would live outside the scope of spark's adaptive query execution (AQE) operator wrappers. We instead want the write operator to be within AQE so that, as the data feeding into it gets re-planned, we can determine whether the upstream data of the write is columnar

Determining which operators should be converted to native code

This follows a simple philosophy: only convert writes to native code if they produce an outcome "identical" to that of the Java path
Disclaimer: nothing truly produces an identical result because parquet row group flushing is different
Besides that though, we can generally convert "normal" writes (parquet, default settings, some flexibility to change other settings as outlined in iceberg-writes.md, no delete files since iceberg-rust doesn't support positional deletes/DVs)

Native iceberg write operator

The JVM is responsible for computing all iceberg writing settings and passing them to rust in a protobuf object
Rust uses iceberg-rust to write the file and returns avro-encoded dummy manifest bytes to the JVM
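
To make the three parts above concrete, here is a toy model in plain Python. It is only a sketch under stated assumptions, not the PR's code: `fallback_reasons`, `write_task`, `commit`, and `DataFile` are hypothetical names, and the two eligibility rules are simplified stand-ins for the real checks described in iceberg-writes.md. `write.format.default` is a standard Iceberg table property, but how the PR actually inspects it is assumed here.

```python
from dataclasses import dataclass


def fallback_reasons(table_props, has_delete_files):
    """Return reasons to stay on the JVM path; an empty list means 'go native'.

    Hypothetical, simplified rules for illustration only.
    """
    reasons = []
    fmt = table_props.get("write.format.default", "parquet")
    if fmt != "parquet":
        reasons.append(f"unsupported file format: {fmt}")
    if has_delete_files:
        reasons.append("write produces delete files")
    return reasons


@dataclass(frozen=True)
class DataFile:
    """Metadata that a writer task reports for one data file."""
    path: str
    record_count: int


def write_task(task_id, rows):
    """'Writer' operator: one task writes one data file and reports its metadata.

    A real writer would stream columnar batches to Parquet; this stand-in only
    records what would have been written.
    """
    return DataFile(path=f"data/part-{task_id:05d}.parquet", record_count=len(rows))


def commit(snapshot, new_files):
    """'Committer' operator: publish all new files in one step."""
    return snapshot + tuple(new_files)


# Usage: plan first, then run the writer tasks, then commit once.
partitions = [[1, 2, 3], [4, 5]]
snapshot = ()
if not fallback_reasons({"write.format.default": "parquet"}, has_delete_files=False):
    files = [write_task(i, rows) for i, rows in enumerate(partitions)]
    snapshot = commit(snapshot, files)
```

Keeping the writer separate from the committer is what lets the per-task write sit inside the re-planned part of the query, while the commit stays a single step that runs once after every task has reported its files.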

How are these changes tested?

This change is tested extensively.

Tests to ensure that iceberg writes are replaced by our "two-operator" structure
Tests to ensure that the comet JVM properly serializes relevant data to protocol buffer form to go to the native layer
Tests to ensure that native writes are only performed under very specific iceberg table properties
Tests to ensure that native writes actually function as expected
Tests to ensure that compaction/sorting/z-ordering can now be fully accelerated with native writing

comphead

Thanks @jordepic, this is an epic PR. @mbutrovich FYI. My understanding, though, is that writes should be addressed in iceberg-rs so that they are supported by the iceberg community and reusable by other users

jordepic · 2026-05-29T18:24:24Z

> Thanks @jordepic, this is an epic PR. @mbutrovich FYI. My understanding, though, is that writes should be addressed in iceberg-rs so that they are supported by the iceberg community and reusable by other users

Can you elaborate on that @comphead ? I use iceberg-rs to perform the writes here!

jordepic · 2026-06-09T20:52:18Z

@mbutrovich sorry for the additional tag here. I've actually been running this locally now and it's really been effective. Would you like me to try and split it a bit further, clean up some comments, etc?

andygrove · 2026-06-18T14:14:39Z

Thanks for the epic PR @jordepic. I've started looking through it and using AI to help me comprehend and review this, since I am not an Iceberg expert.

I noticed that this PR was created directly from the main branch of your fork - I'd recommend creating a separate branch.

I like that this functionality is disabled by default so users can opt in while this goes through more testing.

Could you share some performance numbers and explain how you are benchmarking this?

jordepic · 2026-06-18T14:24:33Z

Thanks, Andy! I'd always be happy to jump on a call to discuss the PR in more detail. I'm not really an expert in anything (I just have medium knowledge of Spark and slightly better knowledge of iceberg). Let me see if I can go about getting benchmarks! I find that with wider datasets the difference is more apparent because there is more of a penalty to doing an extra transpose when reading iceberg into rows and writing it back.

Also, I'm starting to break this PR into chunks to make it easier to review so that Matt can review it when he is back from PTO. Here is the first link:
#4658

jordepic · 2026-06-18T20:06:40Z

Methodology:

Write a 2 million row parquet file, which came to 1.5 GB, and use it for all tests
Read it in and write it back to iceberg
We can read the file with or without comet, and write it with or without comet
Use reading in comet but writing with iceberg-java as a control

| Dataset | A: baseline | B: comet read | C: comet read+write | Total speedup A→C |
| --- | --- | --- | --- | --- |
| 100 columns (2M rows) | 16.53s | 13.83s | 4.57s | 3.6× |
| 1 column (86.5M rows) | 7.55s | 7.74s | 3.16s | 2.4× |

Like every great benchmark, this was taken on my Mac with a bunch of other shit running on it (I could tell you what but then I'd have to kill you).

With 100 columns, the fully native pipeline (C) is 3.6x as fast as the baseline (A), and most of that gain comes from enabling native writes (B→C) rather than native reads (A→B). However, I postulated that much of that could have been due to having to pivot data between columnar and row-wise form. For that reason, I ran a second test doing the same thing on a 1 column parquet file. The majority of the improvement still comes from enabling native writes, even though the relative difference between all JVM and all native is smaller.
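
For anyone who wants to check the arithmetic, the split between read and write savings can be recomputed from the table. This is just a small sketch: the timings are copied from the table above, and the `breakdown` helper is a hypothetical name introduced for illustration.

```python
# Timings in seconds, copied from the benchmark table.
timings = {
    "100 columns": {"A": 16.53, "B": 13.83, "C": 4.57},
    "1 column": {"A": 7.55, "B": 7.74, "C": 3.16},
}


def breakdown(t):
    """Return (total speedup A/C, seconds saved by native read, seconds saved by native write)."""
    total_speedup = t["A"] / t["C"]
    saved_by_read = t["A"] - t["B"]   # negative when the native read run was slower
    saved_by_write = t["B"] - t["C"]
    return round(total_speedup, 1), round(saved_by_read, 2), round(saved_by_write, 2)


for name, t in timings.items():
    print(name, breakdown(t))
# 100 columns (3.6, 2.7, 9.26)
# 1 column (2.4, -0.19, 4.58)
```

In both runs the B→C step (native write) accounts for most of the saved time. In the 1 column run the native read was marginally slower than the baseline, so all of the gain there comes from the write side.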

In practice, I've had some phenomenal numbers. I'm working with a 5k column dataset that is virtually unwritable with Comet. Now, I can treat it like any other dataset! I find that datasets with fewer columns may exhibit equal performance to spark. I imagine there may be some weird overhead spinning up comet executors on K8s, but I haven't investigated this too much yet. I'll attach an image of a query plan too.

Here is a plan that goes through many different join operations and keeps the whole pipeline columnar, including the write at the end! This can be really impactful for ETL jobs and compactions :).

jordepic force-pushed the main branch 4 times, most recently from 7118d08 to 15c8f15, May 29, 2026 15:25

comphead reviewed May 29, 2026

mbutrovich mentioned this pull request May 29, 2026

ci: gate long-running jobs behind ubuntu-slim jobs #4494

Merged

Jordan Epstein added 2 commits May 30, 2026 13:28

feat: add split-operator plan for Iceberg V2 writes

b5aa839

feat: detect Iceberg V2 writes and emit fall-back reasons

92d58ea

jordepic force-pushed the main branch from 9072489 to 9af1cd9, May 30, 2026 18:55

feat: implement native Iceberg V2 writer via iceberg-rust

17e5a10

jordepic force-pushed the main branch from 9af1cd9 to 17e5a10, May 31, 2026 17:01

jordepic wants to merge 3 commits into apache:main from jordepic:main


3 participants
