feat: extend native windows support#4209
Conversation
|
Upstream issue with NTILE apache/datafusion#22049 |
|
Default value is |
|
Decimal frame is not support apache/datafusion#22113 |
|
Crash on frame overflow #4307 |
|
AVG correctness issue with NULLs apache/datafusion#22138 |
andygrove
left a comment
There was a problem hiding this comment.
A few things I'd like to resolve before merge.
1. NTILE looks like it runs natively, but the tests and comments say it falls back. Both window_functions.sql section 6.1 and the CometWindowExecSuite NTILE test describe a guard that forces NTILE to fall back to Spark, with the suite comment saying the test "pins that behavior so it will fail once the guard is removed." But I don't see a guard. NTile is wired straight to scalarFunctionExprToProto("ntile", ...) with no withFallbackReason/return None, and both tests assert native execution (checkSparkAnswerAndOperator and a bare query both set assertCometNative = true). So NTILE is actually executing natively and the tests are validating its correctness on the sample data, which is the opposite of what the comments claim. Given #4255 and the upstream DataFusion #22049 NTILE bug, and given this PR makes window Compatible by default, that means NTILE now runs natively by default with a known correctness issue. NTILE divergence tends to surface when the partition size is not evenly divisible by the bucket count, so favorable test data can mask it. Could you clarify the intent? Either the guard is missing and the tests should assert fallback, or NTILE is meant to be native and the comments should be corrected with a note on why the known bug is acceptable.
2. expressions.md window section is now stale. The hand-edited window_funcs section still describes window support as disabled due to known correctness issues and lists rank, dense_rank, row_number, ntile, percent_rank, cume_dist, and nth_value as not yet wired and falling back. This PR wires all of them and drops the Incompatible support level, so that table and prose need a refresh.
3. The support-level flip is the heart of the PR, worth calling out. Removing the getSupportLevel override moves CometWindowExec from Incompatible to Compatible, so it no longer needs allowIncompatible. The gating looks careful, but findings 1 and 4 both describe paths that now run natively by default, so it is worth being explicit about the default-on correctness story. Minor: a few tests still set getOperatorAllowIncompatConfigKey(classOf[WindowExec]) -> "true" (for example "window: LAG with default offset"), which is now a no-op.
4. AVG over sliding frames delegates to DataFusion's avg. One of the linked notes flags an AVG/NULL correctness issue (DataFusion #22138). Ever-expanding frames use Comet's Avg, but the sliding branch uses by_name("avg"). Is the NULL issue reachable on that path? A sliding-window AVG test with NULL inputs would be reassuring, or a guard if it diverges.
5. PR description. For a change this size closing 8 issues, would you mind filling in the template? Lifting the routing rationale from the code comments and listing the newly wired functions would help reviewers and the changelog.
Disclosure: this review was prepared with AI assistance and reviewed by me before posting.
Which issue does this PR close?
Closes #2705
Closes #22
Closes #2721
Closes #4007
Closes #4293
Closes #4307
Closes #2841
Closes #4255
Rationale for this change
What changes are included in this PR?
How are these changes tested?