refactor: share Spark 4.1+ shim sources to remove duplication #4710

Open
andygrove wants to merge 4 commits into apache:main from andygrove:refactor-spark4-shims

@andygrove

Which issue does this PR close?

N/A

Rationale for this change

The per-minor shim folders spark-4.1 and spark-4.2 each carried byte-identical copies of four files, so every change had to be made twice and could drift.

What changes are included in this PR?

  • Add a new source root spark/src/main/spark-4.1+ shared by the Spark 4.1 and 4.2 profiles via a new shims.minorPlusVerSrc Maven property.
  • Move the four shared shims (ShimPrepareExecutedPlan, CometInternalRowShim, ShimCometUnionExec, CometExprShim) into it. The 4.1 and 4.2 copies of CometExprShim differed only in comment wording, which is now version-neutral.
  • The property defaults to an empty placeholder (spark-none), so Spark 4.0 and the 3.x profiles are unaffected.
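As an illustration of the property wiring described above, a minimal POM sketch follows. It is not copied from the PR diff: the property name, its values, and the profile ids come from the description, while the surrounding layout is an assumption.

```xml
<!-- Sketch only: the layout is assumed, not taken from the PR diff. -->
<project>
  <properties>
    <!-- Default: a placeholder name, so profiles that do not
         override it pick up no shared 4.1+ shim sources. -->
    <shims.minorPlusVerSrc>spark-none</shims.minorPlusVerSrc>
  </properties>

  <profiles>
    <profile>
      <id>spark-4.1</id>
      <properties>
        <!-- Point at the shared source root. -->
        <shims.minorPlusVerSrc>spark-4.1+</shims.minorPlusVerSrc>
      </properties>
    </profile>
    <profile>
      <id>spark-4.2</id>
      <properties>
        <shims.minorPlusVerSrc>spark-4.1+</shims.minorPlusVerSrc>
      </properties>
    </profile>
  </profiles>
</project>
```

With this shape, a build using -Pspark-3.5 or -Pspark-4.0 would keep the spark-none placeholder, and only the two 4.1+ profiles would resolve the shared directory.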

How are these changes tested?

test-compile passes for the spark-4.0, spark-4.1, spark-4.2, and spark-3.5 profiles.

The per-minor shim folders spark-4.1 and spark-4.2 carried byte-identical
copies of four files. Extract a new spark-4.1+ source root, shared by the
4.1 and 4.2 profiles via a new shims.minorPlusVerSrc property, leaving
Spark 4.0 and the 3.x line unaffected through an empty placeholder default.

The empty src/main/spark-none/.gitkeep had no license header and is not
in the apache-rat exclude list, so every module build failed the RAT
check with 'unapproved license: 1'.

The directory was only a placeholder for the default shims.minorPlusVerSrc
value. build-helper's add-source does not require the source root to
exist and scalac/javac skip a missing one, so the placeholder is
unnecessary. Verified install -DskipTests -Pspark-3.5 succeeds with the
directory removed.
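
For context, the add-source wiring this message refers to could look roughly like the sketch below. It assumes the build-helper-maven-plugin add-source goal bound to generate-sources. The execution id is hypothetical, and the source path pattern is inferred from the spark/src/main/spark-4.1+ root named in the description.

```xml
<!-- Sketch only: execution id and path pattern are assumptions. -->
<plugin>
  <groupId>org.codehaus.mojo</groupId>
  <artifactId>build-helper-maven-plugin</artifactId>
  <executions>
    <execution>
      <id>add-minor-plus-shim-source</id> <!-- hypothetical id -->
      <phase>generate-sources</phase>
      <goals>
        <goal>add-source</goal>
      </goals>
      <configuration>
        <sources>
          <!-- Resolves to src/main/spark-4.1+ or src/main/spark-none,
               depending on the active profile. -->
          <source>src/main/${shims.minorPlusVerSrc}</source>
        </sources>
      </configuration>
    </execution>
  </executions>
</plugin>
```

Under this assumption, src/main/spark-none is just a path that need not exist on disk, which is why the .gitkeep file can be dropped.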
…builds

The default Maven build targets Spark 4.1, but shims.minorPlusVerSrc
defaulted to the spark-none placeholder and was only overridden to
spark-4.1+ inside the explicit spark-4.1/spark-4.2 profiles. A
default-profile build (such as the TPC-H/TPC-DS jobs that run
mvnw -Prelease install without a spark profile) therefore dropped the
four shared 4.1+ shims and failed to compile.

Flip the default to spark-4.1+ to match the default Spark version, and
override it back to spark-none in the spark-3.4, spark-3.5, and spark-4.0
profiles, mirroring how majorVerSrc and minorVerSrc are handled.
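
The flipped default might be expressed as in the following sketch. The property name, values, and profile ids are taken from the message above. The layout is assumed, and the actual POM presumably sets other properties in these profiles as well.

```xml
<!-- Sketch only: the layout is assumed, not taken from the PR diff. -->
<project>
  <properties>
    <!-- The default now matches the default Spark version (4.1). -->
    <shims.minorPlusVerSrc>spark-4.1+</shims.minorPlusVerSrc>
  </properties>

  <profiles>
    <!-- Older lines opt back out of the shared 4.1+ shims. -->
    <profile>
      <id>spark-3.4</id>
      <properties>
        <shims.minorPlusVerSrc>spark-none</shims.minorPlusVerSrc>
      </properties>
    </profile>
    <profile>
      <id>spark-3.5</id>
      <properties>
        <shims.minorPlusVerSrc>spark-none</shims.minorPlusVerSrc>
      </properties>
    </profile>
    <profile>
      <id>spark-4.0</id>
      <properties>
        <shims.minorPlusVerSrc>spark-none</shims.minorPlusVerSrc>
      </properties>
    </profile>
  </profiles>
</project>
```

A build with no Spark profile, such as mvnw -Prelease install, would then resolve the spark-4.1+ source root and compile the four shared shims.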
@andygrove force-pushed the refactor-spark4-shims branch from 79cd0a9 to 610e744 on June 23, 2026 15:58