[SPARK-57027][SQL] SortMergeJoinExec: skip statically-dead branches in codegen #56075
Draft
gengliangwang wants to merge 1 commit into
### What changes were proposed in this pull request?
Two statically-dead patterns in `SortMergeJoinExec` codegen:
1. `genComparison` emits `comp = 0; if (comp == 0) { comp = compare(k1); } ...`.
The first `if (comp == 0)` is always true (we just assigned 0). Emit
`comp = compare(k1);` directly and wrap only the subsequent keys. `genComparison`
is called five times per SMJ stage (twice in `genScanner`, three times in
`codegenFullOuter`). For single-key joins (the common case), each call collapses
to one line.
2. `genScanner` and `codegenFullOuter` emit
`if (k1IsNull || k2IsNull || ...) { handler }`. When all key `ExprValue`s
have `isNull == FalseLiteral`, the disjunction is statically `false` and
the whole block (including its `handleStreamedAnyNull` / "join with null
row" handler) is dead. Detect this and omit the block. This typically applies
to fact/dimension joins on numeric keys where Spark has already proved
non-nullability.
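
To make the two patterns concrete, here is a minimal, self-contained sketch of the emission logic in plain Java. It is not the Spark implementation: the class `SmjCodegenSketch`, the simplified method signatures, and the use of raw strings in place of Spark's `ExprCode` / `ExprValue` are all assumptions made for illustration. The literal string `"false"` stands in for `FalseLiteral`.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; names and signatures are not Spark's.
final class SmjCodegenSketch {
    // Stand-in for Spark's FalseLiteral: a null flag known to be false at codegen time.
    static final String FALSE_LITERAL = "false";

    // Pattern 1: emit the first key comparison unconditionally and guard only
    // the later keys with `if (comp == 0)`.
    static String genComparison(List<String> compareExprs) {
        if (compareExprs.isEmpty()) {
            // Assumption of this sketch: with no keys, all rows compare equal.
            return "comp = 0;\n";
        }
        StringBuilder sb = new StringBuilder();
        sb.append("comp = ").append(compareExprs.get(0)).append(";\n");
        for (int i = 1; i < compareExprs.size(); i++) {
            sb.append("if (comp == 0) { comp = ")
              .append(compareExprs.get(i))
              .append("; }\n");
        }
        return sb.toString();
    }

    // Pattern 2: if every key's null flag is the literal `false`, the
    // disjunction can never be true, so emit nothing (handler included).
    static String genAnyNullCheck(List<String> isNullExprs, String handler) {
        boolean allNonNull = true;
        for (String isNull : isNullExprs) {
            if (!FALSE_LITERAL.equals(isNull)) {
                allNonNull = false;
                break;
            }
        }
        if (allNonNull) {
            return "";
        }
        return "if (" + String.join(" || ", isNullExprs) + ") { " + handler + " }\n";
    }

    public static void main(String[] args) {
        // Single-key join: one line of generated code.
        System.out.print(genComparison(Arrays.asList("compare(k1)")));
        // Two keys: only the second comparison is guarded.
        System.out.print(genComparison(Arrays.asList("compare(k1)", "compare(k2)")));
        // All keys proved non-nullable: the null-check block is omitted entirely.
        System.out.print(genAnyNullCheck(Arrays.asList("false", "false"), "handleNull();"));
        // At least one nullable key: the full disjunction is emitted.
        System.out.print(genAnyNullCheck(Arrays.asList("k1IsNull", "false"), "handleNull();"));
    }
}
```

Note that the second method, as described above, omits the block only when all flags are statically false; it does not attempt to prune individual `false` terms from a mixed disjunction.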
### Why are the changes needed?
This reduces the generated Java per SMJ stage. The JIT already eliminates
the dead code at runtime, so the benefit is at code-generation time: smaller
generated source, more headroom under the 64KB method-size limit, and
slightly faster Janino compilation.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Existing test suites cover both paths with whole-stage codegen on and
off:
- `OuterJoinSuite` (SMJ full-outer codegen + interpreted scanner).
- `InnerJoinSuite` (SMJ codegen and non-codegen paths).
- `ExistenceJoinSuite` (SMJ existence path).
### Was this patch authored or co-authored using generative AI tooling?
Yes, with Claude Code.
Co-authored-by: Isaac
This is a sub-task of SPARK-56908.