Skip to content

[SPARK-57023][SQL] DecimalAggregates: peel widened Cast on Min/Max#56078

Open
yaooqinn wants to merge 1 commit into
apache:masterfrom
yaooqinn:users/kentyao/spark-decimal-minmax-cast-peel
Open

[SPARK-57023][SQL] DecimalAggregates: peel widened Cast on Min/Max#56078
yaooqinn wants to merge 1 commit into
apache:masterfrom
yaooqinn:users/kentyao/spark-decimal-minmax-cast-peel

Conversation

@yaooqinn
Copy link
Copy Markdown
Member

What changes were proposed in this pull request?

Extend DecimalAggregates to peel a scale-preserving widening Cast around Min/Max arguments, mirroring the existing SUM/AVG widened-Cast arms landed via SPARK-56983.

When the input is Min(Cast(inner: dec(p, s), dec(p', s))) (or Max(...)) with p' >= p and no CheckOverflow wrapper, the rule rewrites to Cast(Min(inner), dec(p', s)) (and likewise for Max). MIN/MAX are pointwise on a totally-ordered domain, so under same-scale widening the rewrite is value-equivalent and NULL-preserving (see design §D6 self-Q&A).

Both arms reuse the same-package WidenedDecimalChild extractor introduced for SUM/AVG, which refuses to unwrap CheckOverflow and enforces the same s == s', p' >= p guard. TreePatterns.MIN / TreePatterns.MAX are added and registered on Min / Max; DecimalAggregates's containsAnyPattern pruning is widened to (SUM, AVERAGE, MIN, MAX). No new rule, no new file — three arms cohabit one object.

Why are the changes needed?

The SUM/AVG arms recover the long-backed fast path when BI tools generate SUM(CAST(small_dec AS larger_dec)). The MIN/MAX case is the natural sibling: same widening pattern, but currently no peel arm exists, so each aggregated row pays a per-row Decimal.changePrecision call inside Cast (Cast.scala:1074-1082) even though the outer Cast could be applied once to the partition extremum instead.

MIN/MAX are pointwise on a totally-ordered domain and immune to both the SUM overflow boundary (SPARK-56983) and the AVG SPARK-37024 Double-regime gate, so the equivalence is unconditional within the WidenedDecimalChild guard domain (R1b Lemma 1 — design §D6 self-Q&A).

The saving is per-row changePrecision elimination on the aggregate input — ceiling −0.39% ~ −6.02% JDK-progressive (JDK 25 strongest) per the GHA 3 JDK × 16 case matrix below — and the patch is essentially free: three lines of extractor reuse, no new rule, no new file.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

  • DecimalAggregatesSuite extended with 6 SPARK-57023 oracle cases — 2 peel positives (Min/Max over widening Cast) and 4 negatives (scale-changing Cast, narrowing Cast, CheckOverflow-wrapped, MinBy/MaxBy/MaxMinByK). Suite: 43/43.

  • Full catalyst/test: 9341 tests / 353 suites, 0 failed, 5 ignored, 670 s.

  • TPCDSV1_4PlanStabilitySuite + TPCDSV1_4PlanStabilityWithStatsSuite: no golden change on apache/master. Existence-test on efb7beab826 recorded 0 trigger across the 130 TPC-DS v1.4 + v2.7.0 queries (see investigation 0002-baseline-run-results.md, positive-control verified).

  • DecimalAggregatesBenchmark: extended with new MIN section (C1-C4) and MAX section (D1-D4) mirroring the existing SUM/AVG sections. Full 8-case × 3 JDK matrix run on GitHub Actions standard ubuntu-22.04 runners (AMD EPYC 7763 64-Core, 10M rows × 5 iters); DecimalAggregatesBenchmark{-,-jdk21-,-jdk25-}results.txt regenerated and committed. Headline cells (Best ms peel off / peel on / Δ%):

    case p,s,p' JDK 17 JDK 21 JDK 25
    C1 MIN 10,2,18 3974 / 3920 (−1.36%) 3301 / 3202 (−3.00%) 1353 / 1291 (−4.58%)
    C2 MIN 10,2,28 3959 / 3880 (−2.00%) 3294 / 3228 (−2.00%) 1351 / 1287 (−4.74%)
    C3 MIN 18,2,28 3623 / 3609 (−0.39%) 3557 / 3450 (−3.01%) 1368 / 1292 (−5.56%)
    C4 MIN 10,2,38 3856 / 3835 (−0.54%) 3240 / 3151 (−2.75%) 1348 / 1283 (−4.82%)
    D1 MAX 10,2,18 3854 / 3785 (−1.79%) 3241 / 3173 (−2.10%) 1346 / 1279 (−4.98%)
    D2 MAX 10,2,28 3908 / 3808 (−2.56%) 3267 / 3152 (−3.52%) 1352 / 1287 (−4.81%)
    D3 MAX 18,2,28 3664 / 3620 (−1.20%) 3507 / 3462 (−1.28%) 1378 / 1295 (−6.02%)
    D4 MAX 10,2,38 3904 / 3792 (−2.87%) 3233 / 3164 (−2.13%) 1342 / 1274 (−5.07%)

    Pattern: ceiling −0.39% ~ −6.02% JDK-progressive (JDK 25 strongest, JDK 17 weakest) across the 24 readings; no negative-delta (regression) cell. The saving is per-row Decimal.changePrecision elimination on the aggregate input — design 0002-design-minmax-fastpath.md §D5.1 declares this a legal micro-only ship contract. A pre-GHA local sbt sanity (Section C MIN, JDK 17, 10M × 5) recorded −1.5% to −2.0%; the GHA EPYC numbers above supersede it.

Note on Section A/B results.txt refresh

The sibling SPARK-56627 .scala edited Section B2/B4 cases to p'=11 but did not regenerate DecimalAggregatesBenchmark-results.txt, so the committed text was stale for the new shape. This PR regenerates the whole file under the canonical EPYC 7763 GHA runner (replacing the prior EPYC 9V74 numbers in Section A/B), and the same regeneration produces the JDK 21 / JDK 25 companion files. Section A/B refresh is mechanical housekeeping triggered by the regeneration, not part of the MIN/MAX peel logic — flagged here for reviewer transparency.

Why no TPC-DS results?

The existence-test on apache/master efb7beab826 walked optimized plans across the full 130-query TPC-DS v1.4 + v2.7.0 corpus and recorded 0 queries triggering the Min(Cast(...)) / Max(Cast(...)) widening pattern (see investigation 0002-baseline-run-results.md, positive-control verified). The MIN/MAX peel is justified by semantic equivalence + micro-level pattern coverage, not by TPC-DS revenue. Design 0002-design-minmax-fastpath.md §D5.1 declares this as a legal micro-only ship contract.

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Opus 4.7

### What changes were proposed in this pull request?

Extends `DecimalAggregates` with two new arms that hoist a
scale-preserving widening `Cast` out of `Min` / `Max` so the aggregate
runs on the narrower inner `Decimal`. The peel reuses the existing
`WidenedDecimalChild` extractor (introduced for SUM/AVG in SPARK-56627
and tightened in SPARK-56983) — same scale, same-or-larger precision,
`CheckOverflow` guard — so the eligibility surface is identical to the
already-shipped SUM/AVG fast paths.

`Min` / `Max` pick an existing row's value (no arithmetic), so a
widening Cast is bit-identical whether applied per-row before the
aggregate or once on the aggregate's result. The outer `Cast`
preserves the pre-rewrite result dataType (`Min.dataType == child.dataType`).

To keep the rule's tree-pattern pruning tight, this PR also adds
`MIN` / `MAX` to `TreePatterns` and registers them as `nodePatterns`
on the `Min` / `Max` aggregate expressions.

### Why are the changes needed?

Widened decimal Cast is a common shape (e.g. `Min/Max(CAST(d AS dec(p',s)))`
emitted by type coercion for set ops, UNION-by-name, view schemas). Without
the peel the per-row Cast runs inside the aggregate hot path.

### Does this PR introduce _any_ user-facing change?

No. Same result type, same eval semantics, same NULL semantics.

### How was this patch tested?

Six new oracle tests in `DecimalAggregatesSuite`:
- `MIN(CAST(dec(7,2) AS dec(12,2)))` peels via widened-Cast fast path
- `MAX(CAST(dec(7,2) AS dec(12,2)))` peels via widened-Cast fast path
- `MIN(CAST(dec(7,2) AS dec(12,4)))` does NOT peel (scale change)
- `MIN(CAST(dec(17,2) AS dec(10,2)))` does NOT peel (narrowing)
- `MIN/MAX(CheckOverflow)` does NOT peel (CheckOverflow guard)
- `MinBy/MaxBy/MaxMinByK` with widened-Cast value do NOT peel
  (rule pattern matches only Min/Max)

All 43 tests in DecimalAggregatesSuite pass.

### Was this patch authored or co-authored using generative AI tooling?

No.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant