feat(aqe): early-stop on global LIMIT #1727

Draft
wirybeaver wants to merge 1 commit into apache:main from wirybeaver:earlystop
@wirybeaver wirybeaver commented May 18, 2026

Summary

Adds scheduler-side early-stop for SELECT ... LIMIT N queries under AQE. Once enough rows have been shuffled to satisfy the LIMIT, the scheduler cancels the remaining tasks and finalizes the job as Successful; the downstream LimitExec then trims the output to the exact fetch count.

  • JobLimitTracker: atomic row-count tracker that fires exactly once when SUM(num_rows) >= limit × safety_factor (default 1.5). Lock-free: fetch_add(Relaxed) + swap(SeqCst) guarantees a single CancelRemaining across racing observers.
  • LimitEarlyStopAnalyzer: walks the post-stage-resolution physical plan top-down, identifies eligible GlobalLimitExec nodes (bare LIMIT, no OFFSET, no sorted subtree), and traces through allowlisted operators to the stage IDs of the producer ExchangeExec nodes.
  • Scheduler integration: tracker registry on TaskManager, observation hook in update_task_statuses, and a new EarlyStopCancel event. early_stop_job synthesizes successful completion for producer stages and reports running tasks for cancellation.
  • Config: ballista.aqe.limit_early_stop.enabled (default true), gated behind the existing AQE flag.
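
To make the eligibility rules above concrete, here is a minimal sketch of the top-down walk. It does not use DataFusion or Ballista types: the `Plan` enum, the choice of pass-through operators, and the `find_early_stop` / `trace_producers` names are all hypothetical stand-ins, and the real allowlist in this PR may differ.

```rust
/// Toy physical-plan node (hypothetical; stands in for the real operator tree).
#[derive(Debug)]
pub enum Plan {
    GlobalLimit { fetch: Option<usize>, skip: usize, input: Box<Plan> },
    Projection(Box<Plan>),
    CoalescePartitions(Box<Plan>),
    Union(Vec<Plan>),
    Sort(Box<Plan>),
    Exchange { stage_id: usize },
    Scan,
}

/// Walks below a limit and collects producer stage ids into `out`.
/// Returns false as soon as an operator outside the assumed allowlist is hit
/// (here: Sort, Scan, or a nested limit).
pub fn trace_producers(plan: &Plan, out: &mut Vec<usize>) -> bool {
    match plan {
        Plan::Exchange { stage_id } => {
            out.push(*stage_id);
            true
        }
        Plan::Projection(input) | Plan::CoalescePartitions(input) => {
            trace_producers(input, out)
        }
        Plan::Union(children) => children.iter().all(|c| trace_producers(c, out)),
        _ => false,
    }
}

/// Returns `(limit, producer_stage_ids)` for the topmost eligible limit:
/// a fetch is present, there is no OFFSET (`skip == 0`), and every path
/// below it reaches an exchange through allowlisted operators only.
pub fn find_early_stop(plan: &Plan) -> Option<(usize, Vec<usize>)> {
    match plan {
        Plan::GlobalLimit { fetch: Some(n), skip: 0, input } => {
            let mut stages = Vec::new();
            if trace_producers(input, &mut stages) && !stages.is_empty() {
                Some((*n, stages))
            } else {
                None
            }
        }
        // Look through row-preserving wrappers above the limit.
        Plan::Projection(input) | Plan::CoalescePartitions(input) => find_early_stop(input),
        _ => None,
    }
}
```

In this sketch a LIMIT 10 over a UNION of two exchanges yields `Some((10, vec![1, 2]))`, while the same limit over a sort, or with a non-zero OFFSET, yields `None`.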

Known limitation (v2.6)

DataFusion 53.x's LimitPushdown physical optimizer aggressively rewrites GlobalLimitExec into LocalLimitExec + fetch hints on CoalescePartitionsExec / DataSourceExec. In practice, the analyzer rarely finds a GlobalLimitExec in real plans. Extending recognition to LocalLimitExec and fetch-bearing operators is tracked in a companion spec file.

Test plan

  • 9 unit tests on JobLimitTracker (threshold transitions, exactly-once fire, 32-thread concurrency, preconditions)
  • 10 unit tests on LimitEarlyStopAnalyzer (eligible/ineligible operator mixes, multi-producer UNION, nested LIMITs)
  • 9 integration tests on AdaptiveExecutionGraph + TaskManager wiring (analyzer invocation, disabled flag, stage synthesis, EarlyStopCancel emission from update_task_statuses, early_stop_job finalization, tracker cleanup)
  • Full scheduler test suite: 136 passed, 0 failed

Pure-logic atomic counter that aggregates per-stage row counts and
fires a one-shot CancelRemaining decision when the running sum crosses
limit * safety_factor. Foundation for the AQE early-stop feature
(apache#1359): subsequent commits add the plan-time eligibility analyzer,
scheduler-side wiring into update_task_statuses, and an EarlyStopCancel
event that finalizes the job as Successful with partial output.

Trigger uses AtomicU64 fetch_add (Relaxed) for row accumulation and
AtomicBool swap (SeqCst) on the triggered flag so that, across racing
observers, exactly one CancelRemaining is returned. Threshold is
computed in fixed-point arithmetic to avoid f64 imprecision at large
limits.

The observe() path carries an asymmetric correctness invariant: the
tracker must never fire before sum >= limit, because firing early would
leave the downstream LimitExec with fewer rows than the query asked for.
Firing late is always safe; it only wastes I/O.
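
The trigger logic described above can be sketched as follows. This is an illustration under stated assumptions, not the code in this PR: the `LimitTrackerSketch` and `Decision` names, the per-mille encoding of the safety factor, and the round-up of the threshold are all choices made for the example.

```rust
use std::collections::HashSet;
use std::sync::atomic::{AtomicBool, AtomicU64, Ordering};

/// Result of one observation (hypothetical name).
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum Decision {
    Continue,
    CancelRemaining,
}

/// One-shot row-count tracker (hypothetical sketch).
pub struct LimitTrackerSketch {
    threshold: u64,
    tagged_stages: HashSet<usize>,
    rows: AtomicU64,
    triggered: AtomicBool,
}

impl LimitTrackerSketch {
    /// `factor_permille` is the safety factor in thousandths (1500 means 1.5).
    /// This integer encoding is an assumption of the sketch.
    pub fn new(
        limit: u64,
        factor_permille: u64,
        stages: impl IntoIterator<Item = usize>,
    ) -> Self {
        assert!(limit > 0, "limit must be positive");
        assert!(factor_permille >= 1000, "safety factor must be >= 1.0");
        // ceil(limit * factor / 1000), computed in u128 so the product cannot
        // overflow. Rounding up keeps threshold >= limit, so the tracker can
        // only fire late, never early.
        let scaled = (limit as u128 * factor_permille as u128 + 999) / 1000;
        let threshold = scaled.min(u64::MAX as u128) as u64;
        Self {
            threshold,
            tagged_stages: stages.into_iter().collect(),
            rows: AtomicU64::new(0),
            triggered: AtomicBool::new(false),
        }
    }

    pub fn threshold(&self) -> u64 {
        self.threshold
    }

    /// Adds `num_rows` for a tagged producer stage and returns CancelRemaining
    /// to at most one caller over the tracker's lifetime.
    pub fn observe(&self, stage_id: usize, num_rows: u64) -> Decision {
        if !self.tagged_stages.contains(&stage_id) {
            return Decision::Continue; // rows from untagged stages are ignored
        }
        // fetch_add returns the previous value; add our contribution back.
        let total = self
            .rows
            .fetch_add(num_rows, Ordering::Relaxed)
            .saturating_add(num_rows);
        // swap returns the old flag, so only the first crossing observer sees false.
        if total >= self.threshold && !self.triggered.swap(true, Ordering::SeqCst) {
            Decision::CancelRemaining
        } else {
            Decision::Continue
        }
    }
}
```

With `LimitTrackerSketch::new(10, 1500, [1, 2])` the threshold is 15: observations that bring the running sum to 14 return `Continue`, the observation that reaches 15 returns `CancelRemaining`, and every later call returns `Continue` again.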

Includes 9 unit tests covering threshold rounding, below/at/above
trigger transitions, single-fire guarantee, untagged-stage handling,
multi-stage aggregation, 32-thread concurrent observer race, and
constructor preconditions.
@wirybeaver wirybeaver changed the title feat(aqe): add JobLimitTracker for early-stop on global LIMIT feat(aqe): early-stop on global LIMIT May 18, 2026
@wirybeaver wirybeaver marked this pull request as draft May 21, 2026 08:00