Skip to content

perf(pm): route stale installs through scheduler#3050

Draft
elrrrrrrr wants to merge 1 commit into
perf/pm-split-installer-scheduler-testsfrom
perf/pm-split-installer-scheduler-wiring
Draft

perf(pm): route stale installs through scheduler#3050
elrrrrrrr wants to merge 1 commit into
perf/pm-split-installer-scheduler-testsfrom
perf/pm-split-installer-scheduler-wiring

Conversation

@elrrrrrrr
Copy link
Copy Markdown
Contributor

@elrrrrrrr elrrrrrrr commented May 21, 2026

Summary

  • route stale-lock dependency resolution events into the install scheduler
  • remove the old compiled downloader/cloner OnceMap fallback now that scheduler owns inflight dedupe
  • update HTTP tarball install docs to reference scheduler-owned cache resolution

Verification

  • cargo fmt
  • cargo check -p utoo-pm
  • cargo test -p utoo-pm service::install_scheduler -- --nocapture
  • cargo clippy --all-targets -- -D warnings --no-deps

@elrrrrrrr elrrrrrrr added A-Pkg Manager Area: Package Manager benchmark Run pm-bench on PR labels May 21, 2026
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a centralized InstallScheduler to manage package downloads, extractions, and clones, replacing the previous pipeline module and decentralized OnceMap caches. The new scheduler implements explicit concurrency limits and deduplication for these operations. Feedback includes suggestions to improve error handling in the scheduler's shutdown logic by draining all waiter maps, a recommendation to use tokio::task::spawn_blocking instead of rayon::spawn for I/O-bound cloning tasks to prevent thread pool starvation, and a suggestion to ensure the scheduler is properly shut down in all error paths within the install service.

Comment on lines +565 to +571
fn fail_pending(&mut self, message: &str) {
for (_, waiters) in self.clone_waiters.drain() {
for waiter in waiters {
let _ = waiter.send(Err(message.to_string()));
}
}
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The fail_pending method only drains clone_waiters. While EnsureClone responders are stored there, download_waiters also holds CloneSpec objects that might be associated with pending installation tasks. Although EnsureClone callers are ultimately unblocked via clone_waiters, it would be cleaner and more robust to also drain download_waiters and blocked_by_parent to ensure all internal state is cleared and any potential future extensions using those maps are also notified of the failure.

Comment on lines +469 to +482
rayon::spawn(move || {
let result = std::panic::catch_unwind(std::panic::AssertUnwindSafe(|| {
clone_package_from_cache_sync(
&job.spec.package.name,
&job.spec.package.version,
&job.spec.package.tarball_url,
&job.cache_path,
&job.spec.target,
)
.map_err(|e| format!("{e:#}"))
}))
.unwrap_or_else(|_| Err("install clone worker panicked".to_string()));
let _ = done_tx.send(OpDone::Clone { target, result });
});
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Using rayon::spawn for I/O-bound tasks like directory cloning can lead to starvation of the Rayon global thread pool if it's also used for heavy CPU-bound work elsewhere in the application. While the clone_limit provides some protection, it is generally safer to use tokio::task::spawn_blocking for I/O operations to keep them within Tokio's managed blocking pool, or use a dedicated thread pool for I/O.

Comment on lines +267 to +270
Err(e) => {
scheduler_handle.shutdown().await;
return Err(e);
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

In the install method, if load_package_lock_json_from_path fails, the scheduler_handle is shut down before returning the error. However, if other errors occur later in the function (e.g., during group_by_depth or other logic before the final shutdown call), the scheduler task might be leaked or left running longer than necessary. Consider using a guard pattern or ensuring shutdown is called in all error paths.

@elrrrrrrr elrrrrrrr force-pushed the perf/pm-split-installer-scheduler-wiring branch from 0af4c98 to 5b15fc6 Compare May 21, 2026 19:54
@elrrrrrrr elrrrrrrr changed the title perf(pm): add install scheduler wiring perf(pm): route stale installs through scheduler May 21, 2026
@elrrrrrrr elrrrrrrr changed the base branch from perf/pm-split-installer-cloner-primitives to perf/pm-split-installer-scheduler-tests May 21, 2026 19:54
@elrrrrrrr elrrrrrrr force-pushed the perf/pm-split-installer-scheduler-tests branch from 717e4cd to 8891916 Compare May 21, 2026 22:32
@elrrrrrrr elrrrrrrr force-pushed the perf/pm-split-installer-scheduler-wiring branch 2 times, most recently from 3339a0f to f0321f3 Compare May 21, 2026 23:09
@elrrrrrrr elrrrrrrr force-pushed the perf/pm-split-installer-scheduler-tests branch from 4b77dd1 to d279722 Compare May 21, 2026 23:39
@elrrrrrrr elrrrrrrr force-pushed the perf/pm-split-installer-scheduler-wiring branch from f0321f3 to 61578f1 Compare May 21, 2026 23:39
@github-actions
Copy link
Copy Markdown

📊 pm-bench-phases · 7116150 · linux (ubuntu-latest)

Workflow run — ant-design

PMs: utoo (this branch) · utoo-npm (latest published) · bun (latest)

npmjs.org

p0_full_cold

PM wall ±σ user sys RSS pgMinor
bun 9.99s 1.23s 10.27s 10.08s 600M 303.9K
utoo-next 9.43s 1.40s 10.61s 12.88s 1020M 125.9K
utoo-npm 9.58s 0.87s 10.85s 12.92s 1.02G 122.8K
utoo 9.14s 1.21s 10.56s 12.52s 967M 110.2K
PM vCtx iCtx netRX netTX cache node_mod lock
bun 20.9K 17.5K 1.19G 7M 1.86G 1.75G 1M
utoo-next 146.2K 108.6K 1.16G 6M 1.71G 1.70G 2M
utoo-npm 148.1K 115.3K 1.16G 5M 1.71G 1.70G 2M
utoo 130.6K 87.0K 1.16G 5M 1.71G 1.70G 2M

p1_resolve

PM wall ±σ user sys RSS pgMinor
bun 3.63s 0.69s 3.84s 1.14s 490M 198.5K
utoo-next 4.58s 0.53s 5.47s 1.80s 610M 79.8K
utoo-npm 4.07s 0.21s 5.63s 2.30s 610M 90.2K
utoo 3.71s 0.42s 5.44s 1.86s 594M 77.5K
PM vCtx iCtx netRX netTX cache node_mod lock
bun 14.9K 2.9K 202M 3M 107M - 1M
utoo-next 58.0K 68.1K 200M 3M 7M 3M 2M
utoo-npm 86.4K 86.2K 200M 3M 7M 3M 2M
utoo 59.9K 67.5K 200M 3M 7M 3M 2M

p3_cold_install

PM wall ±σ user sys RSS pgMinor
bun 6.87s 0.12s 6.40s 10.14s 517M 197.8K
utoo-next 12.95s 1.89s 5.11s 11.65s 462M 63.2K
utoo-npm 8.89s 0.85s 5.24s 11.26s 464M 58.9K
utoo 8.23s 0.28s 5.07s 11.05s 466M 55.7K
PM vCtx iCtx netRX netTX cache node_mod lock
bun 8.2K 8.4K 1019M 4M 1.76G 1.76G 1M
utoo-next 148.6K 49.0K 990M 4M 1.70G 1.70G 2M
utoo-npm 127.0K 57.0K 990M 4M 1.70G 1.70G 2M
utoo 105.8K 56.9K 990M 3M 1.70G 1.70G 2M

p4_warm_link

PM wall ±σ user sys RSS pgMinor
bun 3.37s 0.04s 0.19s 2.36s 135M 32.5K
utoo-next 2.26s 0.02s 0.49s 3.78s 79M 18.0K
utoo-npm 2.21s 0.03s 0.51s 3.75s 79M 18.5K
utoo 2.14s 0.06s 0.40s 3.51s 52M 11.3K
PM vCtx iCtx netRX netTX cache node_mod lock
bun 332 25 5M 44K 1.91G 1.75G 1M
utoo-next 41.9K 19.2K 28K 5K 1.70G 1.70G 2M
utoo-npm 43.2K 19.6K 31K 12K 1.70G 1.70G 2M
utoo 21.7K 13.8K 29K 25K 1.71G 1.70G 2M

npmmirror.com: no output captured.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

A-Pkg Manager Area: Package Manager benchmark Run pm-bench on PR

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant