perf(pm): route stale installs through scheduler#3050
Conversation
There was a problem hiding this comment.
Code Review
This pull request introduces a centralized InstallScheduler to manage package downloads, extractions, and clones, replacing the previous pipeline module and decentralized OnceMap caches. The new scheduler implements explicit concurrency limits and deduplication for these operations. Feedback includes suggestions to improve error handling in the scheduler's shutdown logic by draining all waiter maps, a recommendation to use tokio::task::spawn_blocking instead of rayon::spawn for I/O-bound cloning tasks to prevent thread pool starvation, and a suggestion to ensure the scheduler is properly shut down in all error paths within the install service.
| fn fail_pending(&mut self, message: &str) { | ||
| for (_, waiters) in self.clone_waiters.drain() { | ||
| for waiter in waiters { | ||
| let _ = waiter.send(Err(message.to_string())); | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
The fail_pending method only drains clone_waiters. While EnsureClone responders are stored there, download_waiters also holds CloneSpec objects that might be associated with pending installation tasks. Although EnsureClone callers are ultimately unblocked via clone_waiters, it would be cleaner and more robust to also drain download_waiters and blocked_by_parent to ensure all internal state is cleared and any potential future extensions using those maps are also notified of the failure.
| rayon::spawn(move || { | ||
| let result = std::panic::catch_unwind(std::panic::AssertUnwindSafe(|| { | ||
| clone_package_from_cache_sync( | ||
| &job.spec.package.name, | ||
| &job.spec.package.version, | ||
| &job.spec.package.tarball_url, | ||
| &job.cache_path, | ||
| &job.spec.target, | ||
| ) | ||
| .map_err(|e| format!("{e:#}")) | ||
| })) | ||
| .unwrap_or_else(|_| Err("install clone worker panicked".to_string())); | ||
| let _ = done_tx.send(OpDone::Clone { target, result }); | ||
| }); |
There was a problem hiding this comment.
Using rayon::spawn for I/O-bound tasks like directory cloning can lead to starvation of the Rayon global thread pool if it's also used for heavy CPU-bound work elsewhere in the application. While the clone_limit provides some protection, it is generally safer to use tokio::task::spawn_blocking for I/O operations to keep them within Tokio's managed blocking pool, or use a dedicated thread pool for I/O.
| Err(e) => { | ||
| scheduler_handle.shutdown().await; | ||
| return Err(e); | ||
| } |
There was a problem hiding this comment.
In the install method, if load_package_lock_json_from_path fails, the scheduler_handle is shut down before returning the error. However, if other errors occur later in the function (e.g., during group_by_depth or other logic before the final shutdown call), the scheduler task might be leaked or left running longer than necessary. Consider using a guard pattern or ensuring shutdown is called in all error paths.
0af4c98 to
5b15fc6
Compare
717e4cd to
8891916
Compare
3339a0f to
f0321f3
Compare
4b77dd1 to
d279722
Compare
f0321f3 to
61578f1
Compare
📊 pm-bench-phases ·
|
| PM | wall | ±σ | user | sys | RSS | pgMinor |
|---|---|---|---|---|---|---|
| bun | 9.99s | 1.23s | 10.27s | 10.08s | 600M | 303.9K |
| utoo-next | 9.43s | 1.40s | 10.61s | 12.88s | 1020M | 125.9K |
| utoo-npm | 9.58s | 0.87s | 10.85s | 12.92s | 1.02G | 122.8K |
| utoo | 9.14s | 1.21s | 10.56s | 12.52s | 967M | 110.2K |
| PM | vCtx | iCtx | netRX | netTX | cache | node_mod | lock |
|---|---|---|---|---|---|---|---|
| bun | 20.9K | 17.5K | 1.19G | 7M | 1.86G | 1.75G | 1M |
| utoo-next | 146.2K | 108.6K | 1.16G | 6M | 1.71G | 1.70G | 2M |
| utoo-npm | 148.1K | 115.3K | 1.16G | 5M | 1.71G | 1.70G | 2M |
| utoo | 130.6K | 87.0K | 1.16G | 5M | 1.71G | 1.70G | 2M |
p1_resolve
| PM | wall | ±σ | user | sys | RSS | pgMinor |
|---|---|---|---|---|---|---|
| bun | 3.63s | 0.69s | 3.84s | 1.14s | 490M | 198.5K |
| utoo-next | 4.58s | 0.53s | 5.47s | 1.80s | 610M | 79.8K |
| utoo-npm | 4.07s | 0.21s | 5.63s | 2.30s | 610M | 90.2K |
| utoo | 3.71s | 0.42s | 5.44s | 1.86s | 594M | 77.5K |
| PM | vCtx | iCtx | netRX | netTX | cache | node_mod | lock |
|---|---|---|---|---|---|---|---|
| bun | 14.9K | 2.9K | 202M | 3M | 107M | - | 1M |
| utoo-next | 58.0K | 68.1K | 200M | 3M | 7M | 3M | 2M |
| utoo-npm | 86.4K | 86.2K | 200M | 3M | 7M | 3M | 2M |
| utoo | 59.9K | 67.5K | 200M | 3M | 7M | 3M | 2M |
p3_cold_install
| PM | wall | ±σ | user | sys | RSS | pgMinor |
|---|---|---|---|---|---|---|
| bun | 6.87s | 0.12s | 6.40s | 10.14s | 517M | 197.8K |
| utoo-next | 12.95s | 1.89s | 5.11s | 11.65s | 462M | 63.2K |
| utoo-npm | 8.89s | 0.85s | 5.24s | 11.26s | 464M | 58.9K |
| utoo | 8.23s | 0.28s | 5.07s | 11.05s | 466M | 55.7K |
| PM | vCtx | iCtx | netRX | netTX | cache | node_mod | lock |
|---|---|---|---|---|---|---|---|
| bun | 8.2K | 8.4K | 1019M | 4M | 1.76G | 1.76G | 1M |
| utoo-next | 148.6K | 49.0K | 990M | 4M | 1.70G | 1.70G | 2M |
| utoo-npm | 127.0K | 57.0K | 990M | 4M | 1.70G | 1.70G | 2M |
| utoo | 105.8K | 56.9K | 990M | 3M | 1.70G | 1.70G | 2M |
p4_warm_link
| PM | wall | ±σ | user | sys | RSS | pgMinor |
|---|---|---|---|---|---|---|
| bun | 3.37s | 0.04s | 0.19s | 2.36s | 135M | 32.5K |
| utoo-next | 2.26s | 0.02s | 0.49s | 3.78s | 79M | 18.0K |
| utoo-npm | 2.21s | 0.03s | 0.51s | 3.75s | 79M | 18.5K |
| utoo | 2.14s | 0.06s | 0.40s | 3.51s | 52M | 11.3K |
| PM | vCtx | iCtx | netRX | netTX | cache | node_mod | lock |
|---|---|---|---|---|---|---|---|
| bun | 332 | 25 | 5M | 44K | 1.91G | 1.75G | 1M |
| utoo-next | 41.9K | 19.2K | 28K | 5K | 1.70G | 1.70G | 2M |
| utoo-npm | 43.2K | 19.6K | 31K | 12K | 1.70G | 1.70G | 2M |
| utoo | 21.7K | 13.8K | 29K | 25K | 1.71G | 1.70G | 2M |
npmmirror.com: no output captured.
Summary
Verification