perf(pm): add sync cloner primitive#3033
Conversation
There was a problem hiding this comment.
Code Review
This pull request introduces synchronous versions of package and directory cloning utilities, such as clone_package_from_cache_sync and clone_dir_sync, to support blocking filesystem operations within a worker pool. It also refactors existing async paths to utilize these new sync primitives via tokio::task::spawn_blocking. Feedback highlights the need for retry logic in the non-macOS native clone implementation to match the macOS path's resilience, the addition of a warning log during lenient JSON parsing for consistency, and the removal of a redundant directory cleanup operation in clone_package_sync.
| fn clone_dir_native_sync(real_src: &Path, dst: &Path) -> Result<()> { | ||
| hardlink_clone::clone_dir_sync(real_src, dst) | ||
| .with_context(|| format!("clone_dir {} -> {}", real_src.display(), dst.display())) | ||
| } |
There was a problem hiding this comment.
The synchronous implementation of clone_dir_native_sync for non-macOS platforms is missing the retry logic that exists in its asynchronous counterpart. This makes the synchronous path (used by the new scheduler) less resilient to transient filesystem errors, such as temporary file locks or concurrent access issues. To maintain parity with the async implementation, a retry loop with a backoff strategy should be added.
fn clone_dir_native_sync(real_src: &Path, dst: &Path) -> Result<()> {
let mut last_error = None;
for delay in std::iter::once(std::time::Duration::ZERO).chain(create_retry_strategy()) {
if !delay.is_zero() {
std::thread::sleep(delay);
}
match hardlink_clone::clone_dir_sync(real_src, dst) {
Ok(_) => return Ok(()),
Err(e) => {
let _ = std::fs::remove_dir_all(dst);
last_error = Some(e);
}
}
}
let err = last_error.unwrap_or_else(|| anyhow::anyhow!("unknown error"));
Err(err).with_context(|| format!("clone_dir {} -> {}", real_src.display(), dst.display()))
}| match serde_json::from_str(&content) { | ||
| Ok(v) => Ok(v), | ||
| Err(original_err) => match serde_json::from_str::<serde_json::Value>(&content) { | ||
| Ok(value) => serde_json::from_value(value) | ||
| .with_context(|| format!("Failed to deserialize {pkg_path:?}")), | ||
| Err(_) => { | ||
| Err(original_err).with_context(|| format!("Failed to parse JSON from {pkg_path:?}")) | ||
| } | ||
| }, | ||
| } |
There was a problem hiding this comment.
load_package_json_sync is missing the warning log when falling back to the lenient JSON parser (used to handle duplicate keys in package.json). This makes the synchronous path's logging inconsistent with the asynchronous load_package_json function in json.rs. Adding the warning ensures that non-standard package structures are surfaced to the user regardless of the execution path.
| match serde_json::from_str(&content) { | |
| Ok(v) => Ok(v), | |
| Err(original_err) => match serde_json::from_str::<serde_json::Value>(&content) { | |
| Ok(value) => serde_json::from_value(value) | |
| .with_context(|| format!("Failed to deserialize {pkg_path:?}")), | |
| Err(_) => { | |
| Err(original_err).with_context(|| format!("Failed to parse JSON from {pkg_path:?}")) | |
| } | |
| }, | |
| } | |
| match serde_json::from_str(&content) { | |
| Ok(v) => Ok(v), | |
| Err(original_err) => match serde_json::from_str::<serde_json::Value>(&content) { | |
| Ok(value) => { | |
| tracing::warn!( | |
| "package.json has non-standard structure (e.g. duplicate keys), \ | |
| retrying with lenient parser: {pkg_path:?}" | |
| ); | |
| serde_json::from_value(value) | |
| .with_context(|| format!("Failed to deserialize {pkg_path:?}")) | |
| } | |
| Err(_) => { | |
| Err(original_err).with_context(|| format!("Failed to parse JSON from {pkg_path:?}")) | |
| } | |
| }, | |
| } |
| if let Err(e) = std::fs::remove_dir_all(dst) { | ||
| tracing::warn!("Failed to clean target directory {}: {}", dst.display(), e); | ||
| } |
There was a problem hiding this comment.
The call to std::fs::remove_dir_all(dst) here is redundant because clone_sync (called immediately after on line 565) also performs the same existence check and removal logic. This results in an unnecessary extra syscall and potentially duplicate warning logs if the directory removal fails. It is cleaner to let the lower-level clone_sync primitive handle the destination cleanup.
a995b27 to
4b5ede8
Compare
4b5ede8 to
0607957
Compare
📊 pm-bench-phases ·
|
| PM | wall | ±σ | user | sys | RSS | pgMinor |
|---|---|---|---|---|---|---|
| bun | 11.45s | 0.99s | 8.09s | 7.92s | 605M | 352.0K |
| utoo-next | 8.97s | 0.78s | 8.54s | 9.57s | 987M | 122.4K |
| utoo-npm | 8.90s | 0.20s | 8.67s | 9.81s | 1016M | 120.3K |
| utoo | 9.50s | 0.69s | 8.37s | 9.74s | 1002M | 123.3K |
| PM | vCtx | iCtx | netRX | netTX | cache | node_mod | lock |
|---|---|---|---|---|---|---|---|
| bun | 28.0K | 16.7K | 1.19G | 8M | 1.86G | 1.75G | 1M |
| utoo-next | 140.8K | 96.6K | 1.16G | 6M | 1.71G | 1.70G | 2M |
| utoo-npm | 155.7K | 106.3K | 1.16G | 6M | 1.71G | 1.70G | 2M |
| utoo | 155.1K | 83.7K | 1.16G | 6M | 1.71G | 1.70G | 2M |
p1_resolve
| PM | wall | ±σ | user | sys | RSS | pgMinor |
|---|---|---|---|---|---|---|
| bun | 2.71s | 0.03s | 3.06s | 0.95s | 492M | 184.0K |
| utoo-next | 4.19s | 1.88s | 4.16s | 1.66s | 596M | 91.9K |
| utoo-npm | 3.13s | 0.05s | 4.28s | 1.98s | 611M | 85.3K |
| utoo | 3.06s | 0.05s | 4.17s | 1.66s | 618M | 87.6K |
| PM | vCtx | iCtx | netRX | netTX | cache | node_mod | lock |
|---|---|---|---|---|---|---|---|
| bun | 19.5K | 2.2K | 202M | 3M | 107M | - | 1M |
| utoo-next | 65.3K | 72.2K | 200M | 3M | 7M | 3M | 2M |
| utoo-npm | 93.7K | 98.2K | 200M | 3M | 7M | 3M | 2M |
| utoo | 65.6K | 69.2K | 200M | 3M | 7M | 3M | 2M |
p3_cold_install
| PM | wall | ±σ | user | sys | RSS | pgMinor |
|---|---|---|---|---|---|---|
| bun | 6.99s | 0.45s | 4.88s | 7.47s | 588M | 208.3K |
| utoo-next | 7.54s | 0.91s | 4.22s | 8.48s | 469M | 58.4K |
| utoo-npm | 8.72s | 2.21s | 4.29s | 8.52s | 458M | 57.0K |
| utoo | 8.37s | 2.14s | 4.22s | 8.37s | 485M | 62.5K |
| PM | vCtx | iCtx | netRX | netTX | cache | node_mod | lock |
|---|---|---|---|---|---|---|---|
| bun | 15.3K | 7.1K | 1018M | 5M | 1.76G | 1.76G | 1M |
| utoo-next | 128.8K | 50.9K | 990M | 3M | 1.70G | 1.70G | 2M |
| utoo-npm | 142.7K | 47.4K | 990M | 4M | 1.70G | 1.70G | 2M |
| utoo | 122.0K | 46.8K | 990M | 4M | 1.70G | 1.70G | 2M |
p4_warm_link
| PM | wall | ±σ | user | sys | RSS | pgMinor |
|---|---|---|---|---|---|---|
| bun | 3.52s | 0.17s | 0.17s | 1.90s | 136M | 32.5K |
| utoo-next | 2.61s | 0.48s | 0.40s | 2.85s | 79M | 18.1K |
| utoo-npm | 3.50s | 0.21s | 0.40s | 2.83s | 78M | 18.7K |
| utoo | 2.52s | 0.36s | 0.35s | 2.73s | 75M | 17.4K |
| PM | vCtx | iCtx | netRX | netTX | cache | node_mod | lock |
|---|---|---|---|---|---|---|---|
| bun | 367 | 28 | 5M | 44K | 1.91G | 1.75G | 1M |
| utoo-next | 41.9K | 16.9K | 6K | 13K | 1.70G | 1.70G | 2M |
| utoo-npm | 38.9K | 16.7K | 3K | 19K | 1.70G | 1.70G | 2M |
| utoo | 25.6K | 10.3K | 6K | 9K | 1.71G | 1.70G | 2M |
npmmirror.com: no output captured.
Summary
Review-sized split from #3029 / #2948, stacked after #3032.
Adds the cloner primitive that the install scheduler will call later:
clone_package_from_cache_syncfor already-resolved cache paths;clone_package_onceglobal OnceMap interface intact;spawn_blocking;This PR does not remove clone OnceMap ownership yet; that happens in a later scheduler PR.
Size
1 file changed, 225 insertions(+), 79 deletions(-)Validation
cargo fmtcargo check -p utoo-pmcargo test -p utoo-pm util::cloner(17 passed)cargo clippy --all-targets -- -D warnings --no-depspack-napistill warns locally becausenext.jsis a symlink in this worktree; clippy exits successfully.