Skip to content

perf(pm): add sync cloner primitive#3033

Draft
elrrrrrrr wants to merge 1 commit into
perf/pm-split-installer-downloader-primitivesfrom
perf/pm-split-installer-cloner-primitives
Draft

perf(pm): add sync cloner primitive#3033
elrrrrrrr wants to merge 1 commit into
perf/pm-split-installer-downloader-primitivesfrom
perf/pm-split-installer-cloner-primitives

Conversation

@elrrrrrrr
Copy link
Copy Markdown
Contributor

Summary

Review-sized split from #3029 / #2948, stacked after #3032.

Adds the cloner primitive that the install scheduler will call later:

  • introduces clone_package_from_cache_sync for already-resolved cache paths;
  • keeps the existing clone_package_once global OnceMap interface intact;
  • routes the old OnceMap path through the sync primitive via spawn_blocking;
  • splits non-mac hardlink clone into sync and async wrappers;
  • adds sync package.json validation helpers matching the async path;
  • adds a focused sync primitive test.

This PR does not remove clone OnceMap ownership yet; that happens in a later scheduler PR.

Size

Validation

  • cargo fmt
  • cargo check -p utoo-pm
  • cargo test -p utoo-pm util::cloner (17 passed)
  • cargo clippy --all-targets -- -D warnings --no-deps

pack-napi still warns locally because next.js is a symlink in this worktree; clippy exits successfully.

@elrrrrrrr elrrrrrrr added A-Pkg Manager Area: Package Manager benchmark Run pm-bench on PR labels May 21, 2026
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces synchronous versions of package and directory cloning utilities, such as clone_package_from_cache_sync and clone_dir_sync, to support blocking filesystem operations within a worker pool. It also refactors existing async paths to utilize these new sync primitives via tokio::task::spawn_blocking. Feedback highlights the need for retry logic in the non-macOS native clone implementation to match the macOS path's resilience, the addition of a warning log during lenient JSON parsing for consistency, and the removal of a redundant directory cleanup operation in clone_package_sync.

Comment on lines +328 to +331
fn clone_dir_native_sync(real_src: &Path, dst: &Path) -> Result<()> {
hardlink_clone::clone_dir_sync(real_src, dst)
.with_context(|| format!("clone_dir {} -> {}", real_src.display(), dst.display()))
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The synchronous implementation of clone_dir_native_sync for non-macOS platforms is missing the retry logic that exists in its asynchronous counterpart. This makes the synchronous path (used by the new scheduler) less resilient to transient filesystem errors, such as temporary file locks or concurrent access issues. To maintain parity with the async implementation, a retry loop with a backoff strategy should be added.

fn clone_dir_native_sync(real_src: &Path, dst: &Path) -> Result<()> {
    let mut last_error = None;

    for delay in std::iter::once(std::time::Duration::ZERO).chain(create_retry_strategy()) {
        if !delay.is_zero() {
            std::thread::sleep(delay);
        }

        match hardlink_clone::clone_dir_sync(real_src, dst) {
            Ok(_) => return Ok(()),
            Err(e) => {
                let _ = std::fs::remove_dir_all(dst);
                last_error = Some(e);
            }
        }
    }

    let err = last_error.unwrap_or_else(|| anyhow::anyhow!("unknown error"));
    Err(err).with_context(|| format!("clone_dir {} -> {}", real_src.display(), dst.display()))
}

Comment on lines +264 to +273
match serde_json::from_str(&content) {
Ok(v) => Ok(v),
Err(original_err) => match serde_json::from_str::<serde_json::Value>(&content) {
Ok(value) => serde_json::from_value(value)
.with_context(|| format!("Failed to deserialize {pkg_path:?}")),
Err(_) => {
Err(original_err).with_context(|| format!("Failed to parse JSON from {pkg_path:?}"))
}
},
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

load_package_json_sync is missing the warning log when falling back to the lenient JSON parser (used to handle duplicate keys in package.json). This makes the synchronous path's logging inconsistent with the asynchronous load_package_json function in json.rs. Adding the warning ensures that non-standard package structures are surfaced to the user regardless of the execution path.

Suggested change
match serde_json::from_str(&content) {
Ok(v) => Ok(v),
Err(original_err) => match serde_json::from_str::<serde_json::Value>(&content) {
Ok(value) => serde_json::from_value(value)
.with_context(|| format!("Failed to deserialize {pkg_path:?}")),
Err(_) => {
Err(original_err).with_context(|| format!("Failed to parse JSON from {pkg_path:?}"))
}
},
}
match serde_json::from_str(&content) {
Ok(v) => Ok(v),
Err(original_err) => match serde_json::from_str::<serde_json::Value>(&content) {
Ok(value) => {
tracing::warn!(
"package.json has non-standard structure (e.g. duplicate keys), \
retrying with lenient parser: {pkg_path:?}"
);
serde_json::from_value(value)
.with_context(|| format!("Failed to deserialize {pkg_path:?}"))
}
Err(_) => {
Err(original_err).with_context(|| format!("Failed to parse JSON from {pkg_path:?}"))
}
},
}

Comment on lines +561 to +563
if let Err(e) = std::fs::remove_dir_all(dst) {
tracing::warn!("Failed to clean target directory {}: {}", dst.display(), e);
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The call to std::fs::remove_dir_all(dst) here is redundant because clone_sync (called immediately after on line 565) also performs the same existence check and removal logic. This results in an unnecessary extra syscall and potentially duplicate warning logs if the directory removal fails. It is cleaner to let the lower-level clone_sync primitive handle the destination cleanup.

@elrrrrrrr elrrrrrrr force-pushed the perf/pm-split-installer-cloner-primitives branch 2 times, most recently from a995b27 to 4b5ede8 Compare May 21, 2026 23:09
@elrrrrrrr elrrrrrrr force-pushed the perf/pm-split-installer-cloner-primitives branch from 4b5ede8 to 0607957 Compare May 21, 2026 23:39
@github-actions
Copy link
Copy Markdown

📊 pm-bench-phases · dbd8f4e · linux (ubuntu-latest)

Workflow run — ant-design

PMs: utoo (this branch) · utoo-npm (latest published) · bun (latest)

npmjs.org

p0_full_cold

PM wall ±σ user sys RSS pgMinor
bun 11.45s 0.99s 8.09s 7.92s 605M 352.0K
utoo-next 8.97s 0.78s 8.54s 9.57s 987M 122.4K
utoo-npm 8.90s 0.20s 8.67s 9.81s 1016M 120.3K
utoo 9.50s 0.69s 8.37s 9.74s 1002M 123.3K
PM vCtx iCtx netRX netTX cache node_mod lock
bun 28.0K 16.7K 1.19G 8M 1.86G 1.75G 1M
utoo-next 140.8K 96.6K 1.16G 6M 1.71G 1.70G 2M
utoo-npm 155.7K 106.3K 1.16G 6M 1.71G 1.70G 2M
utoo 155.1K 83.7K 1.16G 6M 1.71G 1.70G 2M

p1_resolve

PM wall ±σ user sys RSS pgMinor
bun 2.71s 0.03s 3.06s 0.95s 492M 184.0K
utoo-next 4.19s 1.88s 4.16s 1.66s 596M 91.9K
utoo-npm 3.13s 0.05s 4.28s 1.98s 611M 85.3K
utoo 3.06s 0.05s 4.17s 1.66s 618M 87.6K
PM vCtx iCtx netRX netTX cache node_mod lock
bun 19.5K 2.2K 202M 3M 107M - 1M
utoo-next 65.3K 72.2K 200M 3M 7M 3M 2M
utoo-npm 93.7K 98.2K 200M 3M 7M 3M 2M
utoo 65.6K 69.2K 200M 3M 7M 3M 2M

p3_cold_install

PM wall ±σ user sys RSS pgMinor
bun 6.99s 0.45s 4.88s 7.47s 588M 208.3K
utoo-next 7.54s 0.91s 4.22s 8.48s 469M 58.4K
utoo-npm 8.72s 2.21s 4.29s 8.52s 458M 57.0K
utoo 8.37s 2.14s 4.22s 8.37s 485M 62.5K
PM vCtx iCtx netRX netTX cache node_mod lock
bun 15.3K 7.1K 1018M 5M 1.76G 1.76G 1M
utoo-next 128.8K 50.9K 990M 3M 1.70G 1.70G 2M
utoo-npm 142.7K 47.4K 990M 4M 1.70G 1.70G 2M
utoo 122.0K 46.8K 990M 4M 1.70G 1.70G 2M

p4_warm_link

PM wall ±σ user sys RSS pgMinor
bun 3.52s 0.17s 0.17s 1.90s 136M 32.5K
utoo-next 2.61s 0.48s 0.40s 2.85s 79M 18.1K
utoo-npm 3.50s 0.21s 0.40s 2.83s 78M 18.7K
utoo 2.52s 0.36s 0.35s 2.73s 75M 17.4K
PM vCtx iCtx netRX netTX cache node_mod lock
bun 367 28 5M 44K 1.91G 1.75G 1M
utoo-next 41.9K 16.9K 6K 13K 1.70G 1.70G 2M
utoo-npm 38.9K 16.7K 3K 19K 1.70G 1.70G 2M
utoo 25.6K 10.3K 6K 9K 1.71G 1.70G 2M

npmmirror.com: no output captured.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

A-Pkg Manager Area: Package Manager benchmark Run pm-bench on PR

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant