Skip to content

perf(pm): source resolver mainloop architecture#3028

Draft
elrrrrrrr wants to merge 20 commits into
nextfrom
perf/pm-source-resolver
Draft

perf(pm): source resolver mainloop architecture#3028
elrrrrrrr wants to merge 20 commits into
nextfrom
perf/pm-source-resolver

Conversation

@elrrrrrrr
Copy link
Copy Markdown
Contributor

@elrrrrrrr elrrrrrrr commented May 21, 2026

Summary

Draft source PR for the resolver half of #2948. This is not the final review unit; it is the source branch used to verify the full resolver stack in one place.

This source branch now matches the current resolver split-stack top (perf/pm-split-resolver-doc-cleanup), so the split PRs should compose back to this diff exactly.

Covers From #2948

  • ManifestProvider / ManifestJob boundary.
  • Resolver-owned BFS/cache/inflight/waiter state.
  • Provider jobs only perform concrete manifest work.
  • Demand-first prefetch scheduling.
  • Obsolete preload removal.
  • Speculative full-manifest version extraction.
  • Version manifest Vec hot path.
  • Ruborist global MemoryCache / PackageCache / registry OnceMap cleanup.
  • PM resolver wiring needed to preserve p1 behavior.

Actual Split Stack

Notes

#3043 is the remaining core scheduling PR. It is intentionally larger than the target review size because splitting semver and full-manifest demand paths creates a bad intermediate state: cache ownership and project-cache output would diverge by registry capability. Follow-up PRs keep tests, preload deletion, registry cleanup, and docs separate.

Validation

Validated at resolver split-stack top:

  • cargo fmt
  • cargo check -p utoo-ruborist
  • cargo test -p utoo-ruborist --lib (170 passed)
  • cargo clippy --all-targets -- -D warnings --no-deps

pack-napi warns locally because next.js is a symlink in this worktree; clippy exits successfully.

@elrrrrrrr elrrrrrrr added A-Pkg Manager Area: Package Manager benchmark Run pm-bench on PR labels May 21, 2026
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request refactors the dependency resolver from a two-phase approach to a demand-driven BFS resolution. It introduces a ManifestProvider trait and a centralized FetchQueues system to manage concurrent manifest jobs, moving in-memory caching directly into the resolver loop. Performance is further optimized by introducing multiple HTTP client pools and improving JSON parsing efficiency through speculative extraction. Feedback focuses on performance improvements within the new resolver logic, specifically addressing algorithmic complexity in job selection and prefetch tracking, as well as redundant cloning during cache warming. Additionally, it is recommended to replace recovery logic for unreachable states with panics to align with unrecoverable error guidelines.

Comment on lines +854 to +858
let prefetch_concurrency = if self
.queued
.values()
.any(|priority| *priority == FetchPriority::Demand)
{
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The use of self.queued.values().any(...) here is $O(N)$ where $N$ is the number of queued jobs. Since pop_next is called in a loop during BFS traversal, this can lead to $O(N^2)$ complexity relative to the level size.

Additionally, the variable name prefetch_concurrency shadows the function parameter, which can be confusing.

Consider using !self.demand.is_empty() as a fast $O(1)$ check. While it may occasionally be a false positive due to stale entries in the VecDeque, it is a safe approximation for limiting prefetch concurrency.

        let limit = if !self.demand.is_empty() {
            prefetch_concurrency
        } else {
            usize::MAX
        };

Comment on lines +1685 to +1699
let Some(done) = fetches.next().await else {
let mut fallback = Vec::new();
for (_, waiters) in state.full_waiters.drain() {
fallback.extend(waiters);
}
for (_, waiters) in state.version_waiters.drain() {
fallback.extend(waiters);
}
for (parent, edge) in fallback {
let processed = process_dependency(graph, registry, parent, &edge, config)
.await
.map_err(|inner| chain_err(graph, parent, &edge, inner))?;
handle_processed(graph, receiver, parent, &edge, &processed, &mut next_level);
}
break;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

This fallback logic implements recovery for a state that should be unreachable. According to our guidelines, we should not implement recovery logic for bugs that represent unrecoverable states. Instead of a sequential fallback, this should trigger a panic to ensure the underlying logic error is identified and fixed.

        if result.is_none() && !waiters.is_empty() {
            panic!("Async fetcher returned None while waiters still exist. This is an unrecoverable logic error.");
        }
References
  1. Do not implement recovery logic for panics. Panics should be treated as unrecoverable bugs that need to be fixed, not as transient, recoverable errors.

Comment on lines +887 to +892
fn active_prefetches(&self) -> usize {
self.active
.values()
.filter(|priority| **priority == FetchPriority::Prefetch)
.count()
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The active_prefetches method performs an $O(M)$ scan of the active jobs map (where $M$ is the concurrency limit). Since this is called frequently in the main loop, it's better to maintain a counter in the FetchQueues struct.

Comment on lines +917 to +931
for (name, pkg_cache) in &warm.cache {
for (spec, version) in &pkg_cache.specs {
let Some(manifest) = pkg_cache.manifests.get(version) else {
continue;
};
let manifest = Arc::new(manifest.clone());
state
.version_cache
.insert((name.clone(), spec.clone()), Arc::clone(&manifest));
state
.version_cache
.entry((name.clone(), version.clone()))
.or_insert(manifest);
}
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This implementation clones the manifest for every spec that points to it, which is inefficient for packages with many specs resolving to the same version.

Consider iterating over the manifests first to populate the version cache, then mapping specs to the already created Arcs.

        for (name, pkg_cache) in &warm.cache {
            let mut version_to_arc = HashMap::new();
            for (version, manifest) in &pkg_cache.manifests {
                let arc = Arc::new(manifest.clone());
                version_to_arc.insert(version, Arc::clone(&arc));
                state.version_cache.insert((name.clone(), version.clone()), arc);
            }
            for (spec, version) in &pkg_cache.specs {
                if let Some(arc) = version_to_arc.get(version) {
                    state.version_cache.insert((name.clone(), spec.clone()), Arc::clone(arc));
                }
            }
        }

@elrrrrrrr elrrrrrrr force-pushed the perf/pm-source-resolver branch from 2702d34 to e0acfeb Compare May 21, 2026 17:40
@elrrrrrrr elrrrrrrr force-pushed the perf/pm-source-resolver branch from 9b50ad8 to 9a0819b Compare May 21, 2026 22:32
@elrrrrrrr elrrrrrrr force-pushed the perf/pm-source-resolver branch from 9a0819b to bb23373 Compare May 21, 2026 23:09
@elrrrrrrr elrrrrrrr force-pushed the perf/pm-source-resolver branch from bb23373 to 711835d Compare May 21, 2026 23:39
@github-actions
Copy link
Copy Markdown

📊 pm-bench-phases · 705fbfd · linux (ubuntu-latest)

Workflow run — ant-design

PMs: utoo (this branch) · utoo-npm (latest published) · bun (latest)

npmjs.org

p0_full_cold

PM wall ±σ user sys RSS pgMinor
bun 9.21s 0.22s 10.36s 10.20s 629M 304.8K
utoo-next 8.96s 1.14s 10.49s 12.52s 1.02G 125.9K
utoo-npm 10.22s 2.33s 10.78s 12.87s 1017M 118.9K
utoo 10.04s 1.78s 11.48s 12.81s 958M 153.2K
PM vCtx iCtx netRX netTX cache node_mod lock
bun 16.2K 18.7K 1.19G 7M 1.86G 1.74G 1M
utoo-next 131.6K 93.5K 1.16G 5M 1.71G 1.70G 2M
utoo-npm 157.3K 99.9K 1.16G 6M 1.71G 1.70G 2M
utoo 139.8K 66.8K 1.16G 7M 1.71G 1.70G 3M

p1_resolve

PM wall ±σ user sys RSS pgMinor
bun 2.36s 0.34s 3.91s 1.09s 509M 182.5K
utoo-next 3.06s 0.01s 5.37s 1.78s 614M 93.7K
utoo-npm 3.22s 0.15s 5.44s 2.15s 597M 79.5K
utoo 2.49s 0.04s 5.92s 1.74s 642M 123.6K
PM vCtx iCtx netRX netTX cache node_mod lock
bun 11.2K 3.7K 203M 3M 108M - 1M
utoo-next 51.9K 72.6K 200M 2M 7M 3M 2M
utoo-npm 75.9K 90.5K 200M 2M 7M 3M 2M
utoo 16.3K 21.5K 202M 3M 7M 3M 2M

p3_cold_install

PM wall ±σ user sys RSS pgMinor
bun 6.82s 0.19s 6.27s 10.03s 593M 206.4K
utoo-next 10.20s 0.53s 5.08s 11.69s 499M 61.0K
utoo-npm 9.09s 2.92s 5.18s 11.56s 496M 63.2K
utoo 10.03s 0.93s 5.15s 11.69s 516M 67.0K
PM vCtx iCtx netRX netTX cache node_mod lock
bun 5.0K 8.0K 1019M 4M 1.76G 1.76G 1M
utoo-next 151.5K 50.6K 989M 4M 1.70G 1.70G 2M
utoo-npm 136.5K 50.1K 989M 4M 1.70G 1.70G 2M
utoo 152.4K 54.4K 989M 4M 1.70G 1.70G 2M

p4_warm_link

PM wall ±σ user sys RSS pgMinor
bun 3.44s 0.04s 0.19s 2.35s 135M 32.4K
utoo-next 2.26s 0.13s 0.49s 3.79s 80M 18.7K
utoo-npm 2.23s 0.11s 0.48s 3.73s 80M 18.1K
utoo 2.28s 0.08s 0.48s 3.77s 80M 18.5K
PM vCtx iCtx netRX netTX cache node_mod lock
bun 297 22 5M 44K 1.91G 1.73G 1M
utoo-next 42.4K 18.8K 307K 13K 1.70G 1.70G 2M
utoo-npm 40.5K 18.1K 307K 11K 1.70G 1.70G 2M
utoo 41.3K 19.9K 306K 23K 1.71G 1.70G 2M

npmmirror.com: no output captured.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

A-Pkg Manager Area: Package Manager benchmark Run pm-bench on PR

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant