fix: use explicit spawn context for all multiprocessing primitives #691

Merged
william-silversmith merged 2 commits into master from dodam/mp_context on Apr 23, 2026
@dodamih
Contributor

dodamih commented Mar 31, 2026

Queue, Lock, and ProcessPoolExecutor were using different mp contexts. mp.Queue() and mp.Lock() used the default context (fork on Linux), while ProcessPoolExecutor used an explicit spawn context. Passing fork-context primitives to spawn-context workers raises RuntimeError.

This worked inconsistently because cloudfiles calls multiprocessing.set_start_method("spawn", force=True) as a side effect during its first download. If that ran before parallel_execution, the default context was already spawn and it matched. Otherwise it crashed.
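As a rough illustration of the side effect described above, the sketch below shows how a forced `set_start_method` call changes what the default context resolves to. It uses only the standard library; the function name `observe_default_context` is hypothetical, and the value of `before` depends on the platform and Python version.

```python
import multiprocessing as mp


def observe_default_context():
    """Return the default start method before and after a forced global change."""
    # Platform default: "fork" on Linux for the Python versions discussed here.
    before = mp.get_context().get_start_method()

    # Per the description, this is the call cloudfiles makes on its first download.
    mp.set_start_method("spawn", force=True)

    # Primitives created with mp.Queue() / mp.Lock() from now on use spawn.
    after = mp.get_context().get_start_method()
    return before, after


before, after = observe_default_context()
print(f"default start method: {before} -> {after}")
```

Whether `mp.Queue()` ends up fork-context or spawn-context therefore depends on whether something else in the process has already made this call.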

Fix: create all primitives from the same explicit spawn context.
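A minimal sketch of the fix, assuming the primitives reach the workers through the pool's initializer. The names `_init_worker`, `_square`, `make_primitives`, and `run_parallel` are hypothetical and are not the actual `parallel_execution` code.

```python
import multiprocessing as mp
from concurrent.futures import ProcessPoolExecutor

# Hypothetical module-level slots, filled in by the initializer in each worker.
_progress_queue = None
_fs_lock = None


def _init_worker(progress_queue, fs_lock):
    """Runs once per worker; stores the shared primitives in module globals."""
    global _progress_queue, _fs_lock
    _progress_queue = progress_queue
    _fs_lock = fs_lock


def _square(x):
    """Toy task: report progress through the shared queue, then return a result."""
    with _fs_lock:
        _progress_queue.put(1)
    return x * x


def make_primitives(ctx):
    """Create the queue and lock from the same context the pool will use."""
    return ctx.Queue(), ctx.Lock()


def run_parallel(items, parallel=2):
    """Run _square over items with every primitive tied to one spawn context."""
    items = list(items)
    # Explicit context: does not read or modify the global start method.
    ctx = mp.get_context("spawn")
    progress_queue, fs_lock = make_primitives(ctx)
    with ProcessPoolExecutor(
        max_workers=parallel,
        mp_context=ctx,
        initializer=_init_worker,
        initargs=(progress_queue, fs_lock),
    ) as pool:
        results = list(pool.map(_square, items))
    # Drain one progress message per completed item.
    done = sum(progress_queue.get(timeout=10) for _ in items)
    return results, done
```

Because the workers are spawned, a script calling `run_parallel([1, 2, 3])` should do so under an `if __name__ == "__main__":` guard so the worker functions can be re-imported in the child processes.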

Note the following subtlety:

If a CV has a mesh directory, CloudVolume.__getitem__ calls PrecomputedMeshMetadata.fetch_info() → cache.download_json() → cloudfiles.get() → cloudfiles.parallel_execute → set_start_method("spawn", force=True). This happens before parallel_execution runs, so the contexts match and it works fine; the crash only manifests for volumes without a mesh directory.

Also, CloudFiles really should not force change the global start_method for a download.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@william-silversmith
Contributor

Talking with Nico about this in https://github.com/seung-lab/python-task-queue/pull/47/changes

william-silversmith added the bug label Apr 16, 2026
@dodamih
Contributor Author

dodamih commented Apr 17, 2026

Thanks - hopefully whatever you end up doing there can also be applied here.

@dodamih
Contributor Author

dodamih commented Apr 17, 2026

Also, you already have a similar fix in #659 in the wms_refactor_shared_memory branch.

@william-silversmith
Contributor

william-silversmith commented Apr 23, 2026

Good point! Why not ctx.cpu_count() too, then? I feel more confident about this one since I tested it myself. 😁 The issue with that other PR wasn't the multiprocessing (after I fixed it); it was some warning message related to shared memory that I couldn't get rid of.
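For reference, context objects returned by `multiprocessing.get_context` expose `cpu_count()` alongside `Queue` and `Lock`, so the suggestion can be sketched as below. `worker_count` is a hypothetical helper for illustration, not a function from this PR.

```python
import multiprocessing as mp

ctx = mp.get_context("spawn")


def worker_count(parallel=None):
    """Hypothetical helper: resolve a worker count using only the context object."""
    if parallel is None:
        # Same value as mp.cpu_count(), but without touching the module-level API.
        return ctx.cpu_count()
    return int(parallel)


print(worker_count(), worker_count(3))
```

The result is the same as calling `mp.cpu_count()`; the benefit is consistency, since every multiprocessing call in the function then goes through the one explicit context.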

Per review feedback, convert remaining mp.* calls in parallel_execution
to use the spawn context.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
william-silversmith merged commit 1514976 into master Apr 23, 2026
william-silversmith deleted the dodam/mp_context branch April 23, 2026 18:08
Labels

bug The code is not performing according to the design or a design flaw is seriously impacting users.