Prevent race conditions when downloading NodeJS #424
Singularity downloads the NodeJS image locally (either in the current folder or in the `CWL_SINGULARITY_CACHE` path). If multiple processes download the same image concurrently, the second one fails because the image already exists. This PR prevents this behaviour by letting the execution proceed when an image is already there.
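A minimal sketch of the idea, not the code in this PR: `ensure_image` and the `pull` callable are hypothetical names introduced only for illustration. The pull is attempted only when the image is missing, and a failed pull is tolerated if another process has created the image in the meantime.

```python
import os
from collections.abc import Callable


def ensure_image(image_path: str, pull: Callable[[str], None]) -> str:
    """Return image_path, pulling the image only if it is not already there.

    `pull` is a hypothetical callable that downloads the image to the given
    path and raises an exception on failure (for example, because a
    concurrent process created the file first).
    """
    if os.path.exists(image_path):
        # Another process (or an earlier run) already provided the image.
        return image_path
    try:
        pull(image_path)
    except Exception:
        # Losing the race is fine as long as the image exists now.
        if not os.path.exists(image_path):
            raise
    return image_path
```

This sketch does not guard against a partially written image; that would need a lock or an atomic rename on top of it.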
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #424      +/-   ##
==========================================
- Coverage   37.77%   37.77%   -0.01%
==========================================
  Files          50       50
  Lines       36760    36764       +4
  Branches     9531     9532       +1
==========================================
+ Hits        13886    13887       +1
- Misses      19941    19944       +3
  Partials     2933     2933
@kinow this should fix the race condition you encountered with NodeJS and Singularity.
Will try to test it today, @GlassOfWhiskey. Thank you!
kinow
left a comment
Used the PR in StreamFlow, alpha-unito/streamflow#1054:
(streamflow) [curso348@login209-18 streamflow]$ git log -n 1
commit bec7ab63f5d13f389e2da71f4ba54aa76611ccbf (HEAD -> node-with-singularity, upstream/node-with-singularity)
Author: GlassOfWhiskey <iacopo.c92@gmail.com>
Date: Tue May 5 00:00:40 2026 +0200
Fix NodeJS retrieval when Docker is unavailable
Fix #1053 by adding `_get_container_engine()` to detect the available
container runtime by probing `docker`, `singularity`, `podman`, and
`udocker` in order, caching the result with `@cached(FIFOCache(1))`
from `cachebox`. The detected engine is passed as `container_engine`
to `cwl_utils.expression.interpolate()` in `eval_expression()`,
fixing CWL JavaScript expression evaluation on systems without Docker.
Then cloned this repository, and checked out this branch.
commit d28292033704f7eac76007ffa45f1bffd74a829e (HEAD -> singularity-nodejs-race-conditions, upstream/singularity-nodejs-race-conditions)
Author: GlassOfWhiskey <iacopo.c92@gmail.com>
Date: Mon May 18 11:21:46 2026 +0200
Prevent race conditions when downloading NodeJS
Singularity downloads the NodeJS image locally (either in the
current folder or in the `CWL_SINGULARITY_CACHE` path. If multiple
processes download the same image concurrently, the second one
fails because the image already exists. This PR prevents this
behaviour by letting the execution proceed when an image is already
there.

And installed it,
Building wheels for collected packages: cwl-utils
Building editable for cwl-utils (pyproject.toml) ... done
Created wheel for cwl-utils: filename=cwl_utils-0.41-py3-none-any.whl size=8292 sha256=1be7ae4fdd6cd660236197be97bbc64d6c5bb50eeb1406dccf7372d93437fe67
Stored in directory: /tmp/pip-ephem-wheel-cache-tr085one/wheels/5f/bb/dd/2f3af69a948fdf1c19aa2b6ae3fe6bd2ea8e2f1572bba84763
Successfully built cwl-utils
Installing collected packages: cwl-utils
Attempting uninstall: cwl-utils
Found existing installation: cwl-utils 0.41
Uninstalling cwl-utils-0.41:
Successfully uninstalled cwl-utils-0.41
Successfully installed cwl-utils-0.41

Entered the realpath with `cd $(pwd -P)` (due to the bug with symlinks, common-workflow-language/cwltest#281), and tried to run it.
===================== 26 failed, 170 passed, 1 skipped in 31.23s =====================

@GlassOfWhiskey maybe we can troubleshoot it later. Maybe I'm doing something wrong, as I was launching this while waiting for other jobs to complete in Slurm. I will try again.
But do you know why they failed? Without logs it is difficult to tell whether the failures are related to the PR or not :(
They failed in the event loop of StreamFlow. I tried to run a workflow directly and it crashed with the same error. I tried installing node, but the same happened. I suspect it is more the lack of sleep, which is why I didn't want to bother you with the logs 😬 Let me try from scratch, installing everything again.
kinow
left a comment
Hi,
I cloned a fresh copy of StreamFlow, checked out 0.2.0rc2, created a new venv with Python 3.13, and then ran `pip install -e .` and `pip install --group test`.
I am on a login node with Internet access, but I do not have NodeJS:
(venv) [<USER>@glogin4 streamflow]$ streamflow version
StreamFlow version 0.2.0rc2
(venv) [<USER>@glogin4 streamflow]$ cwltest --version
/gpfs/projects/bsc/user/cwl/streamflow/venv/bin/cwltest 2.6.20251216093331
(venv) [<USER>@glogin4 streamflow]$ node
-bash: node: command not found

I am in a path that is not a symlink:
(venv) [<USER>@glogin4 streamflow]$ pwd -P
/gpfs/projects/bsc/<USER>/cwl/streamflow
(venv) [<USER>@glogin4 streamflow]$ pwd
/gpfs/projects/bsc/<USER>/cwl/streamflow
(venv) [<USER>@glogin4 streamflow]$ realpath -P .
/gpfs/projects/bsc/<USER>cwl/streamflow
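The same check can be scripted. This is a small sketch, assuming the shell exports the logical directory in `PWD`; `is_physical_path` is a hypothetical helper name, not part of cwltest or StreamFlow.

```python
import os


def is_physical_path(path: str) -> bool:
    """Return True if `path` contains no symlinked components."""
    return os.path.realpath(path) == os.path.abspath(path)


if __name__ == "__main__":
    # PWD holds the logical path (what `pwd` prints); fall back to getcwd().
    logical = os.environ.get("PWD", os.getcwd())
    print(logical, "is physical:", is_physical_path(logical))
```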
I loaded the Singularity module,
(venv) [<USER>@glogin4 streamflow]$ module load singularity/4.1.5
load SINGULARITY/4.1.5 (PATH)
(venv) [<USER>@glogin4 streamflow]$ singularity --version
singularity-ce version 4.1.5

And I also set the number of xdist workers to 8 (instead of auto = 112 processors from the login node).
$ sed -i 's/-n auto/-n 8/' cwl-conformance-test.sh

StreamFlow 0.2.0rc2
I think my pip list is not very important, as I think cwl-conformance-test.sh
will create a separate venv and install new dependencies there. But here's what I
have when calling the command, FWIW.
pip-list-0.2.0rc2.log
Package Version Editable project location
------------------------- ------------------ ---------------------------------------------
aiohappyeyeballs 2.6.2
aiohttp 3.13.5
aiosignal 1.4.0
aiosqlite 0.22.1
antlr4-python3-runtime 4.13.2
argcomplete 3.6.3
asyncssh 2.22.0
attrs 26.1.0
bcrypt 5.0.0
cachebox 5.2.3
CacheControl 0.14.4
certifi 2026.5.20
cffi 2.0.0
charset-normalizer 3.4.7
coloredlogs 15.0.1
coverage 7.14.1
cryptography 48.0.0
cwl-upgrader 1.2.15
cwl-utils 0.41
cwltest 2.6.20251216093331
cwltool 3.2.20260413085819
defusedxml 0.7.1
execnet 2.1.2
filelock 3.29.0
frozenlist 1.8.0
humanfriendly 10.0
idna 3.17
importlib_metadata 9.0.0
iniconfig 2.3.0
Jinja2 3.1.6
jsonschema 4.26.0
jsonschema-specifications 2025.9.1
junit-xml 1.9
kubernetes_asyncio 35.0.1
lxml 6.1.1
markdown-it-py 4.2.0
MarkupSafe 3.0.3
mdurl 0.1.2
mistune 3.2.1
msgpack 1.1.2
mslex 1.3.0
multidict 6.7.1
mypy_extensions 1.1.0
networkx 3.6.1
packaging 26.2
pip 26.1.1
pluggy 1.6.0
propcache 0.5.2
prov 1.5.1
psutil 7.2.2
pycparser 3.0
pydot 4.0.1
Pygments 2.20.0
pyparsing 3.3.2
pytest 9.0.3
pytest-asyncio 1.3.0
pytest-cov 7.1.0
pytest-xdist 3.8.0
python-dateutil 2.9.0.post0
PyYAML 6.0.3
rdflib 7.6.0
referencing 0.37.0
requests 2.34.2
rich 15.0.0
rich-argparse 1.8.0
rpds-py 2026.5.1
ruamel.yaml 0.19.1
schema-salad 8.9.20260417192335
six 1.17.0
spython 0.3.14
streamflow 0.2.0rc2 /gpfs/projects/bsc/<USER>/cwl/streamflow
typing_extensions 4.15.0
urllib3 2.7.0
yarl 1.24.2
yattag 1.16.1
zipp 4.1.0

I launched the CWL conformance tests for the CWL spec v1.0 using the commit used in StreamFlow's GH Action pipeline.
VERSION=v1.0 COMMIT="a062055fddcc7d7d9dbc53d28288e3ccb9a800d8" EXCLUDE="docker_entrypoint" DOCKER="singularity" bash cwl-conformance-test.sh > run-1.0.2.log 2>&1

It failed complaining about NodeJS missing (really long output, so I'm only showing a small
part of the log).
--2026-05-30 23:08:08-- https://github.com/common-workflow-language/common-workflow-language/archive/a062055fddcc7d7d9dbc53d28288e3ccb9a800d8.tar.gz
Resolving github.com (github.com)... 140.82.121.4
Connecting to github.com (github.com)|140.82.121.4|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://codeload.github.com/common-workflow-language/common-workflow-language/tar.gz/a062055fddcc7d7d9dbc53d28288e3ccb9a800d8 [following]
--2026-05-30 23:08:09-- https://codeload.github.com/common-workflow-language/common-workflow-language/tar.gz/a062055fddcc7d7d9dbc53d28288e3ccb9a800d8
Resolving codeload.github.com (codeload.github.com)... 140.82.121.10
Connecting to codeload.github.com (codeload.github.com)|140.82.121.10|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [application/x-gzip]
Saving to: 'a062055fddcc7d7d9dbc53d28288e3ccb9a800d8.tar.gz'
0K .......... .......... .......... .......... .......... 284K
50K .......... .......... .......... .......... .......... 568K
100K .......... .......... .......... .......... .......... 56.3M
150K .......... .......... .......... .......... .......... 1.15M
200K .......... .......... .......... .......... .......... 1.09M
250K .......... .......... .......... .......... .......... 50.7M
300K .......... .......... .......... .......... .......... 155M
350K .......... .......... .......... .......... .......... 127M
400K .......... .......... .......... .......... .......... 580K
450K .......... .......... .......... .......... .......... 129M
500K .......... .......... .......... .......... .......... 105M
550K .......... .......... .......... .......... .......... 136M
600K .......... .......... .......... .......... .......... 157M
650K .......... .......... .......... .......... .......... 1.24M
700K .......... .......... .......... .......... .......... 153M
750K .......... .......... .......... .......... .......... 165M
800K .......... .......... .......... .......... .......... 1.12M
850K .......... .......... .......... .......... .......... 15.6M
900K .......... .......... .......... .......... .......... 85.9M
950K .......... .......... .......... .......... .......... 125M
1000K .......... .......... .......... .......... .......... 123M
1050K .......... .......... .......... .......... .......... 124M
1100K .......... .......... .......... .......... .......... 124M
1150K .......... .......... .......... .......... .......... 126M
1200K .......... .......... .......... .......... .......... 115M
1250K .......... .......... .......... .......... .......... 120M
1300K .......... .......... .......... .......... .......... 120M
1350K .......... .......... .......... .......... .......... 148M
1400K .......... .......... .......... .......... .......... 163M
1450K .......... .......... .......... .......... .......... 134M
1500K .......... .......... .......... .......... .......... 155M
1550K .......... .......... .......... .......... .......... 1.36M
1600K .......... ...... 123M=0.6s
2026-05-30 23:08:10 (2.77 MB/s) - 'a062055fddcc7d7d9dbc53d28288e3ccb9a800d8.tar.gz' saved [1654801]
Using CPython 3.14.4
Creating virtual environment at: cwl-conformance-venv
Activate with: source cwl-conformance-venv/bin/activate
Resolved 135 packages in 1ms
Building streamflow @ file:///gpfs/projects/bsc/<USER>/cwl/streamflow
Built streamflow @ file:///gpfs/projects/bsc/<USER>/cwl/streamflow
Prepared 1 package in 820ms
warning: Failed to hardlink files; falling back to full copy. This may lead to degraded performance.
If the cache and target directories are on different filesystems, hardlinking may not be supported.
If this is intentional, set `export UV_LINK_MODE=copy` or use `--link-mode=copy` to suppress this warning.
Installed 51 packages in 906ms
+ aiohappyeyeballs==2.6.1
+ aiohttp==3.13.5
+ aiosignal==1.4.0
+ aiosqlite==0.22.1
+ antlr4-python3-runtime==4.13.2
+ asyncssh==2.22.0
+ attrs==25.4.0
+ bcrypt==5.0.0
+ cachebox==5.2.3
+ cachecontrol==0.14.4
+ certifi==2025.11.12
+ cffi==2.0.0
+ charset-normalizer==3.4.4
+ cryptography==46.0.7
+ cwl-upgrader==1.2.12
+ cwl-utils==0.41
+ filelock==3.20.3
+ frozenlist==1.8.0
+ idna==3.11
+ importlib-metadata==9.0.0
+ jinja2==3.1.6
+ jsonschema==4.26.0
+ jsonschema-specifications==2025.9.1
+ kubernetes-asyncio==35.0.1
+ markupsafe==3.0.3
+ mistune==3.1.4
+ msgpack==1.1.2
+ mslex==1.3.0
+ multidict==6.7.0
+ mypy-extensions==1.1.0
+ packaging==25.0
+ propcache==0.4.1
+ psutil==7.2.2
+ pycparser==2.23
+ pyparsing==3.2.5
+ python-dateutil==2.9.0.post0
+ pyyaml==6.0.3
+ rdflib==7.6.0
+ referencing==0.37.0
+ requests==2.33.0
+ rpds-py==0.30.0
+ ruamel-yaml==0.18.16
+ schema-salad==8.9.20251102115403
+ setuptools==80.9.0
+ six==1.17.0
+ streamflow==0.2.0rc2 (from file:///gpfs/projects/bsc/<USER>/cwl/streamflow)
+ typing-extensions==4.15.0
+ urllib3==2.6.3
+ yarl==1.22.0
+ yattag==1.16.1
+ zipp==3.23.0
Using Python 3.14.4 environment at: cwl-conformance-venv
Resolved 4 packages in 79ms
Prepared 4 packages in 0.62ms
Uninstalled 2 packages in 35ms
warning: Failed to hardlink files; falling back to full copy. This may lead to degraded performance.
If the cache and target directories are on different filesystems, hardlinking may not be supported.
If this is intentional, set `export UV_LINK_MODE=copy` or use `--link-mode=copy` to suppress this warning.
Installed 4 packages in 444ms
- packaging==25.0
+ packaging==26.2
+ pip==26.1.1
- setuptools==80.9.0
+ setuptools==82.0.1
+ wheel==0.47.0
Using Python 3.14.4 environment at: cwl-conformance-venv
Resolved 51 packages in 22ms
Building streamflow @ file:///gpfs/projects/bsc/<USER>/cwl/streamflow
Built streamflow @ file:///gpfs/projects/bsc/<USER>/cwl/streamflow
Prepared 1 package in 919ms
Uninstalled 1 package in 1ms
warning: Failed to hardlink files; falling back to full copy. This may lead to degraded performance.
If the cache and target directories are on different filesystems, hardlinking may not be supported.
If this is intentional, set `export UV_LINK_MODE=copy` or use `--link-mode=copy` to suppress this warning.
Installed 1 package in 60ms
~ streamflow==0.2.0rc2 (from file:///gpfs/projects/bsc/<USER>/cwl/streamflow)
Using Python 3.14.4 environment at: cwl-conformance-venv
Resolved 76 packages in 22ms
Building streamflow @ file:///gpfs/projects/bsc/<USER>/cwl/streamflow
Built streamflow @ file:///gpfs/projects/bsc/<USER>/cwl/streamflow
Prepared 1 package in 886ms
Uninstalled 1 package in 11ms
warning: Failed to hardlink files; falling back to full copy. This may lead to degraded performance.
If the cache and target directories are on different filesystems, hardlinking may not be supported.
If this is intentional, set `export UV_LINK_MODE=copy` or use `--link-mode=copy` to suppress this warning.
Installed 26 packages in 590ms
+ argcomplete==3.6.3
+ coloredlogs==15.0.1
+ coverage==7.14.1
+ cwltest==2.6.20251216093331
+ cwltool==3.2.20260413085819
+ defusedxml==0.7.1
+ execnet==2.1.2
+ humanfriendly==10.0
+ iniconfig==2.3.0
+ junit-xml==1.9
+ lxml==6.1.1
+ markdown-it-py==4.2.0
+ mdurl==0.1.2
+ networkx==3.6.1
+ pluggy==1.6.0
+ prov==1.5.1
+ pydot==4.0.1
+ pygments==2.20.0
+ pytest==9.0.3
+ pytest-asyncio==1.3.0
+ pytest-cov==7.1.0
+ pytest-xdist==3.8.0
+ rich==15.0.0
+ rich-argparse==1.8.0
+ spython==0.3.14
~ streamflow==0.2.0rc2 (from file:///gpfs/projects/bsc/<USER>/cwl/streamflow)
============================= test session starts ==============================
platform linux -- Python 3.14.4, pytest-9.0.3, pluggy-1.6.0
rootdir: /gpfs/projects/bsc/<USER>/cwl/streamflow
configfile: tox.ini
plugins: cwltest-2.6.20251216093331, asyncio-1.3.0, xdist-3.8.0, cov-7.1.0
asyncio: mode=Mode.STRICT, debug=False, asyncio_default_fixture_loop_scope=session, asyncio_default_test_loop_scope=session
created: 112/112 workers
112 workers [197 items]
...F.FF.FF.FF..F....F.......s.F........F..F...FF...F.F.FFF....FFFF.F.F.. [ 36%]
FFF.F.FFF.FF.FFF..........FFFFFFFF.FFF..FFFFFFFFF....FF......F.F..F.F.F. [ 73%]
F..F........................F.......F..FFFFF.F.....F. [ 99%]
=================================== FAILURES ===================================
_______________________ cwl test: exprtool_file_literal ________________________
[gw101] linux -- Python 3.14.4 /gpfs/projects/bsc/<USER>/cwl/streamflow/cwl-conformance-venv/bin/python
CWL test execution failed.
Returned non-zero but it should be zero
Test: job:
file:///gpfs/projects/bsc/<USER>/cwl/streamflow/common-workflow-language-a062055fddcc7d7d9dbc53d28288e3ccb9a800d8/v1.0/v1.0/empty.json
output:
lit:
location: a_file
class: File
checksum: sha1$fea23663b9c8ed71968f86415b5ec091bb111448
size: 19
tool:
file:///gpfs/projects/bsc/<USER>/cwl/streamflow/common-workflow-language-a062055fddcc7d7d9dbc53d28288e3ccb9a800d8/v1.0/v1.0/file-literal-ex.cwl
label: exprtool_file_literal
id: 102
doc: Test file literal output created by ExpressionTool
tags:
- inline_javascript
- expression_tool
line: '1397'
----------------------------- Captured stderr call -----------------------------
2026-05-30 23:08:43.711 INFO Processing workflow 906701bf-2645-4c4c-8dbd-e58970569be9
2026-05-30 23:08:43.778 INFO Building workflow execution plan
2026-05-30 23:08:44.088 INFO COMPLETED building of workflow execution plan
2026-05-30 23:08:44.088 INFO EXECUTING workflow 906701bf-2645-4c4c-8dbd-e58970569be9
2026-05-30 23:08:44.402 INFO Evaluating expression for step / (job /0)
2026-05-30 23:08:44.415 ERROR NodeJSEngine requires Node.js engine to evaluate and validate Javascript expressions, but couldn't find it. Tried nodejs, node, docker run node:alpine
Traceback (most recent call last):
File "/gpfs/projects/bsc/<USER>/cwl/streamflow/streamflow/core/recovery.py", line 42, in wrapper
await func(*args, **kwargs)
File "/gpfs/projects/bsc/<USER>/cwl/streamflow/streamflow/workflow/step.py", line 736, in _execute_command
command_output := await command_task
^^^^^^^^^^^^^^^^^^
File "/gpfs/projects/bsc/<USER>/cwl/streamflow/streamflow/cwl/command.py", line 1304, in execute
result = utils.eval_expression(
expression=self.expression,
...<2 lines>...
expression_lib=self.expression_lib,
)
File "/gpfs/projects/bsc/<USER>/cwl/streamflow/streamflow/cwl/utils.py", line 732, in eval_expression
cwl_utils.expression.interpolate(
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
expression,
^^^^^^^^^^^
...<8 lines>...
timeout=timeout,
^^^^^^^^^^^^^^^^
)
^
File "/gpfs/projects/bsc/<USER>/cwl/streamflow/cwl-conformance-venv/lib/python3.14/site-packages/cwl_utils/expression.py", line 227, in interpolate
e = evaluator(
js_engine, scan[w[0] + 1 : w[1]], rootvars, jslib, fullJS, **kwargs
)
File "/gpfs/projects/bsc/<USER>/cwl/streamflow/cwl-conformance-venv/lib/python3.14/site-packages/cwl_utils/expression.py", line 173, in evaluator
return cast(CWLOutputType, js_engine.eval(ex, jslib, **kwargs))
~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^
File "/gpfs/projects/bsc/<USER>/cwl/streamflow/cwl-conformance-venv/lib/python3.14/site-packages/cwl_utils/sandboxjs.py", line 473, in eval
returncode, stdout, stderr = self.exec_js_process(
~~~~~~~~~~~~~~~~~~~~^
fn,
^^^
...<3 lines>...
container_engine=container_engine,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
)
^
File "/gpfs/projects/bsc/<USER>/cwl/streamflow/cwl-conformance-venv/lib/python3.14/site-packages/cwl_utils/sandboxjs.py", line 201, in exec_js_process
new_proc = self.new_js_proc(
js_engine_code,
force_docker_pull=force_docker_pull,
container_engine=container_engine,
)
File "/gpfs/projects/bsc/<USER>/cwl/streamflow/cwl-conformance-venv/lib/python3.14/site-packages/cwl_utils/sandboxjs.py", line 442, in new_js_proc
raise JavascriptException(
...<5 lines>...
)
cwl_utils.errors.JavascriptException: NodeJSEngine requires Node.js engine to evaluate and validate Javascript expressions, but couldn't find it. Tried nodejs, node, docker run node:alpine
2026-05-30 23:08:44.427 WARNING Job /0 failure can not be recovered. Failure manager is not enabled.
2026-05-30 23:08:44.427 ERROR NodeJSEngine requires Node.js engine to evaluate and validate Javascript expressions, but couldn't find it. Tried nodejs, node, docker run node:alpine
2026-05-30 23:08:44.442 INFO FAILED Step /
2026-05-30 23:08:44.454 ERROR FAILED Workflow execution
...
...
I modified the version of cwl-utils to test this PR (see screenshot above), and launched it again. When the output prints the pip list info, it shows:
Installed 1 package in 195ms
- cwl-utils==0.41
+ cwl-utils==0.41 (from git+https://github.com/common-workflow-language/cwl-utils.git@d28292033704f7eac76007ffa45f1bffd74a829e)

If I load the micromamba env that contains NodeJS, then everything works:
===================================================== 196 passed, 1 skipped in 14.70s =====================================================
@GlassOfWhiskey somehow I am not able to get StreamFlow + cwl-utils to fetch containers? Re-reading our conversation in Element, I see I had this same error of NodeJS not found on MareNostrum, and we thought it could be a race condition. I think this PR has value in being merged as the situation you described might really happen.
But I think there's another, separate issue that causes the process to run on MN5 but fail to detect that node is missing and that it should use Singularity to fetch it.
I commented out the last part of the cwl-conformance-test.sh script to leave the temporary directory behind (i.e., I commented out that rm line), inspected the streamflow.yml in the tests dir, and I can confirm it's using the singularity option.
Sorry for the delay in replying here.