Zero-dependency CLI that rips through CPU-heavy jobs using Node.js
worker_threads.
Feed it a file list. Give it a worker script. Chain workers like Unix pipes. It saturates all your CPU cores in parallel — no config, no boilerplate, no dependencies.
- job-ripper (jori)
Benchmark scenario: process files from node_modules across three workload profiles
(CPU-bound compression, Markdown rendering, JSON schema validation).
Full results across Intel Core Ultra 7 155U and AMD EPYC 9645 — see benchmarks/README.md.
Quick numbers — brotli + pbkdf2-sha256, Intel Core Ultra 7 155U, c=10:
| Approach | Mean time | vs. single-thread |
|---|---|---|
| xargs (process per file) | 32 s | 20× slower |
| Single-threaded loop | 9.4 s | baseline |
| job-ripper | 1.6 s | 6× faster |
Concurrency starting point: 75-100% of cores for CPU-bound tasks, 50-75% for mixed workloads, 15-25% for light ones. For nearly pure I/O, 1-2 workers is enough — the worker still unblocks the main thread even without parallelism.
Run your own baseline and read the full analysis in benchmarks/README.md:
# Requires hyperfine — install instructions in benchmarks/README.md
cd benchmarks
npm run bench:md-html -- -c 8
✅ CPU-bound work — this is what jori is for:
- Transpiling / compiling files (TS → JS, SCSS → CSS)
- Image / video encoding and resizing
- Markdown → HTML, PDF generation
- Hash computation, encryption, compression
- JSON schema validation (large files or schemas), data transformation
- Static analysis, linting, code formatting
❌ I/O-bound work — use streams instead:
Spawning 8 workers to read 8 files simultaneously won't help if your bottleneck is disk throughput or a remote API rate limit. In those cases, plain Promise.all with a concurrency limiter (e.g. p-limit) is simpler and equally fast.
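For the I/O-bound case, the concurrency-limiter idea can be sketched without any dependency. This is a minimal illustration, not part of jori: `mapWithLimit` is a hypothetical helper name, and p-limit remains the packaged alternative mentioned above.

```javascript
// Hypothetical helper (not part of jori): run `task` over `items`
// with at most `limit` promises in flight at any time.
async function mapWithLimit(items, limit, task) {
  const results = new Array(items.length);
  let next = 0;

  // Each runner claims the next unprocessed index until none are left.
  async function runner() {
    while (next < items.length) {
      const index = next++;
      results[index] = await task(items[index], index);
    }
  }

  const runnerCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: runnerCount }, runner));
  return results; // same order as `items`
}
```

A call such as `mapWithLimit(urls, 4, (url) => fetch(url))` would keep four requests in flight on a single thread, which is usually all an I/O-limited job needs.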
npm install -g job-ripper # global CLI
# or
npm install job-ripper # local, for programmatic use in production (see API section)
# or
npm install --save-dev job-ripper # local, for use in build scripts / dev tooling only
Requires Node.js ≥ 22.
⚠️ Security note: Only run workers you trust. A worker script executes with full Node.js privileges and can read or modify any file the running user has access to. Review third-party code before use.
1. Write a worker (compress.mjs):
import { gzipSync } from 'node:zlib';
import { readFileSync, writeFileSync } from 'node:fs';
export default async function(filePath, _args) {
const data = readFileSync(filePath);
const compressed = gzipSync(data);
writeFileSync(filePath + '.gz', compressed);
}
2. Run it:
$ jori "src/**/*.js" -w compress.mjs -c 50%
Using concurrency: 6
--- Processing Complete ---
Total files: 312
Success: 312
Failed: 0
Time: 2.41s
That's it. No config files, no require() wrappers, no callbacks.
                        main thread
                     ┌───────────────┐
    glob / stdin ──► │  file queue   │
                     │               │
                     │ backpressure  │ ◄── maxQueue limit
                     └──────┬────────┘
                            │ dispatch
             ┌──────────────┼──────────────┐
             ▼              ▼              ▼
        ┌──────────┐   ┌──────────┐   ┌──────────┐
        │ worker 1 │   │ worker 2 │   │ worker N │  ← N = -c value (default: cpus × 0.75)
        │ (your fn)│   │ (your fn)│   │ (your fn)│
        └────┬─────┘   └────┬─────┘   └────┬─────┘
             └──────────────┼──────────────┘
                            ▼
             result / error ──► logged to stderr + exit code
Architecture notes:
- Workers are pre-spawned once at startup (warm pool — no per-file overhead).
- The main thread reads files and dispatches tasks; it never runs user code.
- An internal queue with backpressure prevents the in-memory task list from growing unbounded on slow workers.
- Task-level errors (throw inside your function) are counted as failures and printed to stderr; processing always continues for remaining files. By default the process exits with code 1 if any task failed. Pass -k/--keep-going to exit 0 instead.
- Fatal errors (worker crash, module not found, missing default export) halt the entire run immediately with a clear message.
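The warm-pool idea in the notes above can be illustrated with plain worker_threads. This is a simplified sketch under stated assumptions, not jori's actual implementation: `runPool` and `workerSource` are hypothetical names, the worker body is inlined via `eval: true` instead of being loaded from a module path, and backpressure and error counting are left out.

```javascript
import { Worker } from 'node:worker_threads';

// Inline stand-in for a user worker module: squares each number it receives.
const workerSource = `
  const { parentPort } = require('node:worker_threads');
  parentPort.on('message', (n) => parentPort.postMessage(n * n));
`;

// Hypothetical helper: process `tasks` on a pool of `size` pre-spawned workers.
function runPool(tasks, size) {
  return new Promise((resolve, reject) => {
    const results = new Array(tasks.length);
    if (tasks.length === 0) { resolve(results); return; }

    const workers = [];
    let next = 0;      // index of the next task to hand out
    let finished = 0;  // number of completed tasks
    const shutdown = () => workers.forEach((w) => w.terminate());

    // Spawn every worker once up front; each one is reused for many tasks.
    const poolSize = Math.max(1, Math.min(size, tasks.length));
    for (let i = 0; i < poolSize; i++) {
      const worker = new Worker(workerSource, { eval: true });
      workers.push(worker);
      let current = -1; // task index this worker is busy with

      const dispatch = () => {
        if (next >= tasks.length) return; // nothing left; worker stays idle
        current = next++;
        worker.postMessage(tasks[current]);
      };

      worker.on('message', (value) => {
        results[current] = value;
        finished++;
        if (finished === tasks.length) { shutdown(); resolve(results); }
        else dispatch(); // hand the same worker its next task
      });
      worker.on('error', (err) => { shutdown(); reject(err); });
      dispatch();
    }
  });
}
```

The main thread only hands out indices and stores results; all computation happens inside the workers, which mirrors the division of labour described in the notes.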
Usage:
jori <glob> -w <worker> [options] [-- worker_args...]
<command> | jori -w <worker> [options] [-- worker_args...]
Arguments:
<glob> File glob pattern or path to a single file
Options:
-w, --worker <path> Path to the worker script (required)
-c, --concurrency <N> Number of workers or CPU percentage (e.g., 4 or 75%, default: 75%)
-v, --verbose Print each processed file and detailed statistics
-s, --silent Suppress all non-fatal output (including worker error messages)
-k, --keep-going Exit 0 even if some tasks failed (default: exit 1 on any failure)
--dry-run Print matched files without running workers
-h, --help Show this help message
Concurrency formats:
| Value | Meaning |
|---|---|
| 4 | Exactly 4 workers |
| 50% | 50% of logical CPU cores (rounded down, min 1) |
| (omitted) | Defaults to 75% |
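The rules in the table can be expressed as a small function. This is a sketch of the documented behaviour (round down, minimum 1, default 75%), not jori's source; `resolveConcurrency` is a hypothetical name, and using `os.availableParallelism()` as the core count is an assumption.

```javascript
import { availableParallelism } from 'node:os';

// Hypothetical helper: turn a -c value ("4", "50%", or omitted) into a worker count.
function resolveConcurrency(value = '75%', cores = availableParallelism()) {
  const text = String(value).trim();

  if (text.endsWith('%')) {
    const percent = Number(text.slice(0, -1));
    if (!Number.isFinite(percent) || percent <= 0) {
      throw new RangeError(`Invalid concurrency percentage: ${value}`);
    }
    // Round down, but never go below one worker.
    return Math.max(1, Math.floor((cores * percent) / 100));
  }

  const count = Number(text);
  if (!Number.isInteger(count) || count < 1) {
    throw new RangeError(`Invalid concurrency value: ${value}`);
  }
  return count;
}
```

Under these rules, `50%` on a machine with 12 logical cores yields 6 workers, and `10%` on a 4-core machine is clamped up to 1.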
jori "src/**/*.ts" -w build.mjs
jori "images/**/*.png" -w resize.mjs -c 8
When no <glob> argument is given, jori reads file paths from stdin (one per line). This enables Unix-style pipelines:
find . -name "*.log" -mtime -7 | jori -w analyze.mjs
cat file-list.txt | jori -w process.mjs -c 4
A worker is any ESM module that exports a default function:
/**
* @param filePath - Absolute path to the file to process.
* @param args - Extra arguments forwarded from CLI: `jori ... -- --flag value`.
* @returns Optional value forwarded to `onSuccess(filePath, result)` in the programmatic API.
*/
export default async function(filePath: string, args: string[]): Promise<unknown> {
// ...
}
The type annotation above is for documentation purposes only — TypeScript is not required. A plain .mjs/.cjs module works exactly the same.
The signature is async — jori correctly awaits the result, so both sync and async bodies work. However:
Prefer sync APIs inside the body. Each worker runs in its own dedicated thread — blocking it is intentional and expected.
readFileSync, gzipSync, createHash, etc. avoid unnecessary Promise/microtask overhead. Reserve async for cases where you genuinely need it (e.g. calling an external HTTP API).
Minimal example:
// transform.mjs
import { readFileSync, writeFileSync } from 'node:fs';
export default async function(filePath) {
const src = readFileSync(filePath, 'utf8');
writeFileSync(filePath, src.toUpperCase());
}
Error handling:
| What you do | What jori does |
|---|---|
| throw new Error(...) | Counts as failed, printed to stderr by default (suppressed with --silent). By default the process exits with code 1 after all files are processed. With -k / --keep-going the run finishes normally and exits 0. |
| Return normally | Counts as success; return value is forwarded to onSuccess in the programmatic API |
| Module has no default export | Fatal error — run stops immediately with a clear message |
| Module file not found | Fatal error — run stops immediately |
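To make the first two rows of the table concrete, here is a sketch of a worker that fails some files and succeeds on others. The file name `validate-json.mjs`, the function name, and the returned `bytes` field are illustrative choices, not something jori prescribes.

```javascript
// validate-json.mjs (hypothetical example worker)
import { readFileSync } from 'node:fs';

export default async function validateJson(filePath) {
  const text = readFileSync(filePath, 'utf8');
  try {
    JSON.parse(text);
  } catch (err) {
    // Throwing marks only this file as failed; other files keep processing.
    throw new Error(`Invalid JSON in ${filePath}: ${err.message}`);
  }
  // Returning normally counts as success; the value reaches onSuccess
  // in the programmatic API and is ignored by the CLI.
  return { bytes: Buffer.byteLength(text) };
}
```

Including the file path in the error message keeps the stderr output useful when many files fail in one run.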
Looking for more? Check out the examples/ directory in the repository for ready-to-use worker scripts and practical use cases.
After processing each file, jori echoes the resolved file path to stdout. If the input path was relative (for example from find . -name "*.md"), later pipeline stages will receive the resolved absolute path emitted by the previous stage. Worker scripts are responsible for writing derived files (e.g. .html) to disk themselves; the pipeline does not rewrite paths to derived filenames between stages.
# stage 1: md → html (writes .html files alongside .md)
# stage 2: minify html (reads and overwrites .html files by convention in minify.mjs)
# stage 3: upload (I/O-limited; upload.mjs derives the .html path from the .md path)
find . -name "*.md" \
| jori -w render.mjs -c 4 \
| jori -w minify.mjs -c 4 \
| jori -w upload.mjs -c 2
# find
find ./src -name "*.ts" -not -path "*/node_modules/*" \
| jori -w compile.mjs
# fdir (fastest directory crawler)
node --input-type=module << 'EOF' | jori -w compile.mjs
import { fdir } from 'fdir';
const files = new fdir().glob('**/*.ts').crawl('./src').sync();
process.stdout.write(files.join('\n'));
EOF
# fast-glob
node --input-type=module << 'EOF' | jori -w compile.mjs -c 75%
import fg from 'fast-glob';
for (const f of await fg('src/**/*.ts')) console.log(f);
EOF
# Step 1: preview matched files
jori "logs/**/*.log" -w archive.mjs --dry-run
# Step 2: run for real
jori "logs/**/*.log" -w archive.mjs
jori "data/*.json" -w transform.mjs -- --format=pretty --locale=uk
Inside the worker, args is the array of strings after --:
export default async function(filePath, args) {
const isPretty = args.includes('--format=pretty');
// ...
}
CLI vs Programmatic API: The CLI prints each processed file path to stdout and ignores worker return values — it is designed for pipelines where the output is a stream of file paths. If your workers compute results that you need to collect (hashes, metadata, transformed data), use the programmatic API: the onSuccess(filePath, result) callback receives whatever the worker function returns.
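For workers that take several arguments after `--`, checking individual strings with `includes` gets unwieldy. One possible approach is Node's built-in `util.parseArgs`; the sketch below assumes the `--format` and `--locale` flags from the earlier example, and the helper name and default values (`compact`, `en`) are hypothetical.

```javascript
import { parseArgs } from 'node:util';

// Hypothetical helper: turn the worker's `args` array into an options object.
function parseWorkerArgs(args) {
  const { values } = parseArgs({
    args,
    options: {
      format: { type: 'string', default: 'compact' }, // assumed default
      locale: { type: 'string', default: 'en' },      // assumed default
    },
    strict: true, // unknown flags throw instead of being silently ignored
  });
  return values;
}
```

Inside a worker this would be called as `const { format, locale } = parseWorkerArgs(args);`. Because the worker function runs once per file, parsing is repeated for each call; caching the result in a module-level variable is a reasonable refinement.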
import { processFiles } from 'job-ripper';
const result = await processFiles({
files: ['a.ts', 'b.ts'], // string[] | Iterable | AsyncIterable
workerPath: './compile.mjs', // path to worker module
concurrency: 4, // optional, default: cpus × 0.75
workerArgs: ['--strict'], // forwarded to worker as args[]
dryRun: false, // skip actual processing
onSuccess: (f, result) => console.log('✓', f, result),
onTaskError: (f, err) => console.error('✗', f, err.message),
});
console.log(result);
// { total: 2, success: 2, failed: 0, durationMs: 310, concurrency: 4 }
The files parameter accepts any iterable or async iterable — arrays, generators, fast-glob streams, fdir crawlers, database cursors, etc.
Returning values from workers: When your worker function returns a value, it is serialized via postMessage (structured clone) and forwarded as the second argument of onSuccess(filePath, result). Keep returned values small and structured-clone-compatible; large objects add IPC overhead.
// hash-worker.mjs
import { readFileSync } from 'node:fs';
import { createHash } from 'node:crypto';
export default async function(filePath) {
const hash = createHash('sha256').update(readFileSync(filePath)).digest('hex');
return { filePath, hash };
}
// main.mjs
import { processFiles } from 'job-ripper';
const hashes = [];
await processFiles({
files: ['a.bin', 'b.bin'],
workerPath: './hash-worker.mjs',
onSuccess: (filePath, result) => hashes.push(result),
});
console.log(hashes);
// [{ filePath: '/abs/a.bin', hash: '3e2b...' }, { filePath: '/abs/b.bin', hash: 'f1a0...' }]
Error handling: Task-level errors (throws inside your worker function) are counted and surfaced via onTaskError if provided, otherwise silent. They are reflected in result.failed. Check that field after the call and decide what to do:
const result = await processFiles({
// ...
onTaskError: (filePath, error) => {
console.error(`Failed: ${filePath} — ${error.message}`);
},
});
if (result.failed > 0) {
console.error(`${result.failed} files failed`);
process.exit(1);
}
The worker signature is async, but the code inside should be sync whenever possible. worker_threads gives your function its own OS thread — blocking it is intentional. Sync fs, zlib, and crypto calls avoid Promise/microtask overhead:
// ✅ preferred — sync body inside an async worker
export default async function(filePath) {
const data = readFileSync(filePath);
writeFileSync(filePath + '.gz', gzipSync(data));
}
// ❌ unnecessary async overhead — the thread is already dedicated to you
export default async function(filePath) {
const data = await readFile(filePath);
await writeFile(filePath + '.gz', await gzip(data));
}
| Task weight | Computation per file | Recommended -c |
|---|---|---|
| Light | < 10 ms (JSON parse, regex) | 25% — tasks finish faster than IPC overhead; extra workers mostly idle |
| Medium | 10-200 ms (transpile, lint) | 50-75% (default 75%) |
| Heavy | > 200 ms (image encode, PDF) | 75-100% — long tasks justify saturating every core |
| I/O-bound | network / disk limited | use p-limit, not jori |
Why does lighter work need fewer workers? When each task completes in < 10 ms, the bottleneck shifts from CPU to the IPC round-trip between the main thread and workers. Adding more workers than the main thread can keep supplied with tasks adds synchronization noise without adding throughput. For heavy tasks the opposite is true: each thread stays busy for hundreds of milliseconds, so every extra core translates directly into lower wall time.
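A rough back-of-envelope model of this effect: if the main thread spends `overheadMs` per task on dispatch and result handling, and a task keeps a worker busy for `taskMs`, then about `taskMs / overheadMs` workers are enough to saturate the dispatcher. The function below is a deliberately simplified sketch of that reasoning; the name and any overhead figure you plug in are assumptions, and real numbers should come from your own benchmark.

```javascript
// Hypothetical model: how many workers can the main thread keep busy?
// taskMs     - time one task occupies a worker
// overheadMs - serial main-thread cost per task (dispatch + result handling)
// cores      - upper bound, e.g. the number of logical cores
function usefulWorkers(taskMs, overheadMs, cores) {
  // Beyond this point extra workers wait for the dispatcher instead of working.
  const saturationPoint = Math.ceil(taskMs / overheadMs);
  return Math.max(1, Math.min(cores, saturationPoint));
}
```

With an assumed 0.5 ms of overhead, a 2 ms task saturates at 4 workers on an 8-core machine, while a 300 ms task can use all 8.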
Your worker does the CPU work. The main thread only dispatches tasks. Avoid heavy computation inside onSuccess callbacks — those run on the main thread and will create a bottleneck.
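One way to keep `onSuccess` cheap is to have it only record results and to defer any aggregation until the run has finished. The sketch below is an illustration rather than a jori feature: `createCollector` is a hypothetical helper, and the `bytes` field assumes a worker that returns such a value.

```javascript
// Hypothetical helper: a cheap per-file callback plus a deferred summary step.
function createCollector() {
  const records = [];
  return {
    // O(1) per file, so it does not stall task dispatch on the main thread.
    onSuccess(filePath, result) {
      records.push({ filePath, result });
    },
    // Heavier aggregation, meant to run once after all workers are done.
    summarize() {
      const totalBytes = records.reduce(
        (sum, record) => sum + (record.result?.bytes ?? 0),
        0,
      );
      return { files: records.length, totalBytes };
    },
  };
}
```

In use, `collector.onSuccess` would be passed as the `onSuccess` option of `processFiles`, with `collector.summarize()` called after the returned promise resolves.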
Each stage in a pipeline has its own concurrency budget. Tune them to match the weight of each step:
# Render is CPU-heavy, upload is I/O-limited — different concurrency per stage
find . -name "*.md" | jori -w render.mjs -c 75% | jori -w upload.mjs -c 2
Before running a worker that modifies or deletes files, verify what files would be matched. --dry-run prints each matched path to stdout and exits without running workers:
jori "**/*.png" -w resize.mjs --dry-run # list matched files
jori "**/*.png" -w resize.mjs --dry-run | wc -l # count them
jori accepts file paths in two ways: built-in glob (jori "src/**/*.ts" -w ...) or stdin pipeline (find ... | jori -w ...). The choice can have a significant impact on performance, especially on Windows.
| Method | Linux / macOS | Windows (Git Bash / MSYS2) |
|---|---|---|
| Built-in glob / Node.js glob libs (fast-glob, fdir, tinyglobby) | Fast — native fs calls | Fast — native fs calls |
| find ... \| jori | Fast — native binary | Slow — MSYS2 POSIX emulation layer |
| find ... \| xargs | Fast — native binaries | Slow — both find and xargs run under MSYS2 emulation |
Why find is slow on Windows: Git Bash ships a POSIX-emulated find (via MSYS2) that translates every path and syscall through a compatibility layer, and piping through MSYS2's emulated shell adds further overhead for every path handed off to jori. Node.js glob libraries call the Windows filesystem API directly and avoid this overhead entirely.
Recommendation:
- Cross-platform projects — use built-in glob or Node.js glob libraries. They perform consistently on all platforms.
- Linux/macOS-only — find pipelines are fine and sometimes faster for complex filters (-mtime, -size, -user, etc.) that globs can't express.
- Windows with complex filters — use PowerShell's Get-ChildItem or a Node.js script to produce the file list and pipe it into jori.
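As an illustration of the Node.js script option, the sketch below approximates `find <root> -name "*.log" -mtime -7` using only built-in modules. The file name `list-recent.mjs` and the function name are hypothetical, and it relies on the recursive option of `readdirSync`, which is available in the Node.js versions jori requires.

```javascript
// list-recent.mjs (hypothetical example script)
import { readdirSync, statSync } from 'node:fs';
import { join, resolve } from 'node:path';

// Return absolute paths under `root` that end with `extension` and were
// modified within the last `maxAgeDays` days.
function listRecentFiles(root, extension, maxAgeDays) {
  const cutoff = Date.now() - maxAgeDays * 24 * 60 * 60 * 1000;
  const base = resolve(root);
  return readdirSync(base, { recursive: true })
    .map((relative) => join(base, relative))
    .filter((file) => {
      if (!file.endsWith(extension)) return false;
      // throwIfNoEntry: false tolerates entries that vanish or dangle.
      const stats = statSync(file, { throwIfNoEntry: false });
      return Boolean(stats?.isFile()) && stats.mtimeMs >= cutoff;
    });
}

// Only print when a root directory is given on the command line,
// one path per line so the output can be piped into jori.
if (process.argv[2]) {
  for (const file of listRecentFiles(process.argv[2], '.log', 7)) {
    console.log(file);
  }
}
```

It could then be used as `node list-recent.mjs ./logs | jori -w analyze.mjs`, which behaves the same on Windows, Linux, and macOS because no shell utilities are involved.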
Released under the MIT License.