perf: ~2x faster parse() via hand-written character scanner #1010

Open
homanp wants to merge 2 commits into motdotla:master from homanp:perf/parser-char-scanner

Conversation

homanp commented Apr 15, 2026

Summary

Swaps the LINE regex in parse() for a hand-rolled character scanner with a Uint8Array(256) lookup table for key chars. About 2x faster on real .env files. Same behavior, same API, still zero deps.

Benchmarks

Repro:

node tests/test-parse-perf.js

5000 parses of a 200-line .env, median of 7 runs, three fresh node processes per side (Node 23, macOS):

          run 1       run 2       run 3
master    121.21 ms   121.77 ms   122.85 ms
this PR    57.86 ms    58.71 ms    58.26 ms

~2.1x speedup, variance under 2% on both sides. Also tried it against a realistic ~50-key production-style .env (db, auth, aws, flags) — similar story, roughly 1.9x.
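
For readers who want to reproduce the shape of this measurement without the repo checked out, here is a minimal sketch of a median-of-runs timing loop. It is not the contents of tests/test-parse-perf.js; the helper names `median` and `benchParse` and the synthetic 200-line input are assumptions made for illustration.

```javascript
const { performance } = require('node:perf_hooks')

// Median of a list of numbers (hypothetical helper, not from the PR).
function median (values) {
  const sorted = [...values].sort((a, b) => a - b)
  const mid = sorted.length >> 1
  return sorted.length % 2 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2
}

// Times `iterations` calls of parseFn(src), repeats that `runs` times,
// and returns the median wall-clock time of a run in milliseconds.
function benchParse (parseFn, src, { iterations = 5000, runs = 7 } = {}) {
  const times = []
  for (let r = 0; r < runs; r++) {
    const start = performance.now()
    for (let i = 0; i < iterations; i++) parseFn(src)
    times.push(performance.now() - start)
  }
  return median(times)
}

// Synthetic 200-line input (assumption: the PR's real fixture may differ).
const src = Array.from({ length: 200 }, (_, i) => `KEY_${i}=value_${i}`).join('\n')
```

With dotenv installed, `benchParse(require('dotenv').parse, src)` would time the installed version. Taking the median rather than the mean keeps a single slow run (GC pause, JIT warm-up) from skewing the result.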

What changed

The regex was the bottleneck. Replaced it with a single pass over the string using charCodeAt:

  • Uint8Array(256) built at module load tells us in O(1) whether a char code is a valid key char ([A-Za-z0-9._-]).
  • For unquoted values, indexOf finds the \n / # boundaries — V8's indexOf is way faster than walking char by char.
  • CR normalization only runs when there's actually a \r in the input, instead of always.
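
The three points above can be sketched in isolation as follows. This is a simplified illustration and not the PR's actual code: the names `KEY_CHAR`, `scanKey`, `unquotedEnd`, and `normalizeNewlines` are hypothetical, and quote handling, `export` prefixes, and whitespace trimming are left out.

```javascript
// Lookup table: KEY_CHAR[code] === 1 when the char code is in [A-Za-z0-9._-].
const KEY_CHAR = new Uint8Array(256)
for (const ch of 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789._-') {
  KEY_CHAR[ch.charCodeAt(0)] = 1
}

// Returns the index just past the run of key characters starting at `pos`.
// Char codes >= 256 read as undefined from the table, so they also stop the scan.
function scanKey (src, pos) {
  while (pos < src.length && KEY_CHAR[src.charCodeAt(pos)] === 1) pos++
  return pos
}

// Returns the end index of an unquoted value starting at `pos`:
// the first '#' or '\n' at or after `pos`, whichever comes first,
// or src.length when neither is found.
function unquotedEnd (src, pos) {
  let end = src.indexOf('\n', pos)
  if (end === -1) end = src.length
  const hash = src.indexOf('#', pos)
  return hash !== -1 && hash < end ? hash : end
}

// Normalizes \r\n and lone \r to \n, but skips the replace entirely
// when the input contains no \r.
function normalizeNewlines (src) {
  return src.indexOf('\r') === -1 ? src : src.replace(/\r\n?/g, '\n')
}
```

For example, `scanKey('DB_HOST=localhost', 0)` stops at the `=` (index 7), and `unquotedEnd('A=val # c\nB=2', 2)` stops at the `#` (index 6).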

Tests

Existing test suite passes. I also diffed output against master across a bunch of edge cases I was worried about and got byte-identical results: escaped quotes inside double quotes, unterminated quotes, KEY: value (yaml-style), export KEY=..., multiline values with \n escapes, inline comments after quoted values, KEY==value, KEY=val#notcomment, KEY=#nothing, leading/trailing whitespace, \r line endings, empty input, comment-only input.

Happy to add any of those as actual TAP cases if you'd like them in the suite.

Notes

No API changes, no behavior changes, no new deps.

homanp added 2 commits April 15, 2026 19:32
Reproduces with: node tests/test-parse-perf.js
Reports median of 7 runs of 5000 parse() calls on a 200-line .env.

The line-matching regex in parse() is the dominant cost when parsing
large .env files. Replaces it with a Uint8Array(256) lookup table for
key character classification and a manual state machine over the
input string. indexOf() is used to find the end of unquoted values
and newline boundaries.

Behavior is preserved across the existing test suite and 24 additional
edge cases (escaped quotes inside double quotes, unterminated quotes,
yaml-style 'KEY: value', export prefix, multiline values, inline
comments, KEY==value, etc.).

No new dependencies. No changes to public API.
@motdotla
Owner

@homanp wow, one of the most interesting contributions to dotenv in a decade.

Give me time on this one to consider the maintenance cost here. It's much less readable, but it is very interesting.

@homanp
Author

homanp commented Apr 15, 2026

> @homanp wow, one of the most interesting contributions to dotenv in a decade.
>
> Give me time on this one to consider the maintenance cost here. It's much less readable, but it is very interesting.

Thank you, happy to contribute. I have some other ideas as well for cold boot but haven't tested them yet.

Open to feedback on this PR. Happy to make optimizations based on your thoughts.

Repository owner deleted a comment from lolusiayy Apr 21, 2026