perf: ~2x faster parse() via hand-written character scanner #1010
Open
homanp wants to merge 2 commits into
Reproduce with `node tests/test-parse-perf.js`, which reports the median of 7 runs of 5000 `parse()` calls on a 200-line .env.
The line-matching regex in parse() is the dominant cost when parsing large .env files. Replaces it with a Uint8Array(256) lookup table for key character classification and a manual state machine over the input string. indexOf() is used to find the end of unquoted values and newline boundaries. Behavior is preserved across the existing test suite and 24 additional edge cases (escaped quotes inside double quotes, unterminated quotes, yaml-style 'KEY: value', export prefix, multiline values, inline comments, KEY==value, etc.). No new dependencies. No changes to public API.
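Here is a minimal sketch of the two building blocks described above: a `Uint8Array(256)` key-character table and `indexOf`-based boundary search. It assumes key characters are `[A-Za-z0-9._-]` and that any `#` ends an unquoted value (check that against master for cases like `KEY=val#notcomment`). `KEY_CHAR`, `scanKey`, and `makeValueScanner` are hypothetical names, not the PR's code. Quoted values, `export` detection, and the `KEY: value` form are left out.

```javascript
// Sketch only, not the PR's code. Assumptions: key chars are [A-Za-z0-9._-],
// any '#' ends an unquoted value, and callers scan forward through `src`.

// 256-entry table built once at module load: KEY_CHAR[c] === 1 for key chars.
const KEY_CHAR = new Uint8Array(256);
for (let c = 0; c < 256; c++) {
  const isDigit = c >= 0x30 && c <= 0x39; // 0-9
  const isUpper = c >= 0x41 && c <= 0x5a; // A-Z
  const isLower = c >= 0x61 && c <= 0x7a; // a-z
  const isPunct = c === 0x2e || c === 0x5f || c === 0x2d; // . _ -
  if (isDigit || isUpper || isLower || isPunct) KEY_CHAR[c] = 1;
}

// Advance over key characters and return the index just past the key.
// Code units above 255 never index the table, so they never count as key chars.
function scanKey(src, start) {
  let i = start;
  while (i < src.length) {
    const c = src.charCodeAt(i);
    if (c > 255 || KEY_CHAR[c] === 0) break;
    i++;
  }
  return i;
}

const isBlank = (c) => c === 32 || c === 9; // space or tab

// Hypothetical helper: returns a function that finds the bounds of an unquoted
// value with indexOf instead of a per-character loop. The next '#' position
// is cached, so increasing `start` values never rescan text for '#'.
function makeValueScanner(src) {
  let nextHash = -2; // -2: not searched yet; -1: no '#' left in src
  return function unquotedValue(start) {
    let eol = src.indexOf('\n', start);
    if (eol === -1) eol = src.length;
    if (nextHash !== -1 && nextHash < start) nextHash = src.indexOf('#', start);
    let end = nextHash === -1 || nextHash > eol ? eol : nextHash;
    let s = start;
    while (s < end && isBlank(src.charCodeAt(s))) s++; // trim left
    while (end > s && isBlank(src.charCodeAt(end - 1))) end--; // trim right
    return { value: src.slice(s, end), next: eol + 1 };
  };
}

// Usage: pull the first key/value pair out by hand.
const src = 'export DB_HOST=localhost # primary\nPORT=5432';
const keyEnd = scanKey(src, 7);   // 7 skips "export "
const key = src.slice(7, keyEnd); // 'DB_HOST'
const scan = makeValueScanner(src);
const first = scan(keyEnd + 1);   // { value: 'localhost', next: 35 }
```

Caching the next `#` position matters. If you called `src.indexOf('#', start)` fresh on every line, a file with few or no comments would be rescanned to the end each time, and the single pass would become quadratic.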
Owner
@homanp wow, one of the most interesting contributions to dotenv in a decade. Give me time on this one to consider the maintenance cost here. It's much less readable, but it is very interesting.
Author
Thank you, happy to contribute. I have some other ideas for cold boot as well, but haven't tested them yet. Open to feedback on this PR, and happy to make optimizations based on your thoughts.
Repository owner deleted a comment from lolusiayy on Apr 21, 2026.
Summary
Swaps the `LINE` regex in `parse()` for a hand-rolled character scanner with a `Uint8Array(256)` lookup table for key chars. About 2x faster on real .env files. Same behavior, same API, still zero deps.
Benchmarks
Repro: `node tests/test-parse-perf.js`
5000 parses of a 200-line .env, median of 7 runs, three fresh node processes per side (Node 23, macOS):
~2.1x speedup, with variance under 2% on both sides. I also tried it against a realistic ~50-key production-style .env (db, auth, aws, flags), with a similar result: roughly 1.9x.
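The methodology above (median of 7 runs of 5000 `parse()` calls on a 200-line .env) can be sketched as a small harness. This is a hypothetical illustration, not the contents of `tests/test-parse-perf.js`. `benchParse`, `median`, and `makeEnv` are made-up names, and the synthetic .env lines are placeholders.

```javascript
// Hypothetical benchmark harness mirroring the described method: time `calls`
// parse() invocations per run, repeat `runs` times, report the median (ms).
function median(values) {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = sorted.length >> 1;
  return sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

function benchParse(parseFn, input, { calls = 5000, runs = 7 } = {}) {
  const times = [];
  let sink = 0; // consume results so the JIT is less likely to drop the calls
  for (let r = 0; r < runs; r++) {
    const t0 = process.hrtime.bigint();
    for (let i = 0; i < calls; i++) sink += Object.keys(parseFn(input)).length;
    const t1 = process.hrtime.bigint();
    times.push(Number(t1 - t0) / 1e6); // nanoseconds -> milliseconds
  }
  return { medianMs: median(times), sink };
}

// Synthetic 200-line .env; the line format is a placeholder.
function makeEnv(lines = 200) {
  const out = [];
  for (let i = 0; i < lines; i++) out.push(`KEY_${i}=value_${i} # comment ${i}`);
  return out.join('\n');
}
```

Run `benchParse(parse, makeEnv())` once per implementation, each in a fresh `node` process as the PR does, and divide the two medians to get the speedup. Using the median rather than the mean keeps a slow first (JIT warm-up) run from skewing the result.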
What changed
The regex was the bottleneck. Replaced it with a single pass over the string using `charCodeAt`:
- A `Uint8Array(256)` built at module load tells us in O(1) whether a byte is a valid key char (`[A-Za-z0-9._-]`).
- `indexOf` finds the `\n` / `#` boundaries; V8's `indexOf` is way faster than walking char by char.
- … `\r` in the input, instead of always.

Tests
Existing test suite passes. I also diffed output against master across a bunch of edge cases I was worried about and got byte-identical results: escaped quotes inside double quotes, unterminated quotes, `KEY: value` (yaml-style), `export KEY=...`, multiline values with `\n` escapes, inline comments after quoted values, `KEY==value`, `KEY=val#notcomment`, `KEY=#nothing`, leading/trailing whitespace, `\r` line endings, empty input, comment-only input.
Happy to add any of those as actual TAP cases if you'd like them in the suite.
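The "diff against master" check can be written as a framework-independent differential test: feed the same inputs to the old and new `parse` and compare serialized output. This sketch assumes both implementations are passed in as functions. `diffParsers` and `EDGE_CASES` are hypothetical names, and the inputs are illustrative versions of the cases listed above, not the PR's actual fixtures.

```javascript
// Hypothetical differential-test harness: run two parse implementations over
// the same inputs and report every input whose serialized output differs.
// JSON.stringify keeps insertion order, so key-order changes are caught too.
const EDGE_CASES = [
  'A="say \\"hi\\""',      // escaped quotes inside double quotes
  'A="unterminated',       // unterminated quote
  'A: value',              // yaml-style
  'export A=1',            // export prefix
  'A="line1\\nline2"',     // multiline value via a \n escape
  'A="quoted" # comment',  // inline comment after a quoted value
  'A==value',
  'A=val#notcomment',
  'A=#nothing',
  '   A = padded   ',      // leading/trailing whitespace
  'A=1\r\nB=2\r\n',        // \r\n line endings
  '',                      // empty input
  '# only a comment',      // comment-only input
];

function diffParsers(oldParse, newParse, cases = EDGE_CASES) {
  const mismatches = [];
  for (const input of cases) {
    const before = JSON.stringify(oldParse(input));
    const after = JSON.stringify(newParse(input));
    if (before !== after) mismatches.push({ input, before, after });
  }
  return mismatches;
}
```

A call such as `diffParsers(masterParse, scannerParse)` (both names hypothetical) should return an empty array when the outputs match. Otherwise, each entry shows the input and both serialized results, which would be a natural starting point for the TAP cases.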
Notes
No API changes, no behavior changes, no new deps.