Performance: Common Subexpression Eliminator Speedup #16741

Open
DanielVF wants to merge 6 commits into argotorg:develop from DanielVF:yul-data-tracking-speedup

Conversation

@DanielVF
Contributor

@DanielVF DanielVF commented May 19, 2026

Description

This PR speeds up compilation of large files with --via-ir by 5% in my testing (100 runs each for the baseline and for this change). Output bytecode has been identical in my tests.

The common subexpression eliminator is the most time-consuming of all Yul optimizers in my testing; it in turn spends most of its time in dataflow analysis.

The analysis visits each AST node, records changed values, and on each node visit looks through the relevant previously recorded values to see if there is a match. Because this runs against every AST node, it generates a lot of reads and writes.

Currently it uses std::set for the hot-path container that stores what has been seen and looks it up. std::set, in spite of the name, is an ordered tree structure, meaning that each "have I seen this" check at each node involves pointer-walking a tree. This is much slower than a hash-based lookup with std::unordered_set. The same speedup of hash over order-preserving tree applies to writes as well.
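A minimal sketch of the container swap described above, not the PR's actual code. `SeenTree`, `SeenHash`, `markSeen`, and `exampleSeen` are hypothetical names, and plain strings stand in for whatever key type the real analysis uses:

```cpp
#include <cassert>
#include <set>
#include <string>
#include <unordered_set>

// Hypothetical stand-ins for the hot-path container.
using SeenTree = std::set<std::string>;            // ordered tree: logarithmic lookup and insert
using SeenHash = std::unordered_set<std::string>;  // hash table: constant time on average

// Records `name` and reports whether it had already been seen.
// Works for either container because both expose the same insert() shape.
template <typename Container>
bool markSeen(Container& seen, std::string const& name)
{
    auto result = seen.insert(name);   // {iterator, inserted}
    assert(*result.first == name);     // the iterator always points at the stored element
    return !result.second;             // inserted == false means it was already present
}

// Usage example: both containers answer the membership question identically.
inline void exampleSeen()
{
    SeenTree tree;
    SeenHash hash;
    assert(!markSeen(tree, "x") && !markSeen(hash, "x")); // first sighting
    assert(markSeen(tree, "x") && markSeen(hash, "x"));   // second sighting
}
```

The two containers differ only in cost and iteration order, so a swap like this is safe wherever the code does not depend on sorted iteration.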

Secondly, when we do find a match, it is overwhelmingly likely to be a single match, not a gigantic list of matches. By returning a simple vector of results, we remove the overhead of dealing with a tree structure for every result returned.
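One way the vector-returning lookup could look, as a sketch under assumptions: `CandidateMap`, `candidatesFor`, and `exampleLookup` are hypothetical names, and strings stand in for both expressions and variable names.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

using Name = std::string;
// Hypothetical shape: expression key -> variables known to hold that value.
using CandidateMap = std::unordered_map<std::string, std::vector<Name>>;

// Returns the candidates recorded for `expr`, or an empty vector when there are none.
// Returning a reference to a shared empty vector avoids any allocation on a miss.
std::vector<Name> const& candidatesFor(CandidateMap const& map, std::string const& expr)
{
    static std::vector<Name> const empty;
    auto it = map.find(expr);
    return it == map.end() ? empty : it->second;
}

// Usage example: the common case is a single match.
inline void exampleLookup()
{
    CandidateMap map;
    map["add(a, b)"].push_back("x");
    assert(candidatesFor(map, "add(a, b)").size() == 1);
    assert(candidatesFor(map, "mul(a, b)").empty());
}
```

For a result that usually holds one element, a contiguous vector is about as cheap as a container gets; a tree-based set pays for a node allocation and ordering even for that single entry.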

Lastly, when clearing values, the previous code first created a new combined, ordered, and deduplicated set before removing entries. This involved the overhead of creating a new ordered set, plus the writes involved in combining the data. Now that clearing is cheap, it's faster to just loop over each input and remove entries as needed. The cost of the occasional duplicate (ignored) erase is less than the cost of building a new deduplicated tree in the first place.
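The two clearing strategies can be contrasted roughly as follows. This is an illustrative sketch, not the code from the diff: the function names are hypothetical, the inputs are simplified to two vectors of names, and the tracked container is a hash set in both variants so the results can be compared directly.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <unordered_set>
#include <vector>

using Name = std::string;

// Old shape: merge the inputs into one ordered, deduplicated set, then erase.
void clearViaMergedSet(std::unordered_set<Name>& tracked,
                       std::vector<Name> const& a, std::vector<Name> const& b)
{
    std::set<Name> merged(a.begin(), a.end());
    merged.insert(b.begin(), b.end());
    for (Name const& name: merged)
        tracked.erase(name);
}

// New shape: erase directly from each input; erasing a missing key is a no-op.
void clearDirectly(std::unordered_set<Name>& tracked,
                   std::vector<Name> const& a, std::vector<Name> const& b)
{
    for (Name const& name: a)
        tracked.erase(name);
    for (Name const& name: b)
        tracked.erase(name);
}

// Usage example: both strategies leave the same entries behind.
inline void exampleClear()
{
    std::unordered_set<Name> viaSet{"a", "b", "c", "d"};
    std::unordered_set<Name> direct = viaSet;
    std::vector<Name> const first{"a", "b"};
    std::vector<Name> const second{"b", "z"}; // "b" is a duplicate, "z" was never tracked
    clearViaMergedSet(viaSet, first, second);
    clearDirectly(direct, first, second);
    assert(viaSet == direct);
}
```

The direct version performs one redundant erase for each duplicate, but it never allocates tree nodes for the merged set.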

Checklist

AI Disclosure

  • No AI tools were used
  • AI tools were used (details below)

Codex 5.5

@DanielVF DanielVF changed the title from "Performance: Common subexpression eliminator hot path fix" to "Performance: Common Subexpression Eliminator Speedup" May 19, 2026
@cameel cameel self-assigned this May 20, 2026
@msooseth
Contributor

First of all, thanks! Looks very neat, I hope we can merge this within a week or so, just like your other PR.

I've scheduled a run on our benchmarking setup (https://github.com/argotorg/solc-bench/). I'll post the comparison table it generates here soon.

@msooseth
Contributor

Looks good from a perf perspective :)

matesoos@bench-01:~/ > solc-bench compare out-84/bench-results.json out-86/bench-results.json
Baseline: 0.8.36-ci.2026.5.20+commit.bbdb1192.Linux.g++
Target:   0.8.36-ci.2026.5.19+commit.ac9ae2fd.Linux.g++
Δ% = (target - baseline) / baseline. Negative = improvement (lower is better), positive = regression.
winner = '~noise' unless the gap passes a Welch t-test and |Δ%| ≥ 0.1%.

Benchmark           Pipeline  Metric         Base                   Target                 Δ%      winner
------------------  --------  -------------  ---------------------  ---------------------  ------  --------
openzeppelin-5.6.1  evmasm    cpu_time       21.2580s ± 0.0469s     20.8819s ± 0.0228s     -1.77%  TARGET
                              creation_size  725,287 ± 0            725,287 ± 0            0.0%    tie
                              cycles         73.9311G ± 183.6215M   72.7606G ± 94.4860M    -1.58%  TARGET
                              instructions   153.3280G ± 133        152.1554G ± 151        -0.76%  TARGET
                              peak_rss       1079 ± 0 MiB           1028 ± 0 MiB           -4.73%  TARGET
                              runtime_size   650,752 ± 0            650,752 ± 0            0.0%    tie
                              wall_time      21.4028s ± 0.0423s     21.0192s ± 0.0224s     -1.79%  TARGET

openzeppelin-5.6.1  ir        cpu_time       58.2292s ± 0.0596s     56.2774s ± 0.0383s     -3.35%  TARGET
                              creation_size  675,405 ± 0            675,405 ± 0            0.0%    tie
                              cycles         206.0595G ± 165.6536M  199.0914G ± 131.3197M  -3.38%  TARGET
                              instructions   345.3348G ± 421        337.5638G ± 198        -2.25%  TARGET
                              peak_rss       1469 ± 0 MiB           1468 ± 0 MiB           -0.04%  ~noise
                              runtime_size   598,355 ± 0            598,355 ± 0            0.0%    tie
                              wall_time      58.5378s ± 0.0592s     56.5809s ± 0.0378s     -3.34%  TARGET

solady-0.1.26       evmasm    cpu_time       30.9631s ± 0.0698s     30.6874s ± 0.0075s     -0.89%  TARGET
                              creation_size  1,514,791 ± 0          1,514,791 ± 0          0.0%    tie
                              cycles         107.3551G ± 241.1873M  106.4118G ± 20.1522M   -0.88%  TARGET
                              instructions   238.1581G ± 48         237.3033G ± 168        -0.36%  TARGET
                              peak_rss       1828 ± 0 MiB           1828 ± 0 MiB           +0.01%  ~noise
                              runtime_size   1,480,154 ± 0          1,480,154 ± 0          0.0%    tie
                              wall_time      31.1885s ± 0.0594s     30.9001s ± 0.0158s     -0.92%  TARGET

solady-0.1.26       ir        cpu_time       153.4674s ± 0.1285s    148.3808s ± 0.0800s    -3.31%  TARGET
                              creation_size  1,681,895 ± 0          1,681,895 ± 0          0.0%    tie
                              cycles         545.3756G ± 470.5243M  527.1648G ± 282.0594M  -3.34%  TARGET
                              instructions   864.1106G ± 257        845.0659G ± 584        -2.2%   TARGET
                              peak_rss       2949 ± 0 MiB           2949 ± 0 MiB           -0.01%  ~noise
                              runtime_size   1,650,497 ± 0          1,650,497 ± 0          0.0%    tie
                              wall_time      154.2492s ± 0.1354s    149.1417s ± 0.0790s    -3.31%  TARGET

prb-math-4.1.1      evmasm    cpu_time       7.4446s ± 0.0040s      7.4045s ± 0.0041s      -0.54%  TARGET
                              creation_size  252,039 ± 0            252,039 ± 0            0.0%    tie
                              cycles         24.1709G ± 12.0640M    24.0902G ± 17.6974M    -0.33%  TARGET
                              instructions   54.2630G ± 179         54.1886G ± 63          -0.14%  TARGET
                              peak_rss       1174 ± 0 MiB           1174 ± 0 MiB           0.0%    tie
                              runtime_size   250,178 ± 0            250,178 ± 0            0.0%    tie
                              wall_time      7.4909s ± 0.0096s      7.4599s ± 0.0104s      -0.41%  TARGET

prb-math-4.1.1      ir        cpu_time       30.0269s ± 0.0446s     28.8608s ± 0.0131s     -3.88%  TARGET
                              creation_size  262,432 ± 0            262,432 ± 0            0.0%    tie
                              cycles         104.6689G ± 149.5207M  100.5571G ± 52.0495M   -3.93%  TARGET
                              instructions   166.2480G ± 141        161.4395G ± 282        -2.89%  TARGET
                              peak_rss       1499 ± 0 MiB           1494 ± 0 MiB           -0.29%  TARGET
                              runtime_size   260,832 ± 0            260,832 ± 0            0.0%    tie
                              wall_time      30.1784s ± 0.0492s     29.0029s ± 0.0156s     -3.9%   TARGET

forge-std-1.16.1    evmasm    cpu_time       9.5044s ± 0.0098s      9.3483s ± 0.0161s      -1.64%  TARGET
                              creation_size  394,603 ± 0            394,603 ± 0            0.0%    tie
                              cycles         32.8771G ± 21.9855M    32.3523G ± 63.0404M    -1.6%   TARGET
                              instructions   69.7060G ± 58          69.2213G ± 87          -0.7%   TARGET
                              peak_rss       532 ± 0 MiB            533 ± 0 MiB            +0.21%  BASELINE
                              runtime_size   381,729 ± 0            381,729 ± 0            0.0%    tie
                              wall_time      9.5606s ± 0.0165s      9.3988s ± 0.0185s      -1.69%  TARGET

forge-std-1.16.1    ir        cpu_time       38.2680s ± 0.0157s     36.9562s ± 0.0070s     -3.43%  TARGET
                              creation_size  422,073 ± 0            422,073 ± 0            0.0%    tie
                              cycles         135.6666G ± 59.0151M   130.9462G ± 18.7860M   -3.48%  TARGET
                              instructions   202.9529G ± 111        197.9558G ± 260        -2.46%  TARGET
                              peak_rss       807 ± 0 MiB            828 ± 0 MiB            +2.6%   BASELINE
                              runtime_size   406,750 ± 0            406,750 ± 0            0.0%    tie
                              wall_time      38.4725s ± 0.0123s     37.1421s ± 0.0041s     -3.46%  TARGET

v4-core-4.0.0       evmasm    cpu_time       19.6607s ± 0.0154s     19.3715s ± 0.0289s     -1.47%  TARGET
                              creation_size  2,012,188 ± 0          2,012,188 ± 0          0.0%    tie
                              cycles         68.6066G ± 55.6035M    67.6265G ± 108.2576M   -1.43%  TARGET
                              instructions   146.4592G ± 117        145.3933G ± 195        -0.73%  TARGET
                              peak_rss       878 ± 0 MiB            877 ± 0 MiB            -0.13%  TARGET
                              runtime_size   1,983,262 ± 0          1,983,262 ± 0          0.0%    tie
                              wall_time      19.7958s ± 0.0169s     19.5031s ± 0.0329s     -1.48%  TARGET

v4-core-4.0.0       ir        cpu_time       98.9897s ± 0.0480s     96.1066s ± 0.0366s     -2.91%  TARGET
                              creation_size  1,804,635 ± 0          1,804,635 ± 0          0.0%    tie
                              cycles         351.9026G ± 199.9921M  341.6332G ± 141.4668M  -2.92%  TARGET
                              instructions   552.4681G ± 242        541.6489G ± 334        -1.96%  TARGET
                              peak_rss       1730 ± 0 MiB           1740 ± 0 MiB           +0.59%  BASELINE
                              runtime_size   1,777,945 ± 0          1,777,945 ± 0          0.0%    tie
                              wall_time      99.4856s ± 0.0475s     96.5926s ± 0.0353s     -2.91%  TARGET

morpho-blue-1.0.0   evmasm    cpu_time       11.2212s ± 0.0072s     11.0675s ± 0.0053s     -1.37%  TARGET
                              creation_size  802,290 ± 0            802,290 ± 0            0.0%    tie
                              cycles         39.1200G ± 28.6438M    38.5878G ± 12.3420M    -1.36%  TARGET
                              instructions   84.1987G ± 41          83.6671G ± 114         -0.63%  TARGET
                              peak_rss       512 ± 0 MiB            512 ± 0 MiB            +0.03%  ~noise
                              runtime_size   798,085 ± 0            798,085 ± 0            0.0%    tie
                              wall_time      11.2834s ± 0.0018s     11.1275s ± 0.0096s     -1.38%  TARGET

morpho-blue-1.0.0   ir        cpu_time       57.9787s ± 0.0027s     55.7305s ± 0.0754s     -3.88%  TARGET
                              creation_size  753,526 ± 0            753,526 ± 0            0.0%    tie
                              cycles         206.2192G ± 5.5547M    198.1895G ± 273.6322M  -3.89%  TARGET
                              instructions   293.3590G ± 123        284.6393G ± 150        -2.97%  TARGET
                              peak_rss       961 ± 0 MiB            958 ± 0 MiB            -0.33%  TARGET
                              runtime_size   750,240 ± 0            750,240 ± 0            0.0%    tie
                              wall_time      58.2643s ± 0.0021s     56.0092s ± 0.0777s     -3.87%  TARGET

seaport-1.6         evmasm    cpu_time       141.5387s ± 0.0854s    139.7601s ± 0.1067s    -1.26%  TARGET
                              creation_size  13,568,823 ± 0         13,568,823 ± 0         0.0%    tie
                              cycles         494.7027G ± 316.3915M  488.3921G ± 270.0835M  -1.28%  TARGET
                              instructions   1003.0770G ± 700       997.4682G ± 569        -0.56%  TARGET
                              peak_rss       2695 ± 0 MiB           2691 ± 0 MiB           -0.15%  TARGET
                              runtime_size   12,556,893 ± 0         12,556,893 ± 0         0.0%    tie
                              wall_time      142.3613s ± 0.0894s    140.5911s ± 0.1338s    -1.24%  TARGET

solmate-6           evmasm    cpu_time       4.7162s ± 0.0019s      4.6435s ± 0.0068s      -1.54%  TARGET
                              creation_size  279,294 ± 0            279,294 ± 0            0.0%    tie
                              cycles         16.5228G ± 10.1319M    16.2792G ± 18.1310M    -1.47%  TARGET
                              instructions   38.0325G ± 13          37.8248G ± 82          -0.55%  TARGET
                              peak_rss       780 ± 0 MiB            780 ± 0 MiB            0.0%    tie
                              runtime_size   272,389 ± 0            272,389 ± 0            0.0%    tie
                              wall_time      4.7447s ± 0.0100s      4.6708s ± 0.0149s      -1.56%  TARGET

solmate-6           ir        cpu_time       21.0628s ± 0.0274s     20.2880s ± 0.0261s     -3.68%  TARGET
                              creation_size  267,193 ± 0            267,193 ± 0            0.0%    tie
                              cycles         74.8282G ± 96.3023M    72.0638G ± 95.1746M    -3.69%  TARGET
                              instructions   115.2227G ± 75         112.4131G ± 271        -2.44%  TARGET
                              peak_rss       780 ± 0 MiB            780 ± 0 MiB            0.0%    tie
                              runtime_size   259,851 ± 0            259,851 ± 0            0.0%    tie
                              wall_time      21.1662s ± 0.0273s     20.3864s ± 0.0202s     -3.68%  TARGET

The only regressions are in peak RSS, and they are small (at most +2.6%, on forge-std-1.16.1 with the ir pipeline). A clear win.
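For readers checking the Δ% column by hand, the formula from the tool's header line can be written as a small sketch. `deltaPercent` and `exampleDelta` are hypothetical names and not part of solc-bench; the inputs are the mean values from the table above.

```cpp
#include <cassert>
#include <cmath>

// Relative change as defined in the header line: (target - baseline) / baseline, in percent.
// Negative means the target is lower, which is an improvement for time and memory metrics.
double deltaPercent(double baseline, double target)
{
    assert(baseline != 0.0);
    return (target - baseline) / baseline * 100.0;
}

// Usage example with two rows from the table.
inline void exampleDelta()
{
    // openzeppelin-5.6.1, evmasm, cpu_time: 21.2580s -> 20.8819s, reported as -1.77%
    assert(std::fabs(deltaPercent(21.2580, 20.8819) - (-1.77)) < 0.005);
    // forge-std-1.16.1, ir, peak_rss: 807 MiB -> 828 MiB, reported as +2.6%
    assert(std::fabs(deltaPercent(807.0, 828.0) - 2.60) < 0.005);
}
```

This covers only the Δ% arithmetic. The winner column also depends on the Welch t-test mentioned in the header, which needs the per-run samples that are not shown here.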

Comment thread Changelog.md Outdated
Comment on lines +135 to +137
auto& candidates = m_replacementCandidates[*_value];
if (std::find(candidates.begin(), candidates.end(), _variable) == candidates.end())
    candidates.emplace_back(_variable);
Contributor

This will affect the order in which candidates are iterated, so it could affect the bytecode in the sense that it may now pick a different (although still valid) subexpression to substitute. There are no guarantees on our side as to what the bytecode will actually look like as long as it's semantically correct, so I'm pretty sure this isn't a problem. @cameel, can you double-check please? This is the only change where I'm slightly uncertain.

Contributor Author

@DanielVF DanielVF May 21, 2026

candidates here is an order preserving vector, so the iteration and subsequent emplace_back is deterministic.

m_replacementCandidates is a std::unordered_map with a key type of expression and a value type of vector. We just use the map to find the vector of previously found candidates. Then this code simply appends to that ordered vector when a new candidate is found. So this particular code should be safe.

Even if the candidates vector were unordered, this check is just checking for presence, which is deterministic regardless of order.

Contributor

We're still iterating over the values in the vector (previously a set), and the two have different iteration orders (sorted order for the set vs. insertion order for the vector), which makes it possible for different (but equivalent) expressions/values to be used as replacements. I'm not saying this is wrong or undesired behaviour, just that it could result in different bytecode between the two approaches, which is why I asked @cameel to double-check. From my point of view, this is perfectly fine (but I have been wrong before).
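The ordering difference can be seen in a small self-contained sketch. Strings stand in for variable names here, and `addCandidate` and `exampleOrder` are hypothetical helpers that mirror the shape of the reviewed lines rather than the compiler's real types.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Append-if-absent on a vector: deduplicates but keeps insertion order.
void addCandidate(std::vector<std::string>& candidates, std::string const& variable)
{
    if (std::find(candidates.begin(), candidates.end(), variable) == candidates.end())
        candidates.emplace_back(variable);
}

// Usage example: same members in both containers, different iteration order.
inline void exampleOrder()
{
    std::vector<std::string> const inputs{"y", "x", "y"};
    std::vector<std::string> asVector;
    std::set<std::string> asSet;
    for (std::string const& v: inputs)
    {
        addCandidate(asVector, v);
        asSet.insert(v);
    }
    // The vector iterates in insertion order: y, x.
    assert((asVector == std::vector<std::string>{"y", "x"}));
    // The set iterates in sorted order: x, y.
    assert((std::vector<std::string>(asSet.begin(), asSet.end())
        == std::vector<std::string>{"x", "y"}));
}
```

Both orders are deterministic. The question raised here is only whether whichever candidate comes first changes, not whether the result is reproducible.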

Contributor Author

@DanielVF DanielVF May 21, 2026

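In pseudo-JavaScript:

const m_replacementCandidates = new Map()

function insert(value, expression) {
  // Look up the candidate list for this value, creating it if not present
  if (!m_replacementCandidates.has(value)) {
    m_replacementCandidates.set(value, [])
  }
  const candidates = m_replacementCandidates.get(value)
  // Append only if not already present, so insertion order is preserved
  if (!candidates.includes(expression)) {
    candidates.push(expression)
  }
}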

Contributor Author

@DanielVF DanielVF May 21, 2026

Ah - now I understand! The set might have been sorted by something different than its insertion order, even if both are deterministic.

Contributor Author

@DanielVF DanielVF May 21, 2026

I think the important bit is not backwards-compatible bytecode, but consistent bytecode output for this compiler version. In that case, a different sorting order would be fine as long as it's deterministic, and this definitely is.

That said, no tests have needed to change as a result of this, and my local benchmarking on a set of real-world contracts produces identical bytecode with and without this change.

Comment thread Changelog.md

Compiler Features:
* General: Speed up SHA-256 hashing (`picosha2`).
* General: Speed up compilation times.
Contributor

Sorry, should probably mention that this particular improvement is due to optimizations in the CSE and dataflow analysis. In any case, let's wait until @cameel takes a look, he'll probably suggest a final version of the changelog entry phrasing.
