Add new multithreaded TwoQubitPeepholeOptimization pass by mtreinish · Pull Request #13419 · Qiskit/qiskit

mtreinish · 2024-11-10T16:12:54Z

Summary

This commit adds a new transpiler pass for physical optimization,
TwoQubitPeepholeOptimization. This replaces the use of Collect2qBlocks,
ConsolidateBlocks, and UnitarySynthesis in the optimization stage for
a default pass manager setup. The pass logically works the same way:
it analyzes the DAG to get a list of 2q runs, calculates the matrix
of each run, and then synthesizes the matrix and substitutes it in place.
The distinction is that this pass does all of this in a single pass and
also parallelizes the matrix calculation and synthesis steps, because
there is no data dependency there.
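The collect, multiply, synthesize flow described above can be sketched roughly as follows. This is a minimal illustration, not the pass's implementation. It assumes 2x2 real matrices as a stand-in for the 4x4 complex unitaries of real 2q runs, a hypothetical `synthesize` function that only reports a gate count, and `std::thread::scope` in place of whatever threading approach the pass actually uses.

```rust
use std::thread;

/// Stand-in for a gate matrix. Real 2q runs use 4x4 complex unitaries;
/// 2x2 real matrices keep the sketch short.
type Mat = [[f64; 2]; 2];

const IDENTITY: Mat = [[1.0, 0.0], [0.0, 1.0]];

fn matmul(a: &Mat, b: &Mat) -> Mat {
    let mut out = [[0.0; 2]; 2];
    for i in 0..2 {
        for j in 0..2 {
            for k in 0..2 {
                out[i][j] += a[i][k] * b[k][j];
            }
        }
    }
    out
}

/// Multiply the gates of one run into a single matrix.
/// Later gates act after earlier ones, so they multiply from the left.
fn run_matrix(run: &[Mat]) -> Mat {
    run.iter().fold(IDENTITY, |acc, gate| matmul(gate, &acc))
}

/// Hypothetical stand-in for synthesis: report how many gates a
/// replacement needs. An identity needs 0 gates, anything else 1.
fn synthesize(m: &Mat) -> usize {
    let is_identity =
        (0..2).all(|i| (0..2).all(|j| (m[i][j] - IDENTITY[i][j]).abs() < 1e-12));
    if is_identity {
        0
    } else {
        1
    }
}

/// Process every run in parallel. Each run is independent of the others,
/// so workers share nothing; results come back in run order.
fn process_runs(runs: &[Vec<Mat>], n_threads: usize) -> Vec<usize> {
    let n_threads = n_threads.max(1);
    // Ceiling division; at least 1 so `chunks` never sees a zero size.
    let chunk_size = ((runs.len() + n_threads - 1) / n_threads).max(1);
    thread::scope(|scope| {
        let handles: Vec<_> = runs
            .chunks(chunk_size)
            .map(|chunk| {
                scope.spawn(move || {
                    chunk
                        .iter()
                        .map(|run| synthesize(&run_matrix(run)))
                        .collect::<Vec<usize>>()
                })
            })
            .collect();
        handles
            .into_iter()
            .flat_map(|h| h.join().expect("worker thread panicked"))
            .collect::<Vec<usize>>()
    })
}

fn main() {
    let x: Mat = [[0.0, 1.0], [1.0, 0.0]];
    let z: Mat = [[1.0, 0.0], [0.0, -1.0]];
    // Three independent runs; the first one (X then X) is an identity.
    let runs = vec![vec![x, x], vec![x, z], vec![z, z, z]];
    println!("{:?}", process_runs(&runs, 2));
}
```

The substitution step is left out on purpose: in this sketch only the matrix calculation and synthesis run in parallel, matching the description above.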

This new pass is not meant to fully replace the Collect2qBlocks,
ConsolidateBlocks, or UnitarySynthesis passes as those also run in
contexts where we don't have a physical circuit. This is meant instead
to replace their usage in the optimization stage only. Accordingly this
new pass also changes the logic on how we select the synthesis to use
and when to make a substitution. Previously this logic was primarily done
via the ConsolidateBlocks pass by only consolidating to a UnitaryGate if
the number of basis gates needed based on the weyl chamber coordinates
was less than the number of 2q gates in the block (see #11659 for
discussion on this). Since this new pass skips the explicit
consolidation stage we go ahead and try all the available synthesizers.

Right now this commit has a number of limitations. The largest are:

- Only supports the target
- It doesn't support the XX decomposer because it's not in Rust (the TwoQubitBasisDecomposer and TwoQubitControlledUDecomposer are used)

This pass doesn't support using the unitary synthesis plugin interface, since
it's optimized to use Qiskit's built-in two qubit synthesis routines written in
Rust. The existing combination of ConsolidateBlocks and UnitarySynthesis
should be used instead if the plugin interface is necessary.

Details and comments

Fixes #12007
Fixes #11659

TODO:

- Rebase after Use OnceLock instead of OnceCell #13410 merges
- Add tests
- Add documentation
- Benchmarking and performance tuning
- Handle running serially when in multiprocessing context
- Release note

coveralls · 2024-11-10T16:37:45Z

Coverage Report for CI Build 26225241457

Warning

Build has drifted: This PR's base is out of sync with its target branch, so coverage data may include unrelated changes.
Quick fix: rebase this PR. Learn more →

Coverage decreased (-0.1%) to 87.488%

Details

Coverage decreased (-0.1%) from the base build.
Patch coverage: 34 uncovered changes across 4 files (307 of 341 lines covered, 90.03%).
304 coverage regressions across 17 files.

Uncovered Changes

File	Changed	Covered	%
crates/transpiler/src/passes/two_qubit_peephole.rs	237	208	87.76%
crates/synthesis/src/two_qubit_decompose/basis_decomposer.rs	8	6	75.0%
crates/transpiler/src/passes/unitary_synthesis/mod.rs	53	51	96.23%
crates/pyext/src/lib.rs	4	3	75.0%

Coverage Regressions

304 previously-covered lines in 17 files lost coverage.

Top 10 Files by Coverage Loss	Lines Losing Coverage	Coverage
crates/synthesis/src/two_qubit_decompose/basis_decomposer.rs	66	84.49%
crates/circuit/src/circuit_drawer.rs	56	95.88%
crates/bindgen/src/lib.rs	45	68.48%
crates/circuit/src/classical/expr/expr.rs	44	91.95%
crates/synthesis/src/two_qubit_decompose/controlled_u_decomposer.rs	25	93.47%
qiskit/circuit/library/generalized_gates/linear_function.py	15	85.25%
crates/transpiler/src/passes/commutative_optimization.rs	13	96.75%
crates/bindgen-cli/src/main.rs	12	0.0%
crates/bindgen/src/simple_ir.rs	7	74.55%
crates/circuit/src/variable_mapper.rs	6	63.51%

Coverage Stats


Relevant Lines:	123676
Covered Lines:	108202
Line Coverage:	87.49%
Coverage Strength:	961479.69 hits per line

💛 - Coveralls

This commit adds a new transpiler pass for physical optimization, TwoQubitPeepholeOptimization. This replaces the use of Collect2qBlocks, ConsolidateBlocks, and UnitarySynthesis in the optimization stage for a default pass manager setup. The pass logically works the same way where it analyzes the dag to get a list of 2q runs, calculates the matrix of each run, and then synthesizes the matrix and substitutes it in place. The distinction this pass makes though is it does this all in a single pass and also parallelizes the matrix calculation and synthesis steps because there is no data dependency there.

This new pass is not meant to fully replace the Collect2qBlocks, ConsolidateBlocks, or UnitarySynthesis passes as those also run in contexts where we don't have a physical circuit. This is meant instead to replace their usage in the optimization stage only. Accordingly this new pass also changes the logic on how we select the synthesis to use and when to make a substitution. Previously this logic was primarily done via the ConsolidateBlocks pass by only consolidating to a UnitaryGate if the number of basis gates needed based on the weyl chamber coordinates was less than the number of 2q gates in the block (see Qiskit#11659 for discussion on this). Since this new pass skips the explicit consolidation stage we go ahead and try all the available synthesizers.

Right now this commit has a number of limitations, the largest are:

- Only supports the target
- It doesn't support any synthesizers besides the TwoQubitBasisDecomposer, because it's the only one in Rust currently.

For plugin handling I left the logic as running the three pass series, but I'm not sure this is the behavior we want. We could say keep the synthesis plugins for `UnitarySynthesis` only and then rely on our built-in methods for physical optimization only. But this also seems less than ideal because the plugin mechanism is how we support synthesizing to custom basis gates, and also more advanced approximate synthesis methods. Both of those are things we need to do as part of the synthesis here.

Additionally, this is currently missing tests and documentation and while running it manually "works" as in it returns a circuit that looks valid, I've not done any validation yet. This also likely will need several rounds of performance optimization and tuning. At this point this is just a rough proof of concept and will need a lot of refinement along with larger changes to Qiskit's Rust code before this is ready to merge.

Fixes Qiskit#12007
Fixes Qiskit#11659


Since Qiskit#13139 merged we have another two qubit decomposer available to run in rust, the TwoQubitControlledUDecomposer. This commit updates the new TwoQubitPeepholeOptimization to call this decomposer if the target supports appropriate 2q gates.

Clippy is correctly warning that the size difference between the two decomposer types in the TwoQubitDecomposer enum is large. TwoQubitBasisDecomposer is 1640 bytes and TwoQubitControlledUDecomposer is only 24 bytes. This means each element of ControlledU is wasting > 1600 bytes. However, in this case that is acceptable in order to avoid a layer of pointer indirection, as these are stored temporarily in a vec inside a thread to decompose a unitary. A trait would be a more natural way to define a common interface between all the two qubit decomposers, but since we keep them instantiated for each edge in a Vec they need to be sized, and doing something like `Box<dyn TwoQubitDecomposer>` (assuming a trait `TwoQubitDecomposer` instead of an enum) to get around this would have additional runtime overhead. This is also considering that TwoQubitControlledUDecomposer is far less likely to be used in practice, as it only works with targets that have RZZ, RXX, RYY, or RZX gates on an edge, which is less common.
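The trade-off between the enum and a boxed trait object can be seen in a small sketch. The struct and trait names below are hypothetical stand-ins, and the byte arrays only mirror the sizes quoted above; they are not the real decomposer layouts.

```rust
use std::mem::size_of;

// Hypothetical stand-ins. The byte counts mirror the sizes quoted above;
// they are not the real struct layouts.
struct BasisDecomposer {
    _state: [u8; 1640],
}
struct ControlledUDecomposer {
    _state: [u8; 24],
}

/// Enum dispatch: every value is stored inline and is as large as the
/// largest variant (plus a discriminant).
enum Decomposer {
    Basis(BasisDecomposer),
    ControlledU(ControlledUDecomposer),
}

impl Decomposer {
    fn name(&self) -> &'static str {
        match self {
            Decomposer::Basis(_) => "basis",
            Decomposer::ControlledU(_) => "controlled-u",
        }
    }
}

/// Trait-object alternative: a Vec of these holds pointers, and each
/// call goes through a vtable.
trait Decompose {
    fn name(&self) -> &'static str;
}
impl Decompose for BasisDecomposer {
    fn name(&self) -> &'static str {
        "basis"
    }
}
impl Decompose for ControlledUDecomposer {
    fn name(&self) -> &'static str {
        "controlled-u"
    }
}

fn main() {
    let inline = vec![
        Decomposer::Basis(BasisDecomposer { _state: [0; 1640] }),
        Decomposer::ControlledU(ControlledUDecomposer { _state: [0; 24] }),
    ];
    let boxed: Vec<Box<dyn Decompose>> = vec![
        Box::new(BasisDecomposer { _state: [0; 1640] }),
        Box::new(ControlledUDecomposer { _state: [0; 24] }),
    ];
    for d in &inline {
        println!("inline: {}", d.name());
    }
    for d in &boxed {
        println!("boxed: {}", d.name());
    }
    // The enum element is padded up to the large variant; the boxed
    // element is a two-word fat pointer to a separate heap allocation.
    println!("enum element: {} bytes", size_of::<Decomposer>());
    println!("boxed element: {} bytes", size_of::<Box<dyn Decompose>>());
}
```

In this sketch the enum wastes space for the small variant but keeps every element inline, while the boxed version keeps elements small at the cost of an extra allocation and pointer hop per decomposer.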


Also don't run scoring more than needed.

ShellyGarion · 2025-01-23T07:02:09Z

Copy here the comment of @t-imamichi #13568 (comment)
and my reply: #13568 (comment)

I think this closes #13428. How about adding a test case of consecutive RZZ (RXX, and RYY) gates?

We should make sure that after PR #13568 and this PR are merged, we can efficiently transpile circuits into a basis with fractional RZZ gates.


mtreinish · 2025-01-26T14:09:40Z

I added support for using the ControlledUDecomposer to the new pass back in early December with this commit: 746758f. Looking at that now with fresh eyes, though, I need to check that the gate is continuous in the target. Right now it only looks at the supported gate types.


Cryoris

I'm not done yet, but I'm already submitting these comments below 🙂

Cryoris · 2026-05-05T12:36:27Z

+                original_2q_count,
+                1. - original_fidelity,


I assume this order gives the precedence in sorting -- shouldn't the fidelity be the main priority over the 2q count? Since this is done after routing (it is, right?) it seems like we should be optimizing for fidelity over anything else

Edit: Reading the rest, I assume this is because the output operations might not be in the target basis, which makes the fidelity calculation an estimation only. If that's what's going on, could we leave a comment explaining this?

I discussed this in: #13419 (comment). I think we could change it to be fidelity first, but this was a change I made during development to try and debug other issues. I think right now it's a good choice because if there is a controlled-U equivalent entangling gate we will emit too many 1q gates (until #16036 is fixed). Those will get simplified by Optimize1qGatesDecomposition, but if error was the first heuristic we would miss an optimization opportunity. I think we can investigate switching it to be fidelity first after #16036 is fixed.
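To make the precedence question concrete, here is a minimal sketch of how the field order in a score decides which candidate wins. The `Score` alias, the two comparison functions, and the numbers are all made up for illustration; only the field order (2q count, then `1 - fidelity`) comes from the diff above.

```rust
use std::cmp::Ordering;

/// (2q gate count, estimated error = 1 - fidelity). Lower is better.
type Score = (usize, f64);

/// Order used in the diff above: 2q count first, error as tie breaker.
fn count_first(a: &Score, b: &Score) -> Ordering {
    a.0.cmp(&b.0).then(a.1.total_cmp(&b.1))
}

/// Alternative raised in review: error first, 2q count as tie breaker.
fn error_first(a: &Score, b: &Score) -> Ordering {
    a.1.total_cmp(&b.1).then(a.0.cmp(&b.0))
}

/// Pick the lowest-scoring candidate under the given ordering.
fn best(candidates: &[Score], cmp: fn(&Score, &Score) -> Ordering) -> Option<Score> {
    candidates.iter().copied().min_by(|a, b| cmp(a, b))
}

fn main() {
    // Made-up numbers: one candidate uses fewer 2q gates, the other has
    // a lower estimated error.
    let candidates: [Score; 2] = [(1, 0.020), (2, 0.015)];
    println!("count first: {:?}", best(&candidates, count_first));
    println!("error first: {:?}", best(&candidates, error_first));
}
```

With these numbers the two orderings disagree, which is the situation the review comment is asking about.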

#16036 is ready now :)


This avoids the second synchronization point during the parallel portion of the pass function. While an AtomicBool shouldn't have much synchronization overhead it wasn't strictly necessary. We do have the O(n) overhead of iterating over the runs to determine if we changed anything but this is the less common case that we don't make any substitutions so taking the overhead isn't a huge deal.
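The idea of dropping the shared flag can be sketched as below. This is an illustration under assumptions, not the pass's code: `try_improve` is a hypothetical stand-in for synthesis with a toy rule, one thread is spawned per run only to keep the sketch short, and `std::thread::scope` stands in for the pass's actual threading.

```rust
use std::thread;

/// Hypothetical per-run result: Some(new_gate_count) when a replacement
/// is worth making, None when the original run is kept.
/// Toy rule: runs longer than 3 gates are rewritten with 3 gates.
fn try_improve(run_len: usize) -> Option<usize> {
    if run_len > 3 {
        Some(3)
    } else {
        None
    }
}

/// Workers only return their own result; nothing is shared between them.
/// Whether anything changed is derived afterwards with one O(n) scan.
fn process(run_lens: &[usize]) -> (Vec<Option<usize>>, bool) {
    let results: Vec<Option<usize>> = thread::scope(|scope| {
        let handles: Vec<_> = run_lens
            .iter()
            .map(|&len| scope.spawn(move || try_improve(len)))
            .collect();
        handles
            .into_iter()
            .map(|h| h.join().expect("worker thread panicked"))
            .collect()
    });
    // The scan stops at the first substitution, so it only walks the
    // whole list when no run was replaced.
    let changed = results.iter().any(|r| r.is_some());
    (results, changed)
}

fn main() {
    let (results, changed) = process(&[2, 5, 3]);
    println!("{results:?} changed={changed}");
}
```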

In Qiskit#13419 the pass has been updated to handle threads that need access to Python objects to get matrices or definitions of custom Python gates. This changes the story for the C API when it is used in a Python binding context. To run the C API reliably within a Python context the pass needs to be in a context with the GIL so it can be reliably released and reacquired where necessary in the Rust code.

Cryoris

Still need to go through the tests and docs, but here's already a next chunk 🙂

Cryoris · 2026-05-19T07:49:35Z

+        let mut original_2q_count: usize = 0;
+        let original_total_count: usize = node_indices.len();
+        let mut outside_target = false;
+        for node_index in node_indices {


Couldn't this use the fidelity_2q_sequence function we've defined further above?

The reason I didn't comes down to the typing. fidelity_2q_sequence expects its inputs in the forms that are used internally by the unitary synthesis module (a vec of tuples with the gates, params, and qubits, the target wrapped in a QpuConstraint, etc.), while at this point we just have a vec of node indices in the original dag. I'd have to rewrite fidelity_2q_sequence to be more generic to be able to reuse it in this context without any overhead. I can do that if you think it's necessary.

Co-authored-by: Julien Gacon <gaconju@gmail.com>

Cryoris · 2026-05-21T11:23:33Z

Do we (can we?) ensure in the tests somewhere that the pass will be called with multiprocessing enabled?

This PR doesn't test it because we don't use multiprocessing on the pass as a standalone. In general we disable multiprocessing in the test suite because we are in a multiprocessing context already via stestr. We do have a special testing harness in place that runs the transpiler (which is the only place we use multiprocessing directly) with multiprocessing enabled. So #16136 should exercise it via that.

This follows on from Qiskit#13419, which added a new optimization pass, TwoQubitPeepholeOptimization, designed to replace the pair of ConsolidateBlocks and UnitarySynthesis for the optimization stage after we have a physical circuit. That PR, however, did not update the preset pass managers, in order to concentrate the review on just adding the new pass. This continues from there by updating the preset pass managers to use the new pass in optimization levels 2 and 3, replacing those levels' previous usage of ConsolidateBlocks and UnitarySynthesis in the optimization stage to achieve the same goal. This should result in both a runtime performance and a transpilation quality improvement, as the new pass is both faster and should produce better fidelity circuits than the previous peephole optimization.

The test updates made in this PR are needed because the peephole optimization changes the transpilation output of various test circuits. These were all verified to be valid outputs and in all cases a "better" output than before. Specifically, for the tests updated these were the changes in output and why they occurred:

* In the two tests in test.python.circuit.test_scheduled_circuit.TestScheduledCircuit, the single CX gate in the output circuit was flipped from (0, 1) to (1, 0) because in the target the error rate for the (0, 1) direction was higher than the extra error cost of 3 sx gates (the rz gates have 0 error).
* In test_unroll_only_if_not_gates_in_basis from test.python.transpiler.test_preset_passmanagers.TestPresetPassManager we no longer run ConsolidateBlocks in the optimization loop, so we no longer need to add the 2 executions from the init and translation stages. The test is updated to count the new peephole pass, which is the intent of the count check: to check the pass in the optimization loop.
* test_2q_circuit_5q_backend_v2 from test.python.transpiler.test_vf2_post_layout.TestVF2PostLayoutUndirected had the same cx gate flipping because the error rate in the original layout for the reverse direction was 0.000779905 vs 0.00163587 in the original direction. So the new pass was correctly flipping the cx gate, resulting in a different circuit that vf2 couldn't place anywhere better. To fix this the test sets a fixed layout on worse qubits so that vf2 will have to place it somewhere better.
* For test_layout_tokyo_fully_connected_cx_4_3 from test.python.transpiler.test_preset_passmanagers.TestFinalLayouts the output circuit has a better estimated fidelity (although more gates in general). The transpiler output goes from an estimated fidelity of 0.9526614226294913 before the new pass was used to an estimated fidelity of 0.961996188569715 after the new pass is used. This new circuit with a better fidelity has a different initial layout set now, so the test is updated to use the new layout.
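The CX flipping decisions above come down to simple arithmetic on error rates. The sketch below works through the test_2q_circuit_5q_backend_v2 numbers under two loudly labeled assumptions: that the estimated fidelity is the product of `1 - error` over the gates, and that the flip costs 3 extra sx gates with an error of 1e-4 each. The 3 sx count is borrowed from the first bullet and the sx error is invented, so treat the result as illustrative only.

```rust
/// ASSUMPTION: estimated fidelity of a gate sequence is the product of
/// (1 - error) over its gates. The pass's exact formula may differ.
fn estimated_fidelity(errors: &[f64]) -> f64 {
    errors.iter().map(|e| 1.0 - e).product()
}

fn main() {
    // Error rates quoted above for the two CX directions.
    let cx_original = 0.00163587;
    let cx_reversed = 0.000779905;
    // ASSUMPTION: flipping the CX adds 3 sx gates, each with a made-up
    // error of 1e-4 (rz gates are treated as error free).
    let sx = 1e-4;

    let keep = estimated_fidelity(&[cx_original]);
    let flip = estimated_fidelity(&[cx_reversed, sx, sx, sx]);
    println!("keep direction: {keep:.6}");
    println!("flip direction: {flip:.6}");
    println!("flip is better: {}", flip > keep);
}
```

Under these assumptions the flipped circuit still has the higher estimated fidelity, because the gap between the two CX error rates is larger than the combined cost of the extra 1q gates.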

This commit adds an option to the TwoQubitPeepholeOptimization transpiler pass added in Qiskit#13419 to allow configuring the heuristic priority when instantiating the pass. When the pass decides which of multiple potential synthesis outcomes is the best, or when to replace a block with the best synthesis outcome, there are three metrics we look at: the estimated fidelity, the number of 2q gates, and the total number of gates. The order of this comparison is inherently flexible and prioritizes different aspects of the synthesis. This exposes a new option to the pass to enable users to specify which aspect they want the pass to prioritize.

Previously the pass was hard coded to prioritize 2q gate count, but as was discussed in the PR review and the commit history of the PR branch this isn't necessarily the ideal choice. Intuitively, assuming relatively accurate error rates in the target, the estimated fidelity should be the first priority. This was changed to prioritize two qubit gate count during development as a debugging step and left in place while we worked on issues with the TwoQubitControlledUDecomposer's 1q component handling (see Qiskit#16123). Now that the TwoQubitControlledUDecomposer issue has been resolved we can explore switching the default, and this new argument will make it much easier to do side by side comparisons of the different options. The intent is to enable updating the usage in the preset pass managers (added in Qiskit#16136) without having to modify the pass's Rust code.

This does not include a release note, as the new pass has not been included in a release yet so there is no addition to a public API in this PR.
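One way to picture a configurable priority over the three metrics is a comparison driven by an ordered list. Everything named here (`Metric`, `Candidate`, `compare`, `pick`) is hypothetical and chosen for illustration; the actual option name and type on the pass are not shown in this thread.

```rust
use std::cmp::Ordering;

/// Hypothetical names for the three metrics described above.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Metric {
    Error, // 1 - estimated fidelity
    TwoQubitCount,
    TotalCount,
}

#[derive(Clone, Copy, Debug, PartialEq)]
struct Candidate {
    error: f64,
    two_q_count: usize,
    total_count: usize,
}

/// Compare two candidates metric by metric in the given priority order.
/// Later metrics only matter when all earlier ones tie.
fn compare(a: &Candidate, b: &Candidate, priority: &[Metric]) -> Ordering {
    priority.iter().fold(Ordering::Equal, |acc, metric| {
        acc.then_with(|| match metric {
            Metric::Error => a.error.total_cmp(&b.error),
            Metric::TwoQubitCount => a.two_q_count.cmp(&b.two_q_count),
            Metric::TotalCount => a.total_count.cmp(&b.total_count),
        })
    })
}

/// Lowest candidate under the chosen priority, if any.
fn pick(candidates: &[Candidate], priority: &[Metric]) -> Option<Candidate> {
    candidates.iter().copied().min_by(|a, b| compare(a, b, priority))
}

fn main() {
    // Made-up candidates for the same block.
    let few_gates = Candidate { error: 0.020, two_q_count: 1, total_count: 5 };
    let low_error = Candidate { error: 0.015, two_q_count: 2, total_count: 9 };
    let candidates = [few_gates, low_error];

    let count_first = [Metric::TwoQubitCount, Metric::Error, Metric::TotalCount];
    let error_first = [Metric::Error, Metric::TwoQubitCount, Metric::TotalCount];
    println!("{:?}", pick(&candidates, &count_first));
    println!("{:?}", pick(&candidates, &error_first));
}
```

Passing the priority as data rather than hard coding the tuple order is what makes side by side comparisons cheap in this sketch: the same candidates can be ranked under each ordering without touching the comparison code.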


mtreinish added the performance, Changelog: Added, Rust, and mod: transpiler labels Nov 10, 2024

mtreinish added this to the 2.0.0 milestone Nov 10, 2024

This was referenced Nov 11, 2024

Use OnceLock instead of OnceCell #13410

Merged

Suboptimal optimization result of consecutive RZZ gates with optimization level 2 and 3 #13431

Open

mtreinish force-pushed the two-qubit-peephole-parallel-pass branch from ad06d1a to 4d160bc Compare November 14, 2024 12:34

mtreinish mentioned this pull request Nov 18, 2024

Fix post-oxidization change in ConsolidateBlocks behavior #13450

Merged

mtreinish added 8 commits December 1, 2024 02:05

Merge remote-tracking branch 'origin/main' into two-qubit-peephole-pa…

decee9a

…rallel-pass

Merge remote-tracking branch 'origin/main' into two-qubit-peephole-pa…

4d4df68

…rallel-pass

Embed 2q gate count into score as tie breaker

cb6b70f

Also don't run scoring more than needed.

Release GIL during parallel portion

f06a070

Merge branch 'main' into two-qubit-peephole-parallel-pass

90b16e8

Fix lint

a175ee8

ShellyGarion mentioned this pull request Jan 23, 2025

Add 2q fractional gates to the UnitarySynthesis transpiler pass #13568

Merged

Merge remote-tracking branch 'origin/main' into two-qubit-peephole-pa…

af0c144

…rallel-pass

mtreinish added 4 commits January 26, 2025 09:28

Update ControlledUDecomposer to ensure we only run if the gate is con…

79a46c5

…tinuous

Add reversed synthesis for two qubit basis decomposer

839b4c9

Fix handling of single direction gates

d9399a6

Fix import cycle

b4c4360

ShellyGarion self-assigned this Feb 3, 2025

Merge branch 'main' into two-qubit-peephole-parallel-pass

aefdc90

1ucian0 assigned mtreinish Feb 6, 2025

Cryoris reviewed May 5, 2026


mtreinish added 5 commits May 5, 2026 13:33

Remove stray rust-analyzer comment

4f32ff3

Finish sentence in comment

1b15319

Fix unreachable text typo

eb24581

Add missing trivial recurse decorator from Python pass

924c3d6

Avoid unused scoring if outside_target is true

ff466a2

mtreinish requested a review from Cryoris May 5, 2026 20:00

ShellyGarion mentioned this pull request May 6, 2026

Improve single-qubit gate count in TwoQubitControlledUDecomposer #16123

Merged

3 tasks

mtreinish added 2 commits May 11, 2026 15:45

Merge remote-tracking branch 'origin/main' into two-qubit-peephole-pa…

576f85a

…rallel-pass

mtreinish requested a review from ShellyGarion May 15, 2026 20:57

Cryoris reviewed May 19, 2026


mtreinish and others added 2 commits May 19, 2026 06:55

Apply suggestions from code review

e02e638

Co-authored-by: Julien Gacon <gaconju@gmail.com>

Add return documentation to the internal unitary synthesis functions

70a8efd

mtreinish requested a review from Cryoris May 19, 2026 16:30

evmckinney9 mentioned this pull request May 19, 2026

[Feature Request]: Parallelize per-gate synthesis in GulpsDecompositionPass evmckinney9/gulps#15

Open

Cryoris reviewed May 21, 2026


Remove leftover pylint comments

dcd2a43

Cryoris approved these changes May 21, 2026


Cryoris added this pull request to the merge queue May 21, 2026

Merged via the queue into Qiskit:main with commit ef7004f May 21, 2026
27 checks passed

github-project-automation Bot moved this from In development to Done in Qiskit 2.5 May 21, 2026

mtreinish deleted the two-qubit-peephole-parallel-pass branch May 21, 2026 13:40

mtreinish mentioned this pull request May 27, 2026

Make heuristic for TwoQubitPeepholeOptimization configurable #16290

Open

3 tasks


9 participants
