Move ParameterVector to Rust#16228
Draft
jakelishman wants to merge 4 commits into
Draft
Conversation
Currently, there's no way to control the UUID of `ParameterVector`, which is at odds with `Parameter` itself. QPY has to jump through several non-public hoops to influence the UUIDs of created elements, though this is an interface we natively support elsewhere. This commit is just the new feature. A follow up with switch QPY over to using it; that is a more involved change that needs separate review because it may lead to a relaxing of the "vector was not constructed equally" warning at the same time.
When QPY deserialises `ParameterVectorElement`s, it first created a dummy `ParameterVectorElement` with a random UUID, then mutated an internal Python-space cache of the elements individually, as they were encountered, to reset them to the `root_uuid + offset` form. The idea that we _can_ find the root vector during deserialisation is predicated on the assumption that the `root_uuid + offset` form _must_ hold; if it doesn't, then the attempt to look up a cached version of the vector by looking at the `element_uuid - offset` would fail, so we'd never return the same vector. When `ParameterVectorElement` serialisation was introduced in QPY v3[^1], vectors were still being looked up by name, and the elements had completely disconnected UUIDs, which motivated the original system. This is no longer the case on both counts; it is now much easier simply to generate the vector correctly from the start, and never need to mutate internal private state. [^1]: 5c4cd2b: Fix ParameterVector handling in QPY serialization (Qiskitgh-7381)
This warning was added with QPY v3[^1] at a time when `ParameterVector` generated all its elements with unrelated UUIDs. Since `qiskit-terra` v0.25, we actually _have_ had enough information to completely recreate an entire vector from a single element[^2], and the circumstances needed to observe the incomplete behaviour are particularly convoluted, and unlikely in practice. [^1]: 5c4cd2b: Fix ParameterVector handling in QPY serialization (Qiskitgh-7381) [^2]: 5ab231d: Slightly optimize ParameterVector construction (Qiskitgh-10403)
`ParameterVector` was previously a pure-Python concept, even though the
Rust-space `Symbol` necessarily had to track an internal backreference
to a vector Python object in order to convert itself to Python later.
This hidden dependence on the Python interpreter was the root cause of a
panicking bug in Qiskit 2.4[^1].
All `ParameterVectorElement`s in Python space have names and UUIDs that
are pure functions of the underlying vector and the element's index.
Now that this information is available in Rust space, it is far more
efficient to simply not store all this derived information, but
calculate it on the fly, as needed. This motivated the change in
`Symbol` to an `enum`: the logic was _already_ handling the two cases of
"standalone" and "element", but this formally separates them to avoid
storing unnecessary data. Doing this alone is already a performance
benefit: construction of a `list(ParameterVector("a", 10_000))` (to
ensure all the same Python-space objects are created) went from 7.7ms
to 1.4ms on my machine (~5x speedup).
This work here is far more aggressive at threading through shared `Arc`
references of both `Symbol` and `SymbolVector`. The `Arc<SymbolVector>`
is necessary for correctness in Python space, not just performance: we
must have `all(el.vector == els[0].vector for el in els)`, and we don't
want to have to cache more Python objects within `Symbol` to achieve
that. Unfortunately, `ParameterVector` never implemented true equality
(only the default referential equality that Python uses to make
mutable-state objects safely hashable by default), so we certainly need
some other stable reference that can be shared.
The previous Python-space constructions of `PyParameter` and
`PyParameterVectorElement` had not been done entirely correctly with
respect to subclassing; the same (cloned) `Symbol` was present in
multiple places, which was two more allocations than necessary (most
places in Rust use `Arc<Symbol>`, which can be near-freely cloned).
This corrects it so that `PyParameterVectorElement` is now just a simple
marker type, and the majority of the Rust logic needn't concern itself
with it at all.
The `AtomicUsize` used for the length of `SymbolVector` is to support
the unfortunate `ParameterVector.resize` operation from Python space.
The atomic operation makes the whole logic safe from data races, but
does nothing to solve the potential race condition of two places
querying a vector its length concurrently and getting different answers,
or the problem that shortening the vector can invalidate already-held
references to `ParameterVectorElement`s or `Symbol`s (a problem that was
pre-existing in Python). The `resize` in-place method is fairly
fundamentally unsound for these reasons, and we should consider removing
it; it's only used in the deprecated `NLocal` subclass of
`BlueprintCircuit`, and that use can probably be replaced by a simple
lazier construction of the vector, or a complete object replacement
rather than mutation.
[^1]: 16c8088: Fix panic in `RemoveIdentityEquivalent` with `ParameterVector` global phase (Qiskitgh-16054)
3 tasks
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
ParameterVectorwas previously a pure-Python concept, even though the Rust-spaceSymbolnecessarily had to track an internal backreference to a vector Python object in order to convert itself to Python later. This hidden dependence on the Python interpreter was the root cause of a panicking bug in Qiskit 2.41.All
ParameterVectorElements in Python space have names and UUIDs that are pure functions of the underlying vector and the element's index. Now that this information is available in Rust space, it is far more efficient to simply not store all this derived information, but calculate it on the fly, as needed. This motivated the change inSymbolto anenum: the logic was already handling the two cases of "standalone" and "element", but this formally separates them to avoid storing unnecessary data. Doing this alone is already a performance benefit: construction of alist(ParameterVector("a", 10_000))(to ensure all the same Python-space objects are created) went from 7.7ms to 1.4ms on my machine (~5x speedup).This work here is far more aggressive at threading through shared
Arcreferences of bothSymbolandSymbolVector. TheArc<SymbolVector>is necessary for correctness in Python space, not just performance: we must haveall(el.vector == els[0].vector for el in els), and we don't want to have to cache more Python objects withinSymbolto achieve that. Unfortunately,ParameterVectornever implemented true equality (only the default referential equality that Python uses to make mutable-state objects safely hashable by default), so we certainly need some other stable reference that can be shared.The previous Python-space constructions of
PyParameterandPyParameterVectorElementhad not been done entirely correctly with respect to subclassing; the same (cloned)Symbolwas present in multiple places, which was two more allocations than necessary (most places in Rust useArc<Symbol>, which can be near-freely cloned). This corrects it so thatPyParameterVectorElementis now just a simple marker type, and the majority of the Rust logic needn't concern itself with it at all.The
AtomicUsizeused for the length ofSymbolVectoris to support the unfortunateParameterVector.resizeoperation from Python space. The atomic operation makes the whole logic safe from data races, but does nothing to solve the potential race condition of two places querying a vector its length concurrently and getting different answers, or the problem that shortening the vector can invalidate already-held references toParameterVectorElements orSymbols (a problem that was pre-existing in Python). Theresizein-place method is fairly fundamentally unsound for these reasons, and we should consider removing it; it's only used in the deprecatedNLocalsubclass ofBlueprintCircuit, and that use can probably be replaced by a simple lazier construction of the vector, or a complete object replacement rather than mutation.Depends on #16222
Fix #16208
AI/LLM disclosure
Footnotes
16c8088: Fix panic in
RemoveIdentityEquivalentwithParameterVectorglobal phase (Fix panic inRemoveIdentityEquivalentwithParameterVectorglobal phase #16054) ↩