Pivot to foldhash for hash implementation in IndexMap#15920
Conversation
This commit switches our use of ahash as the hashing algorithm for IndexMap to use foldhash instead. This mirrors the change that was done in hashbrown for the default hashing algorithm in its hashmap. Foldhash promises faster hashing than ahash with similar quality tradeoffs. Since IndexSet/IndexMap defaults to using the stdlib SipHash algorithm which is not good for performance we need to set a custom hash algorithm to maintain good performance. Since we use an IndexSet for the interners saving a small amount of time per lookup here is especially important. The only place that ahash is retained is for DAGNode in Python. The usage was specific to how ahash worked and switching it to an analogous foldhash interface caused test failures. As the performance is not critical for this code path as it's only used by the Python API for accessing the DAG and it's already not a performant path saving nanoseconds on hashing doesn't matter. There is probably an alternative API in foldhash we could use, but it didn't seem critical to figure out.
|
One or more of the following people are relevant to this code:
|
|
Have you got benchmarks showing the improvement? Should we be concerned at all about the lack of DoS resistance in |
|
I'm running benchmarks now. Initially testing is showing a small speedup on some benchmarks but also corresponding small regressions on others on my amd linux system. Although I'm a bit worried about system noise on these numbers so I will re-run it tomorrow on an idle system. I also want to test on different platforms too because I think there might be more of a benefit on some platforms (thinking arm mac mostly). I'm not worried about potential decreased HashDoS resistance since we're only using these hashmaps internally and not in a way where hash collision attacks would do anything except cause a small slowdown in lookups internally, and not to a degree where it's a potential DoS vector beyond already having a massive circuit. They're also mostly not used in a place with arbitrary user data, the only places where there is even a potential vector from this PR is gate names in some transpiler passes internal caching the rest are all just bit indices. None of the usage here is for cryptographic hashing or verification, it's just for internal data structures caching primarily. I'm also not really worried because we're already using foldhash internally everywhere we use |
|
The full asv run finished before I expected. It's looking like a better improvement on an idle system: |
Coverage Report for CI Build 25324433408Warning Build has drifted: This PR's base is out of sync with its target branch, so coverage data may include unrelated changes. Coverage increased (+0.01%) to 87.581%Details
Uncovered Changes
Coverage Regressions5 previously-covered lines in 3 files lost coverage.
Coverage Stats
💛 - Coveralls |
raynelfss
left a comment
There was a problem hiding this comment.
There are places where we're still using the default hasher with indexmap, mainly in:
VarStretchContainerexporter.rs(Qasm 3)suzuki_trotter.rs.alapandasapschedule analysis passes.
And other files I haven't checked...
That said, I checked what #16130 does and, as it introduces aliases with these hashers already included as part of its implementation, these files should already be covered so I think this can merge as is.
It's interesting how just using a different hasher can speed things up considerably. This is a good addition. Happy merging 🚀
Building off of Qiskit#15920 which moved our hasher for IndexMap usage to be foldhash this moves to having a common type alias for IndexMap and IndexSet that has the hasher set. All the internal usage of IndexMap and IndexSet are updated to use this new alias instead which removes the boiler plate around dealing with initializing the hasher for the most part, the only exception is when using `with_capacity_and_hasher()` there is no method to create an empty object with a capacity preallocated and the hasher is initialized from the generic type. To enforce that we're using the correct hasher for all indexmap usage the clippy rules are updated to disallow the indexmap::IndexMap and indexmap::IndexSet types directly. This means that after this commit all indexmap usage with be forced by clippy to leverage our standard pattern using foldhash. The advantage is if we ever need to change the default hasher again in the future we only need to update one central place (and the places using with_capacity_and_hasher).
…iskit#16130) Building off of Qiskit#15920 which moved our hasher for IndexMap usage to be foldhash this moves to having a common type alias for IndexMap and IndexSet that has the hasher set. All the internal usage of IndexMap and IndexSet are updated to use this new alias instead which removes the boiler plate around dealing with initializing the hasher for the most part, the only exception is when using `with_capacity_and_hasher()` there is no method to create an empty object with a capacity preallocated and the hasher is initialized from the generic type. To enforce that we're using the correct hasher for all indexmap usage the clippy rules are updated to disallow the indexmap::IndexMap and indexmap::IndexSet types directly. This means that after this commit all indexmap usage with be forced by clippy to leverage our standard pattern using foldhash. The advantage is if we ever need to change the default hasher again in the future we only need to update one central place (and the places using with_capacity_and_hasher).
Summary
This commit switches our use of ahash as the hashing algorithm for IndexMap to use foldhash instead. This mirrors the change that was done in hashbrown for the default hashing algorithm in its hashmap. Foldhash promises faster hashing than ahash with similar quality tradeoffs. Since IndexSet/IndexMap defaults to using the stdlib SipHash algorithm which is not good for performance we need to set a custom hash algorithm to maintain good performance. Since we use an IndexSet for the interners saving a small amount of time per lookup here is especially important.
The only place that ahash is retained is for DAGNode in Python. The usage was specific to how ahash worked and switching it to an analogous foldhash interface caused test failures. As the performance is not critical for this code path as it's only used by the Python API for accessing the DAG and it's already not a performant path saving nanoseconds on hashing doesn't matter. There is probably an alternative API in foldhash we could use, but it didn't seem critical to figure out.
Details and comments