Make statistics optional#977
Conversation
|
Hi @roman-khimov. Thanks for your PR. I'm waiting for a etcd-io member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
Your original PR have multiple change. I am not sure how significant this single change has on the performance. Do you have any data for comparison? |
|
It's all in the commit message, abb5797 |
|
I can also add nspcc-dev@53975e3 here, it's not related to the original problem (reader lock contention), but still can be beneficial for writers. |
|
I am a little confused, in your test case, there is 30ms sleep in each readonly TXN, so any duration should be >= 30ms, but you got some XX |
|
It's not ms, it's us, in the code (56bb52b) that's |
Ah, thx for the clarification. |
| // Put `stats` at the first field to ensure it's 64-bit aligned. Note that | ||
| // the first word in an allocated struct can be relied upon to be 64-bit | ||
| // aligned. Refer to https://pkg.go.dev/sync/atomic#pkg-note-BUG. Also | ||
| // refer to discussion in https://github.com/etcd-io/bbolt/issues/577. | ||
| stats Stats |
There was a problem hiding this comment.
sorry to say this, but did you read the comment? :)
There was a problem hiding this comment.
sorry to say this, but did you read the comment? :)
It changes it to a pointer. So I think it should be fine.
There was a problem hiding this comment.
do you guys have a raspi or any other arm32 device sitting around to validate this? I'm not so worried about the stats pointer, but rather the sync.Once or sync.Pool could become problematic.
There was a problem hiding this comment.
Just quickly ran the tests on your branch on my pi - so everything works as intended. Sorry I'm in the middle of a move, luckily I had it plugged in and could ssh into it. Not sure whether you have your QEMU setup still somewhere...
There was a problem hiding this comment.
do you guys have a raspi or any other arm32 device sitting around to validate this?
qemu-user-static might be good enough to verify this. #577 (comment)
I think we might need to setup a workflow to guarantee it won't be broken in future.
but rather the
sync.Onceorsync.Poolcould become problematic.
The known issue https://pkg.go.dev/sync/atomic#pkg-note-BUG seems only specific to sync.atomic package. But I agree that we should verify it anyway.
There was a problem hiding this comment.
I think we might need to setup a workflow to guarantee it won't be broken in future.
cc @ivanvc
There was a problem hiding this comment.
I think we might need to setup a workflow to guarantee it won't be broken in future.
On it.
|
generally looks good to me and I would even vouch for changing the default to disable it and reap the benefit in etcd - but this needs a bit more refactor over there, we use the |
thus was my question above 👍🏽 |
Works for me if there is a general consensus on this. It's just that the intention was to keep the default behavior unchanged. Otherwise I think half a year after release with this change someone will come here and tell that
|
|
Summary & comments:
|
| // NoStatistics turns off statistics collection, Stats method will | ||
| // return empty structure in this case. This can be beneficial for | ||
| // performance in some cases. |
There was a problem hiding this comment.
| // NoStatistics turns off statistics collection, Stats method will | |
| // return empty structure in this case. This can be beneficial for | |
| // performance in some cases. | |
| // NoStatistics turns off statistics collection, Stats method will | |
| // return empty structure in this case. This can be beneficial for | |
| // performance under high-concurrency read-only transactions. |
I think most Bolt users never care about this data, so we're just wasting time for nothing. This is also one of the exclusive locks that we have on the View() path. While this patch doesn't change much on its own, because the other lock is still here (subject to a different patch), once that lock is removed the difference in concurrent View() test is pretty clear. With NoStatistics=false: workers samples min avg 50% 80% 90% max 1 10 123.905µs 969.042µs 1.062529ms 1.065585ms 1.071537ms 1.071537ms 10 100 34.636µs 178.176µs 89.7µs 110.439µs 943.753µs 1.055165ms 100 1000 31.79µs 280.166µs 51.358µs 526.992µs 1.034306ms 2.47819ms 1000 10000 30.608µs 818.098µs 86.464µs 935.799µs 2.681115ms 10.595186ms 10000 100000 30.569µs 3.060826ms 64.132µs 6.56151ms 11.199984ms 64.855384ms NoStatistics=true: workers samples min avg 50% 80% 90% max 1 10 68.049µs 962.039µs 1.060335ms 1.064633ms 1.066087ms 1.066087ms 10 100 34.846µs 315.346µs 90.943µs 862.499µs 1.00516ms 1.08366ms 100 1000 31.45µs 225.53µs 36.88µs 236.63µs 939.115µs 1.466286ms 1000 10000 30.539µs 207.383µs 43.643µs 110.841µs 408.146µs 5.689001ms 10000 100000 30.488µs 152.603µs 39.636µs 90.622µs 145.266µs 9.28235ms The default behavior is kept for compatibility. In future the option can be extended to avoid collecting transaction statistics as well. Now that stats is a pointer we can also revert a part of 26f89a5 and make the structure cleaner. Signed-off-by: Roman Khimov <roman@nspcc.ru>
abb5797 to
87d0cf7
Compare
Works fine as in #577 (comment) |
| tx.db.statlock.Unlock() | ||
| if tx.db.stats != nil { | ||
| tx.db.statlock.Lock() | ||
| tx.db.stats.FreePageN = freelistFreeN |
There was a problem hiding this comment.
I was thinking if we can use atomic to update stats here. so no more lock and no more extra option for openning db.
anyway, changes look good.
There was a problem hiding this comment.
I was thinking if we can use atomic to update stats here
I think using atomic functions is problematic, since it only guarantees atomic of one operation, but it doesn't guarantee atomic of multiple operations.
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: ahrtr, fuweid, roman-khimov The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
* engine: reduce bbolt freelist memory usage Marcos has seen severe memory usage during some dagger generate commands that do not reproduce on my machine: 30GB+ RSS versus around 800MB. The heap profile points at bbolt's internal freelist mergeSpans path, with more than 100GB of allocations over one run. bbolt PR #1179 improves hashmap freelist span merging: etcd-io/bbolt#1179 Keep using that main-branch bbolt commit and switch Dagger's bbolt opens to FreelistMapType so we actually use the hashmap path. This covers the Dagger-managed snapshotter metastore and containerd metadata DB. It also sets bolt.DefaultOptions.FreelistType before snapshotter construction because some containerd snapshotters open bbolt internally with nil options. Upgrade containerd to v2.2.3 while here. The previous v2.0.8 release does not compile against opencontainers/runtime-spec v1.3.0: it assigns an int64 to LinuxPids.Limit, which is now *int64. v2.2.x supports the new runtime-spec API. containerd v2.2.x requires go.etcd.io/bbolt v1.4.3, which is a higher semver than the main-branch pseudo-version containing PR #1179. The go.mod replace keeps the selected bbolt code on that main commit even though MVS selects v1.4.3. Signed-off-by: Erik Sipsma <erik@sipsma.dev> * engine: disable bbolt statistics bbolt's main branch now has Options.NoStatistics, added by etcd-io/bbolt#977, to skip DB-level statistics collection. We do not consume these stats for the engine metadata databases, and bbolt notes this can improve performance under high-concurrency read-only transactions. Set NoStatistics on the explicit engine bbolt opens and in the process defaults used by containerd snapshotter paths that open bbolt internally with nil options. Signed-off-by: Erik Sipsma <erik@sipsma.dev> * chore: update x/sys module versions Keep the repository module files in sync with the dependency solver now selecting golang.org/x/sys v0.44.0. This removes the generated check diff across module fixtures, SDK runtimes, and toolchains. Signed-off-by: Erik Sipsma <erik@sipsma.dev> * chore: update dagger module x/sys version Update the hidden .dagger module files to match the golang.org/x/sys v0.44.0 selection used by the rest of the repository modules. Signed-off-by: Erik Sipsma <erik@sipsma.dev> --------- Signed-off-by: Erik Sipsma <erik@sipsma.dev>
* engine: reduce bbolt freelist memory usage Marcos has seen severe memory usage during some dagger generate commands that do not reproduce on my machine: 30GB+ RSS versus around 800MB. The heap profile points at bbolt's internal freelist mergeSpans path, with more than 100GB of allocations over one run. bbolt PR #1179 improves hashmap freelist span merging: etcd-io/bbolt#1179 Keep using that main-branch bbolt commit and switch Dagger's bbolt opens to FreelistMapType so we actually use the hashmap path. This covers the Dagger-managed snapshotter metastore and containerd metadata DB. It also sets bolt.DefaultOptions.FreelistType before snapshotter construction because some containerd snapshotters open bbolt internally with nil options. Upgrade containerd to v2.2.3 while here. The previous v2.0.8 release does not compile against opencontainers/runtime-spec v1.3.0: it assigns an int64 to LinuxPids.Limit, which is now *int64. v2.2.x supports the new runtime-spec API. containerd v2.2.x requires go.etcd.io/bbolt v1.4.3, which is a higher semver than the main-branch pseudo-version containing PR #1179. The go.mod replace keeps the selected bbolt code on that main commit even though MVS selects v1.4.3. Signed-off-by: Erik Sipsma <erik@sipsma.dev> * engine: disable bbolt statistics bbolt's main branch now has Options.NoStatistics, added by etcd-io/bbolt#977, to skip DB-level statistics collection. We do not consume these stats for the engine metadata databases, and bbolt notes this can improve performance under high-concurrency read-only transactions. Set NoStatistics on the explicit engine bbolt opens and in the process defaults used by containerd snapshotter paths that open bbolt internally with nil options. Signed-off-by: Erik Sipsma <erik@sipsma.dev> * chore: update x/sys module versions Keep the repository module files in sync with the dependency solver now selecting golang.org/x/sys v0.44.0. This removes the generated check diff across module fixtures, SDK runtimes, and toolchains. Signed-off-by: Erik Sipsma <erik@sipsma.dev> * chore: update dagger module x/sys version Update the hidden .dagger module files to match the golang.org/x/sys v0.44.0 selection used by the rest of the repository modules. Signed-off-by: Erik Sipsma <erik@sipsma.dev> --------- Signed-off-by: Erik Sipsma <erik@sipsma.dev>
See #967 for motivation and test used.