fix(waddrmgr): fix data race between Manager lock/unlock and ScopedKeyManager operations #1183

Open: Aharonee wants to merge 2 commits into btcsuite:master from Aharonee:bugfix/manager_race

@Aharonee Aharonee commented Feb 22, 2026

Summary

Fix a data race between Manager.Lock()/Unlock() and concurrent ScopedKeyManager operations such as address derivation.

The Race

Manager.lock() and Manager.Unlock() directly manipulate ScopedKeyManager fields (acctInfo, addrs, deriveOnUnlock) while holding only m.mtx. ScopedKeyManager methods access these same fields while holding only s.mtx. Since these are different mutexes, there is no mutual exclusion on the shared fields.

In production, this triggers when the walletLocker goroutine fires the unlock timeout (calling Manager.Lock()) while address derivation (NextExternalAddresses) is in progress on another goroutine.
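
The pattern described above can be reduced to a few lines. The following is a minimal sketch, not code from waddrmgr: the types `root` and `scoped`, the field `child`, and the methods `derive` and `lockRacy` are hypothetical stand-ins for Manager, ScopedKeyManager, and their operations. It only shows that two different mutexes end up "guarding" the same field.

```go
package main

import (
	"fmt"
	"sync"
)

// scoped is a hypothetical stand-in for ScopedKeyManager. Its own methods
// guard addrs with s.mtx.
type scoped struct {
	mtx   sync.Mutex
	addrs map[string]bool
}

// derive mimics an address-derivation call that writes addrs under s.mtx.
func (s *scoped) derive(addr string) {
	s.mtx.Lock()
	defer s.mtx.Unlock()
	s.addrs[addr] = true
}

// root is a hypothetical stand-in for Manager.
type root struct {
	mtx   sync.Mutex
	child *scoped
}

// lockRacy mirrors the pre-fix pattern: it resets the scoped manager's map
// while holding only m.mtx, so it is not mutually exclusive with derive.
func (m *root) lockRacy() {
	m.mtx.Lock()
	defer m.mtx.Unlock()
	m.child.addrs = make(map[string]bool) // not synchronized with s.mtx
}

func main() {
	m := &root{child: &scoped{addrs: make(map[string]bool)}}

	// The calls run sequentially here so the program stays well defined.
	// Running derive and lockRacy on separate goroutines is the situation
	// the race detector reports.
	m.child.derive("addr0")
	m.lockRacy()
	fmt.Println(len(m.child.addrs))
}
```

Each method looks correctly locked when read in isolation, which is why this kind of race is easy to miss in review.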

The Fix

The root cause is that Manager was directly manipulating ScopedKeyManager's internal state instead of letting ScopedKeyManager manage its own fields. The fix moves responsibility for ScopedKeyManager's field lifecycle into ScopedKeyManager itself:

  • ScopedKeyManager gets Lock(), Unlock(), Close(), and ConvertToWatchingOnly() methods that encapsulate all field mutations behind s.mtx. Manager delegates to these methods instead of reaching into scoped manager fields. This ensures all access to the shared fields is consistently protected by s.mtx.

  • KeyManagerRoot interface: ScopedKeyManager.rootManager is now an interface instead of *Manager, preventing it from directly accessing Manager's mutex or internal fields. This removes the code paths where ScopedKeyManager previously acquired rootManager.mtx while holding s.mtx (in importPublicKey and importScriptAddress).

  • syncState moved to dedicated syncStateMtx: the only reason ScopedKeyManager previously acquired rootManager.mtx was to read/write syncState.startBlock. A separate mutex for this field, exposed via StartBlockHeight()/SetStartBlock() methods on the interface, removes the ordering conflict.
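
The first two points can be sketched as follows. This is an illustrative reduction under stated assumptions, not the PR's code: `keyRoot`, `root`, `scoped`, and all method names are hypothetical, and the locked flag is an atomic here purely so that the sketch has no lock-ordering question of its own.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// keyRoot is a hypothetical narrow view of the root manager, playing the
// role the description assigns to KeyManagerRoot.
type keyRoot interface {
	isLocked() bool
}

// scoped owns its fields: every access to addrs goes through s.mtx.
type scoped struct {
	mtx   sync.Mutex
	root  keyRoot
	addrs map[string]bool
}

// lock clears cached state under s.mtx, so it excludes derive.
func (s *scoped) lock() {
	s.mtx.Lock()
	defer s.mtx.Unlock()
	s.addrs = make(map[string]bool)
}

// derive records an address under s.mtx unless the root reports locked.
func (s *scoped) derive(addr string) bool {
	s.mtx.Lock()
	defer s.mtx.Unlock()
	if s.root.isLocked() {
		return false
	}
	s.addrs[addr] = true
	return true
}

// count returns the number of cached addresses under s.mtx.
func (s *scoped) count() int {
	s.mtx.Lock()
	defer s.mtx.Unlock()
	return len(s.addrs)
}

// root delegates to scoped.lock instead of touching scoped fields.
type root struct {
	mtx     sync.Mutex
	locked  atomic.Bool // assumption of this sketch, see lead-in
	scopeds []*scoped
}

func (m *root) isLocked() bool { return m.locked.Load() }

func (m *root) lock() {
	m.mtx.Lock()
	defer m.mtx.Unlock()
	m.locked.Store(true)
	for _, s := range m.scopeds {
		s.lock()
	}
}

func (m *root) unlock() {
	m.mtx.Lock()
	defer m.mtx.Unlock()
	m.locked.Store(false)
}

func main() {
	m := &root{}
	s := &scoped{root: m, addrs: make(map[string]bool)}
	m.scopeds = append(m.scopeds, s)

	var wg sync.WaitGroup
	wg.Add(2)
	go func() {
		defer wg.Done()
		for i := 0; i < 1000; i++ {
			s.derive(fmt.Sprintf("addr%d", i))
		}
	}()
	go func() {
		defer wg.Done()
		for i := 0; i < 1000; i++ {
			m.lock()
			m.unlock()
		}
	}()
	wg.Wait()

	m.lock()
	fmt.Println(s.count(), s.derive("late")) // prints "0 false"
}
```

Because `scoped` only sees `keyRoot`, it has no way to reach `root.mtx`, which is the property the interface is meant to enforce.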

Reproducing the Race

Two tests are included:

  • TestLockUnlockRace (waddrmgr/manager_test.go): directly calls Manager.Lock()/Unlock() concurrently with NextExternalAddresses().
  • TestWalletLockerAddressRace (wallet/wallet_test.go): full wallet stack -- Wallet.Lock()/Unlock() through the walletLocker goroutine concurrently with Wallet.NewAddress().

To verify the race exists on unpatched master:

git checkout master
git checkout bugfix/manager_race -- waddrmgr/manager_test.go wallet/wallet_test.go
GORACE="halt_on_errors=0" go test -race -run "TestLockUnlockRace" -count=1 ./waddrmgr/
GORACE="halt_on_errors=0" go test -race -run "TestWalletLockerAddressRace" -count=1 ./wallet/

Both report multiple WARNING: DATA RACE on master. After the fix, both pass cleanly with -race.

Copilot AI left a comment

Pull request overview

Fixes a data race in waddrmgr by ensuring Manager.Lock()/Unlock() no longer mutate ScopedKeyManager internals without holding ScopedKeyManager’s mutex, and by separating sync-state locking to avoid lock-ordering problems during concurrent operations (e.g., address derivation vs wallet auto-lock).

Changes:

  • Add ScopedKeyManager-owned Lock/Unlock/Close/ConvertToWatchingOnly flows and delegate from Manager to avoid cross-mutex field access.
  • Introduce a dedicated syncStateMtx and route synced/start block access through it.
  • Add race-regression tests covering both direct manager lock/unlock and the full wallet stack.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| wallet/wallet_test.go | Adds a wallet-level concurrent lock/unlock vs NewAddress race regression test. |
| waddrmgr/sync.go | Moves SyncedTo state access to syncStateMtx instead of m.mtx. |
| waddrmgr/scoped_manager.go | Introduces KeyManagerRoot interface and encapsulates scoped-manager lock/unlock/cleanup logic behind s.mtx. |
| waddrmgr/manager_test.go | Adds a direct Manager.Lock()/Unlock() vs NextExternalAddresses() race regression test. |
| waddrmgr/manager.go | Adds syncStateMtx, delegates lock/unlock to scoped managers, and adds root accessors + start block management. |
| waddrmgr/address.go | Switches root manager field access to KeyManagerRoot methods for keys/params. |

Comment thread waddrmgr/scoped_manager.go Outdated
CryptoKeyPub() EncryptorDecryptor
CryptoKeyPriv() EncryptorDecryptor
CryptoKeyScript() EncryptorDecryptor
StartBlockHeight() int32

Copilot AI Mar 5, 2026

StartBlockHeight is included in KeyManagerRoot, but there are no call sites in the repo using it (all start block updates go through SetStartBlock). If it’s not needed, consider removing it to keep the interface minimal and reduce future refactor burden.

Suggested change (remove this line)
- StartBlockHeight() int32

Author

Done.

Comment thread waddrmgr/manager.go Outdated
Comment on lines +891 to +906
// SetStartBlock updates the start block if the given block is earlier than the
// current start block. It persists the change to the database and updates
// the in-memory state atomically.
func (m *Manager) SetStartBlock(ns walletdb.ReadWriteBucket,
	bs *BlockStamp) error {

	m.syncStateMtx.Lock()
	defer m.syncStateMtx.Unlock()

	if bs.Height < m.syncState.startBlock.Height {
		if err := putStartBlock(ns, bs); err != nil {
			return err
		}
		m.syncState.startBlock = *bs
	}
	return nil

Copilot AI Mar 5, 2026

SetStartBlock accepts a *BlockStamp but dereferences it unconditionally (bs.Height) and will panic if a caller passes nil. Since this is a new exported method (and also part of the KeyManagerRoot interface), it should defensively handle nil (either as a no-op or by returning a typed error).

Author

Callers guard with bs != nil before calling SetStartBlock, same as the original code did before the height comparison.

Comment thread waddrmgr/manager.go Outdated
Comment on lines +865 to +879
// CryptoKeyPub returns the public crypto key used to encrypt/decrypt public
// extended keys and addresses.
func (m *Manager) CryptoKeyPub() EncryptorDecryptor {
	return m.cryptoKeyPub
}

// CryptoKeyPriv returns the private crypto key used to encrypt/decrypt private
// key material. The underlying key is zeroed when the manager is locked.
func (m *Manager) CryptoKeyPriv() EncryptorDecryptor {
	return m.cryptoKeyPriv
}

// CryptoKeyScript returns the script crypto key used to encrypt/decrypt script
// data. The underlying key is zeroed when the manager is locked.
func (m *Manager) CryptoKeyScript() EncryptorDecryptor {

Copilot AI Mar 5, 2026

The new exported Manager methods CryptoKeyPub/CryptoKeyPriv/CryptoKeyScript expose the wallet’s in-memory encryption keys via the EncryptorDecryptor interface (including Bytes/CopyBytes). This significantly expands the public API surface and allows external callers to read/modify secret key material, which is a security risk and breaks encapsulation.

Consider making these methods (and KeyManagerRoot) unexported/package-private, or returning a narrower internal-only interface that does not expose raw key bytes or mutation operations.

Suggested change
- // CryptoKeyPub returns the public crypto key used to encrypt/decrypt public
- // extended keys and addresses.
- func (m *Manager) CryptoKeyPub() EncryptorDecryptor {
- 	return m.cryptoKeyPub
- }
- // CryptoKeyPriv returns the private crypto key used to encrypt/decrypt private
- // key material. The underlying key is zeroed when the manager is locked.
- func (m *Manager) CryptoKeyPriv() EncryptorDecryptor {
- 	return m.cryptoKeyPriv
- }
- // CryptoKeyScript returns the script crypto key used to encrypt/decrypt script
- // data. The underlying key is zeroed when the manager is locked.
- func (m *Manager) CryptoKeyScript() EncryptorDecryptor {
+ // cryptoKeyPubKey returns the public crypto key used to encrypt/decrypt
+ // public extended keys and addresses. This method is unexported to avoid
+ // exposing the in-memory encryption key outside the package.
+ func (m *Manager) cryptoKeyPubKey() EncryptorDecryptor {
+ 	return m.cryptoKeyPub
+ }
+ // cryptoKeyPrivKey returns the private crypto key used to encrypt/decrypt
+ // private key material. The underlying key is zeroed when the manager is
+ // locked. This method is unexported to avoid exposing the in-memory
+ // encryption key outside the package.
+ func (m *Manager) cryptoKeyPrivKey() EncryptorDecryptor {
+ 	return m.cryptoKeyPriv
+ }
+ // cryptoKeyScriptKey returns the script crypto key used to encrypt/decrypt
+ // script data. The underlying key is zeroed when the manager is locked. This
+ // method is unexported to avoid exposing the in-memory encryption key
+ // outside the package.
+ func (m *Manager) cryptoKeyScriptKey() EncryptorDecryptor {

Author

Before this PR, ScopedKeyManager accessed s.rootManager.cryptoKeyPriv directly (22 times across scoped_manager.go and address.go). The method call returns the same reference. The EncryptorDecryptor interface was already public. No new security surface is introduced.

Comment thread waddrmgr/scoped_manager.go Outdated
Comment on lines +261 to +275
// KeyManagerRoot is the interface through which ScopedKeyManager accesses
// the root Manager. It exposes only what is needed: lock state queries,
// crypto keys for encryption/decryption, chain parameters, and sync state
// management. This prevents ScopedKeyManager from directly accessing
// Manager's mutex or internal fields.
type KeyManagerRoot interface {
	IsLocked() bool
	WatchOnly() bool
	ChainParams() *chaincfg.Params
	CryptoKeyPub() EncryptorDecryptor
	CryptoKeyPriv() EncryptorDecryptor
	CryptoKeyScript() EncryptorDecryptor
	StartBlockHeight() int32
	SetStartBlock(ns walletdb.ReadWriteBucket, bs *BlockStamp) error
}

Copilot AI Mar 5, 2026

KeyManagerRoot is exported and its methods are exported, which makes this new interface part of the package’s public API. Since it’s intended purely as an internal dependency boundary between Manager and ScopedKeyManager, consider making the interface unexported (and the methods unexported) to avoid committing to it as a stable external API.

Author

Done.

yyforyongyu added this to the Introduce SQL store milestone on Mar 5, 2026

Manager.lock() and Manager.Unlock() directly accessed ScopedKeyManager
fields (acctInfo, addrs, deriveOnUnlock) while holding only m.mtx.
ScopedKeyManager methods accessed those same fields while holding only
s.mtx. Since these are different mutexes, there was no mutual exclusion
on the shared fields.

The fix moves responsibility for ScopedKeyManager's field lifecycle
into ScopedKeyManager itself via Lock(), Unlock(), Close(), and
ConvertToWatchingOnly() methods. Each acquires s.mtx internally,
ensuring all access to the shared fields is consistently protected
by s.mtx. Manager delegates to these methods instead of manipulating
fields directly.

A KeyManagerRoot interface replaces the *Manager back-reference in
ScopedKeyManager, preventing it from directly accessing Manager's
mutex or internal fields.

syncState is moved to a dedicated syncStateMtx, and
FetchScopedKeyManager is made lock-free (the scopedManagers map is
stable after wallet initialization), removing all s.mtx -> m.mtx
lock ordering violations.

Aharonee force-pushed the bugfix/manager_race branch from 0cbe47a to 94ca3cb on March 6, 2026, 20:17

Fix all 41 golangci-lint issues: add blank lines for wsl_v5, use
plain error assignment for noinlineerr, break long lines for lll,
wrap external/interface errors for wrapcheck, add //nolint:ireturn
for interface-returning accessors, use integer range syntax, add
t.Parallel(), and move unexported method after exported ones for
funcorder.