
tbc: store tx byte location in tx index, bump DB to v6 #1052

Open
marcopeereboom wants to merge 2 commits into marco/lazy-block from marco/tx-offset



@marcopeereboom
Contributor

Summary

Store the tx byte location (TxLoc: offset + length within the raw block) in the tx index 't' entry value, which was previously nil. This makes tx lookup O(1) in the number of transactions in the block: the reader jumps directly to the tx's bytes in the raw block, with no scanning, no SHA256 hashing and no full block deserialization.
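As an illustration of the O(1) access path, here is a minimal sketch. It assumes a TxLoc type that only mirrors the fields of btcd's wire.TxLoc (TxStart, TxLen); txBytesAt is a hypothetical helper, not code from this PR. Given the raw block and a stored location, the tx is a bounds-checked slice, independent of how many other txs the block holds.

```go
package main

import "fmt"

// TxLoc mirrors the shape of btcd's wire.TxLoc: where one transaction
// starts inside a serialized block, and how many bytes it occupies.
type TxLoc struct {
	TxStart int
	TxLen   int
}

// txBytesAt (hypothetical) returns the serialized bytes of one transaction
// by slicing the raw block at loc. Nothing else in the block is read or
// hashed, so the cost does not depend on the block's tx count. The returned
// slice aliases rawBlock; callers must not modify it.
func txBytesAt(rawBlock []byte, loc TxLoc) ([]byte, error) {
	if loc.TxStart < 0 || loc.TxLen <= 0 || loc.TxStart > len(rawBlock) ||
		loc.TxLen > len(rawBlock)-loc.TxStart {
		return nil, fmt.Errorf("txloc %d+%d out of range for %d-byte block",
			loc.TxStart, loc.TxLen, len(rawBlock))
	}
	return rawBlock[loc.TxStart : loc.TxStart+loc.TxLen], nil
}
```

In btcd-based code the slice would then typically be handed to (*wire.MsgTx).Deserialize through a bytes.Reader, which parses only that one transaction.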

Depends on #1051 (lazy block reader).

Problem

Every tx lookup via BlockHashByTxId requires a subsequent full block deserialization to find the tx. The 't' entry value was nil, wasted space that could carry the byte offset. A CPU profile shows 60% of the time spent in SHA256 hashing, caused by FindTx hashing every tx in the block until it finds a match.
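To make the cost concrete, the following sketch shows the kind of linear scan a FindTx-style lookup performs. It is an illustration under stated assumptions, not the project's FindTx: findTxLinear is hypothetical, and it assumes non-witness transactions, for which the txid is the double SHA-256 of the full serialization (for segwit txs the witness data must be excluded first).

```go
package main

import "crypto/sha256"

// doubleSHA256 is Bitcoin's txid hash for a transaction without witness data.
func doubleSHA256(b []byte) [32]byte {
	first := sha256.Sum256(b)
	return sha256.Sum256(first[:])
}

// findTxLinear (hypothetical) hashes each serialized tx in block order until
// one matches want. It returns the matching index (or -1) and the number of
// hashes computed: in the worst case one per tx in the block, which is the
// work a stored TxLoc avoids.
func findTxLinear(txs [][]byte, want [32]byte) (index int, hashes int) {
	for i, tx := range txs {
		hashes++
		if doubleSHA256(tx) == want {
			return i, hashes
		}
	}
	return -1, hashes
}
```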

Solution

  • BlockHashByTxId signature changed to return (*chainhash.Hash, wire.TxLoc, error) — all callers updated
  • processTxs now calls block.TxLoc() and stores offset+length via NewTxMappingWithLoc
  • BlockTxUpdate uses stack-allocated reusable buffers instead of slicing loop variables (addresses a potential data integrity issue documented in #1050, "tbc: tx index intermittently loses entries during IBD")
  • All consumers wired to use TxLoc when available, with legacy fallback for nil values:
    • TxById (RPC) — deserializes only the target tx
    • txOutFromOutPoint (UTXO unwind) — deserializes only the target tx
    • handleBlockHashByTxIdRequest (RPC) — hash only, ignores TxLoc
    • hemictl — hash only, ignores TxLoc
  • DB version 5 → 6, upgrade wipes tx index for rebuild with TxLoc values
  • Errors from block.TxLoc() are logged at Errorf, and the indexer falls back to nil values
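The list above implies a value format that carries offset + length while still accepting legacy nil values. The on-disk layout NewTxMappingWithLoc actually uses is not shown here, so the 8-byte big-endian layout below is an assumption chosen for illustration; encodeTxLoc and decodeTxLoc are hypothetical names. The decoder returns a zero TxLoc with ok == false for nil values, in the spirit of TestTxLocLegacyNilValue.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// TxLoc mirrors the fields of btcd's wire.TxLoc.
type TxLoc struct {
	TxStart int
	TxLen   int
}

// encodeTxLoc packs a TxLoc as a 4-byte big-endian offset followed by a
// 4-byte big-endian length (hypothetical layout). Raw blocks are far
// smaller than 4 GiB, so uint32 is enough for both fields.
func encodeTxLoc(loc TxLoc) []byte {
	v := make([]byte, 8)
	binary.BigEndian.PutUint32(v[0:4], uint32(loc.TxStart))
	binary.BigEndian.PutUint32(v[4:8], uint32(loc.TxLen))
	return v
}

// decodeTxLoc reverses encodeTxLoc. A nil or empty value is a legacy
// (pre-v6) entry: it yields a zero TxLoc and ok == false, telling the
// caller to fall back to deserializing the whole block.
func decodeTxLoc(v []byte) (loc TxLoc, ok bool, err error) {
	switch len(v) {
	case 0:
		return TxLoc{}, false, nil
	case 8:
		return TxLoc{
			TxStart: int(binary.BigEndian.Uint32(v[0:4])),
			TxLen:   int(binary.BigEndian.Uint32(v[4:8])),
		}, true, nil
	default:
		return TxLoc{}, false, fmt.Errorf("invalid txloc value length %d", len(v))
	}
}
```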

Testing

  • TestDbUpgradeV6 — seeds v5 DB, runs upgrade, verifies index wiped and version bumped
  • TestTxLocRoundTrip — stores TxLoc, reads back, verifies offset+length match
  • TestTxLocOffsetCorrectness — stores raw block, uses offset to extract tx bytes, deserializes, verifies txid and output values match
  • TestTxLocLegacyNilValue — nil-value entry returns zero TxLoc gracefully

Impact

With TxLoc, each cache miss costs 1 LevelDB read (tx index) + 1 LevelDB/cache read (raw block) + parsing the ~200 bytes of a single tx. There is no full block deserialization and no SHA256 scanning, which eliminates the need for parallel lookup strategies.

Related

Files changed

  • database/tbcd/database.go: BlockHashByTxId returns TxLoc, NewTxMappingWithLoc
  • database/tbcd/level/level.go: implementation, stack buffers, DB v6
  • database/tbcd/level/level_test.go: 4 new tests
  • database/tbcd/level/upgrade.go: v6() wipes tx index
  • service/tbc/txindex.go: stores TxLoc
  • service/tbc/tbc.go: TxById uses TxLoc
  • service/tbc/utxoindex.go: txOutFromOutPoint uses TxLoc
  • service/tbc/rpc.go: caller updated
  • service/tbc/cpfp_test.go: stub updated
  • service/tbc/tbc_test.go: version expectations updated
  • cmd/hemictl/hemictl.go: caller updated

@marcopeereboom marcopeereboom requested a review from a team as a code owner May 29, 2026 07:10
BlockByHash calls btcutil.NewBlockFromBytes on every read, eagerly
deserializing the entire block into heap objects. Callers needing
one tx pay for all txs. CPU profiles show 50% GC pressure from this.

Add lazyBlock type wrapping raw []byte from the block cache with
lazy per-tx access — no deserialization until a specific tx is
requested. Single-pass boundary scan finds tx offsets without
parsing. Per-tx txid computation handles both witness and non-witness
serialization. Per-tx output value extraction reads only the outputs
section.
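The single-pass boundary scan described above can be sketched with the standard library alone. This is a minimal sketch, not the lazyBlock code from #1051: scanner, skipVarBytes and scanTxLocs are hypothetical names, TxLoc only mirrors btcd's wire.TxLoc fields, and it assumes the standard Bitcoin block serialization (80-byte header, CompactSize counts, BIP 144 marker/flag for witness txs). It skips over scripts and witness items by length, so no tx is decoded or hashed.

```go
package main

import (
	"encoding/binary"
	"errors"
)

// TxLoc mirrors the fields of btcd's wire.TxLoc.
type TxLoc struct{ TxStart, TxLen int }

var errShort = errors.New("truncated block")

// scanner walks raw block bytes; once err is set, every method is a no-op.
type scanner struct {
	b   []byte
	pos int
	err error
}

// skip advances n bytes, recording errShort if that would run past the end.
func (s *scanner) skip(n uint64) {
	if s.err == nil && n > uint64(len(s.b)-s.pos) {
		s.err = errShort
	}
	if s.err == nil {
		s.pos += int(n)
	}
}

// varInt decodes a Bitcoin CompactSize integer without allocating.
func (s *scanner) varInt() uint64 {
	start := s.pos
	s.skip(1)
	if s.err != nil {
		return 0
	}
	var size uint64
	switch s.b[start] {
	case 0xfd:
		size = 2
	case 0xfe:
		size = 4
	case 0xff:
		size = 8
	default:
		return uint64(s.b[start]) // values below 0xfd are stored in one byte
	}
	s.skip(size)
	if s.err != nil {
		return 0
	}
	var buf [8]byte // zero-padded, so Uint64 returns the 2-, 4- or 8-byte value
	copy(buf[:], s.b[start+1:s.pos])
	return binary.LittleEndian.Uint64(buf[:])
}

// skipVarBytes skips a CompactSize length followed by that many bytes.
func (s *scanner) skipVarBytes() { s.skip(s.varInt()) }

// scanTxLocs makes one pass over a raw block and returns each tx's
// offset and length. Fields are skipped, never decoded or hashed.
func scanTxLocs(raw []byte) ([]TxLoc, error) {
	s := &scanner{b: raw}
	s.skip(80) // block header
	count := s.varInt()
	var locs []TxLoc
	for t := uint64(0); t < count && s.err == nil; t++ {
		start := s.pos
		s.skip(4) // version
		// BIP 144: a 0x00 marker then 0x01 flag means witness data follows the outputs.
		segwit := s.err == nil && s.pos+1 < len(raw) && raw[s.pos] == 0 && raw[s.pos+1] == 1
		if segwit {
			s.skip(2)
		}
		nIn := s.varInt()
		for i := uint64(0); i < nIn && s.err == nil; i++ {
			s.skip(36)       // previous outpoint (txid + index)
			s.skipVarBytes() // signature script
			s.skip(4)        // sequence
		}
		nOut := s.varInt()
		for i := uint64(0); i < nOut && s.err == nil; i++ {
			s.skip(8)        // value in satoshis
			s.skipVarBytes() // pkScript
		}
		for i := uint64(0); segwit && i < nIn && s.err == nil; i++ {
			for items := s.varInt(); items > 0 && s.err == nil; items-- {
				s.skipVarBytes() // one witness stack item
			}
		}
		s.skip(4) // lock time
		locs = append(locs, TxLoc{TxStart: start, TxLen: s.pos - start})
	}
	if s.err != nil {
		return nil, s.err
	}
	return locs, nil
}
```

Each returned TxLoc is the kind of offset + length pair that this PR then persists in the 't' entry, so the scan only has to happen once, at index time.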

Add BlockRawByHash to the DB interface — the cache-check + LevelDB
read path from BlockByHash without the NewBlockFromBytes call.
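A cache-first raw read of this kind might look like the sketch below. It assumes a hypothetical rawBlockStore interface standing in for LevelDB and a plain map as the block cache; rawBlocks and blockRawByHash are illustrative names, not the actual BlockRawByHash signature. The point is that the bytes are returned as-is, with no NewBlockFromBytes-style deserialization.

```go
package main

import (
	"errors"
	"sync"
)

// rawBlockStore is a hypothetical stand-in for the LevelDB blocks table.
type rawBlockStore interface {
	Get(hash [32]byte) ([]byte, error)
}

var errNotFound = errors.New("block not found")

// mapStore is an in-memory rawBlockStore used for illustration.
type mapStore map[[32]byte][]byte

func (m mapStore) Get(h [32]byte) ([]byte, error) {
	b, ok := m[h]
	if !ok {
		return nil, errNotFound
	}
	return b, nil
}

// rawBlocks checks the cache first, then the store, and never parses the
// block. Returned slices are shared with the cache and must be treated as
// read-only by callers.
type rawBlocks struct {
	mu    sync.Mutex
	cache map[[32]byte][]byte
	store rawBlockStore
	hits  int
}

func (r *rawBlocks) blockRawByHash(h [32]byte) ([]byte, error) {
	r.mu.Lock()
	if b, ok := r.cache[h]; ok {
		r.hits++
		r.mu.Unlock()
		return b, nil
	}
	r.mu.Unlock()

	b, err := r.store.Get(h)
	if err != nil {
		return nil, err
	}
	r.mu.Lock()
	r.cache[h] = b
	r.mu.Unlock()
	return b, nil
}
```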

Existing BlockByHash callers are unchanged — this is an opt-in
parallel path for callers that need lightweight access.

100% test coverage on lazyblock.go. Every method cross-checked
against btcutil.NewBlockFromBytes output as oracle. Test blocks
include: genesis, segwit single/multi input, mixed segwit/non-segwit,
50-tx blocks, empty witness items, large inscription-style witness,
FullBlock byte-identical round-trip, and exhaustive error path
coverage for all truncation boundaries.
return l.MetadataPut(ctx, versionKey, v)
}

func (l *ldb) v6(ctx context.Context) error {

How long does this upgrade take?

Store TxLoc (offset + length within raw block) in the t entry value
instead of nil. This allows callers to jump directly to a tx's bytes
in the raw block without scanning — O(1) instead of O(txs_in_block).

BlockHashByTxId now returns (*chainhash.Hash, wire.TxLoc, error).
All callers updated. No separate method needed — callers that only
need the hash use bh, _, err := BlockHashByTxId(...).
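The two kinds of callers can be sketched against a stub with the same three-value shape. Everything here is hypothetical (index, entry, blockHashByTxID, lookupPath, hashOnly); it only assumes, as the PR describes, that a legacy nil value surfaces as a zero TxLoc and triggers the full-block fallback, while hash-only callers discard the location.

```go
package main

import "errors"

type TxLoc struct{ TxStart, TxLen int }
type Hash [32]byte

var errTxNotFound = errors.New("tx not found")

// entry is what the stub index stores per txid. A zero loc stands for a
// legacy (pre-v6) entry whose value was nil.
type entry struct {
	block Hash
	loc   TxLoc
}

// index is a hypothetical in-memory stand-in for the tx index.
type index map[Hash]entry

// blockHashByTxID mimics the (*hash, TxLoc, error) return shape.
func (ix index) blockHashByTxID(txid Hash) (*Hash, TxLoc, error) {
	e, ok := ix[txid]
	if !ok {
		return nil, TxLoc{}, errTxNotFound
	}
	bh := e.block
	return &bh, e.loc, nil
}

// lookupPath reports which path a TxById-style consumer would take.
func lookupPath(ix index, txid Hash) (string, error) {
	_, loc, err := ix.blockHashByTxID(txid)
	if err != nil {
		return "", err
	}
	if loc.TxLen == 0 {
		return "legacy: deserialize full block and scan", nil
	}
	return "fast: slice raw block at TxLoc", nil
}

// hashOnly is a caller that ignores the location, like an RPC that only
// returns the block hash.
func hashOnly(ix index, txid Hash) (Hash, error) {
	bh, _, err := ix.blockHashByTxID(txid)
	if err != nil {
		return Hash{}, err
	}
	return *bh, nil
}
```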

processTxs calls block.TxLoc() and stores the location via
NewTxMappingWithLoc. Errors from TxLoc() are logged at Errorf
and the indexer falls back to nil values (legacy format).

BlockTxUpdate uses stack-allocated reusable buffers instead of
slicing loop variables. The previous code sliced the range variable
and passed the slice to leveldb.Batch.Put. appendRec copies
immediately, but the interaction between range variable reuse,
map deletion, and GC is not guaranteed safe. Stack buffers are
zero-alloc and independent per iteration.
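The reusable-buffer pattern can be sketched as follows. The key layout ('t' + txid + block hash) and the 8-byte value are assumptions for illustration, and batch is a hypothetical stand-in that copies its arguments, matching goleveldb's documented guarantee that Batch.Put's arguments may be modified after it returns. The key and value arrays are declared once and overwritten every iteration; since nothing retains them, escape analysis can keep them on the stack.

```go
package main

import "encoding/binary"

type record struct{ key, value []byte }

// batch is a hypothetical stand-in for leveldb.Batch: Put copies key and
// value, so the caller may overwrite its buffers as soon as Put returns.
type batch struct{ records []record }

func (b *batch) Put(key, value []byte) {
	b.records = append(b.records, record{
		key:   append([]byte(nil), key...),
		value: append([]byte(nil), value...),
	})
}

const txKeyLen = 1 + 32 + 32 // hypothetical: 't' + txid + block hash

type txMapping struct {
	txID, blockHash [32]byte
	txStart, txLen  uint32
}

// putTxMappings fills fixed-size key and value arrays in place for each
// mapping instead of slicing loop variables or map keys.
func putTxMappings(b *batch, ms []txMapping) {
	var key [txKeyLen]byte
	var val [8]byte
	for _, m := range ms {
		key[0] = 't'
		copy(key[1:33], m.txID[:])
		copy(key[33:], m.blockHash[:])
		binary.BigEndian.PutUint32(val[0:4], m.txStart)
		binary.BigEndian.PutUint32(val[4:8], m.txLen)
		b.Put(key[:], val[:])
	}
}
```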

DB version 5 -> 6. Upgrade path wipes the transactions index for
rebuild with TxLoc values. The index is fully derived from block data.
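Because the index is derived data, the upgrade reduces to "delete and let the indexer start over". The sketch below models that over a hypothetical in-memory store; the fields, and in particular resetting an indexer progress marker (txIndexTip), are assumptions about what a rebuild needs rather than the contents of upgrade.go. The wipe itself is linear in the number of tx index entries; most of the time would go to the subsequent rebuild from block data.

```go
package main

import "fmt"

// store is a hypothetical in-memory stand-in for the database: the schema
// version, the tx indexer's progress marker, and the tx index table.
type store struct {
	version    int
	txIndexTip string            // last block the tx indexer processed ("" = start over)
	txIndex    map[string][]byte // tx index entries, e.g. 't' txid -> block mappings
}

// upgradeV6 refuses any starting version other than 5, deletes every tx
// index entry, resets the indexer's progress so it rebuilds (now writing
// TxLoc values), and bumps the version last.
func upgradeV6(s *store) (deleted int, err error) {
	if s.version != 5 {
		return 0, fmt.Errorf("upgradeV6: expected version 5, got %d", s.version)
	}
	for k := range s.txIndex {
		delete(s.txIndex, k) // deleting during range is allowed in Go
		deleted++
	}
	s.txIndexTip = ""
	s.version = 6
	return deleted, nil
}
```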

Ref: #1050