tbc: store tx byte location in tx index, bump DB to v6 by marcopeereboom · Pull Request #1052 · hemilabs/heminetwork

marcopeereboom · 2026-05-29T07:10:03Z

Summary

Store tx byte location (TxLoc: offset + length within raw block) in the tx index 't' entry value, which was previously nil. This enables O(1) tx lookup by jumping directly to the tx's bytes in the raw block — no scanning, no SHA256 hashing, no full block deserialization.
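As a rough illustration of the idea above, here is a minimal sketch of how an offset and length can be turned into a tx's bytes with a single bounds-checked slice. The local `TxLoc` type only mirrors the fields of btcd's `wire.TxLoc`; `txBytes` and the stand-in block contents are hypothetical and not taken from this PR.

```go
package main

import (
	"errors"
	"fmt"
)

// TxLoc mirrors the fields of btcd's wire.TxLoc: the byte offset and
// length of one serialized transaction inside a raw block.
type TxLoc struct {
	TxStart int
	TxLen   int
}

var errTxLocRange = errors.New("tx location out of range")

// txBytes (hypothetical helper) returns the bytes loc points at.
// It is one bounds-checked slice; no other part of the block is read.
func txBytes(rawBlock []byte, loc TxLoc) ([]byte, error) {
	if loc.TxStart < 0 || loc.TxLen <= 0 ||
		loc.TxStart > len(rawBlock)-loc.TxLen {
		return nil, errTxLocRange
	}
	return rawBlock[loc.TxStart : loc.TxStart+loc.TxLen], nil
}

func main() {
	// Stand-in for a raw block: 80 header bytes, 1 count byte, then two
	// placeholder payloads where real serialized txs would be.
	raw := make([]byte, 81)
	raw = append(raw, []byte("tx-one")...)
	raw = append(raw, []byte("tx-two!")...)

	b, err := txBytes(raw, TxLoc{TxStart: 87, TxLen: 7})
	fmt.Printf("%q %v\n", b, err)
}
```

In real code the returned slice would be handed to a tx deserializer, so only that one tx is parsed.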

Depends on #1051 (lazy block reader).

Problem

Every tx lookup via BlockHashByTxId requires a subsequent full block deserialization to find the tx. The 't' entry value was nil — wasted space that could carry the byte offset. CPU profile shows 60% in SHA256 hashing from FindTx scanning every tx in the block to find a match.
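To make the cost described above concrete, the following is a simplified sketch of a scan-by-hash lookup. It assumes the txs have already been split out of the block (the real path also has to deserialize them first), and `findByScan` is a hypothetical stand-in, not the FindTx implementation.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// doubleSHA256 is the hash Bitcoin applies to a transaction's non-witness
// serialization to obtain its txid.
func doubleSHA256(b []byte) [32]byte {
	first := sha256.Sum256(b)
	return sha256.Sum256(first[:])
}

// findByScan (hypothetical) hashes candidates in order until one matches
// want. It returns the matching index (or -1) and how many txs it hashed.
func findByScan(txs [][]byte, want [32]byte) (int, int) {
	for i, tx := range txs {
		if doubleSHA256(tx) == want {
			return i, i + 1
		}
	}
	return -1, len(txs)
}

func main() {
	// Placeholder byte strings standing in for one block's serialized txs.
	txs := make([][]byte, 3000)
	for i := range txs {
		txs[i] = []byte(fmt.Sprintf("tx-%d", i))
	}
	idx, hashed := findByScan(txs, doubleSHA256(txs[2999]))
	fmt.Println(idx, hashed)
}
```

The number of hashes grows with the position of the tx in the block, which is the work a stored byte location avoids.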

Solution

- BlockHashByTxId signature changed to return (*chainhash.Hash, wire.TxLoc, error); all callers updated
- processTxs now calls block.TxLoc() and stores offset+length via NewTxMappingWithLoc
- BlockTxUpdate uses stack-allocated reusable buffers instead of slicing loop variables (addresses the potential data integrity issue documented in #1050, "tbc: tx index intermittently loses entries during IBD")
- All consumers are wired to use TxLoc when available, with a legacy fallback for nil values:
  - TxById (RPC): deserializes only the target tx
  - txOutFromOutPoint (UTXO unwind): deserializes only the target tx
  - handleBlockHashByTxIdRequest (RPC): hash only, ignores TxLoc
  - hemictl: hash only, ignores TxLoc
- DB version 5 → 6; the upgrade wipes the tx index so it is rebuilt with TxLoc values
- Errors from block.TxLoc() are logged at Errorf, and the indexer falls back to nil values
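The legacy fallback in the list above can be sketched as follows. The 8-byte big-endian layout, `encodeTxLoc`, and `decodeTxLoc` are assumptions made for illustration; the PR text does not state the on-disk encoding that NewTxMappingWithLoc uses.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// TxLoc mirrors the fields of btcd's wire.TxLoc.
type TxLoc struct{ TxStart, TxLen int }

// txLocSize is an assumed layout: uint32 offset followed by uint32 length.
const txLocSize = 8

// encodeTxLoc (hypothetical) packs a location into an index value.
func encodeTxLoc(loc TxLoc) []byte {
	buf := make([]byte, txLocSize)
	binary.BigEndian.PutUint32(buf[0:4], uint32(loc.TxStart))
	binary.BigEndian.PutUint32(buf[4:8], uint32(loc.TxLen))
	return buf
}

// decodeTxLoc (hypothetical) accepts both formats: an empty value is a
// legacy entry and yields the zero TxLoc, telling the caller to fall back.
func decodeTxLoc(v []byte) (TxLoc, error) {
	switch len(v) {
	case 0:
		return TxLoc{}, nil
	case txLocSize:
		return TxLoc{
			TxStart: int(binary.BigEndian.Uint32(v[0:4])),
			TxLen:   int(binary.BigEndian.Uint32(v[4:8])),
		}, nil
	default:
		return TxLoc{}, errors.New("unexpected tx index value length")
	}
}

func main() {
	loc, err := decodeTxLoc(encodeTxLoc(TxLoc{TxStart: 81, TxLen: 204}))
	fmt.Println(loc, err)

	legacy, err := decodeTxLoc(nil)
	fmt.Println(legacy == TxLoc{}, err)
}
```

A caller would treat a zero TxLen as "location unknown" and take the older full-block path.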

Testing

TestDbUpgradeV6 — seeds v5 DB, runs upgrade, verifies index wiped and version bumped
TestTxLocRoundTrip — stores TxLoc, reads back, verifies offset+length match
TestTxLocOffsetCorrectness — stores raw block, uses offset to extract tx bytes, deserializes, verifies txid and output values match
TestTxLocLegacyNilValue — nil-value entry returns zero TxLoc gracefully
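For readers unfamiliar with the upgrade behaviour that TestDbUpgradeV6 checks, here is a toy model of it. `toyDB`, `upgradeV6`, and the use of 't' as a one-byte key prefix are illustrative assumptions; the real v6() operates on LevelDB and is not reproduced here.

```go
package main

import "fmt"

// txIndexPrefix assumes tx index keys start with 't', as the PR text
// refers to 't' entries. The real key layout may differ.
const txIndexPrefix = 't'

// toyDB is an in-memory stand-in for the database.
type toyDB struct {
	version uint64
	kv      map[string][]byte
}

// upgradeV6 (hypothetical) drops every tx index entry and bumps the
// version. The index is derived from block data, so it can be rebuilt.
func (db *toyDB) upgradeV6() error {
	if db.version != 5 {
		return fmt.Errorf("unexpected version %d, want 5", db.version)
	}
	for k := range db.kv {
		if len(k) > 0 && k[0] == txIndexPrefix {
			delete(db.kv, k) // deleting during range is safe in Go
		}
	}
	db.version = 6
	return nil
}

func main() {
	db := &toyDB{version: 5, kv: map[string][]byte{
		"t-aaaa": nil, // legacy nil-valued tx index entries
		"t-bbbb": nil,
		"b-0001": []byte("raw block"),
	}}
	fmt.Println(db.upgradeV6(), db.version, len(db.kv))
}
```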

Impact

With TxLoc, each cache miss costs: 1 LevelDB read (tx index) + 1 LevelDB/cache read (raw block) + parse ~200 bytes of one tx. No full block deserialization. No SHA256 scanning. Eliminates the need for parallel lookup strategies.

Files changed

database/tbcd/database.go — BlockHashByTxId returns TxLoc, NewTxMappingWithLoc
database/tbcd/level/level.go — implementation, stack buffers, DB v6
database/tbcd/level/level_test.go — 4 new tests
database/tbcd/level/upgrade.go — v6() wipes tx index
service/tbc/txindex.go — stores TxLoc
service/tbc/tbc.go — TxById uses TxLoc
service/tbc/utxoindex.go — txOutFromOutPoint uses TxLoc
service/tbc/rpc.go — caller updated
service/tbc/cpfp_test.go — stub updated
service/tbc/tbc_test.go — version expectations updated
cmd/hemictl/hemictl.go — caller updated

BlockByHash calls btcutil.NewBlockFromBytes on every read, eagerly deserializing the entire block into heap objects. Callers needing one tx pay for all txs. CPU profiles show 50% GC pressure from this.

Add lazyBlock type wrapping raw []byte from the block cache with lazy per-tx access: no deserialization until a specific tx is requested. Single-pass boundary scan finds tx offsets without parsing. Per-tx txid computation handles both witness and non-witness serialization. Per-tx output value extraction reads only the outputs section.

Add BlockRawByHash to the DB interface: the cache-check + LevelDB read path from BlockByHash without the NewBlockFromBytes call. Existing BlockByHash callers are unchanged; this is an opt-in parallel path for callers that need lightweight access.

100% test coverage on lazyblock.go. Every method cross-checked against btcutil.NewBlockFromBytes output as oracle. Test blocks include: genesis, segwit single/multi input, mixed segwit/non-segwit, 50-tx blocks, empty witness items, large inscription-style witness, FullBlock byte-identical round-trip, and exhaustive error path coverage for all truncation boundaries.
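The single-pass boundary scan mentioned above could look roughly like the sketch below. It follows the standard Bitcoin block and transaction serialization, skipping over fields instead of allocating objects for them. The `cursor` type, `skipTx`, and `txLocs` are hypothetical names; this is not the lazyblock.go code.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// TxLoc mirrors the fields of btcd's wire.TxLoc.
type TxLoc struct{ TxStart, TxLen int }

var errTruncated = errors.New("truncated block")

// cursor walks raw bytes and remembers the first out-of-bounds access.
type cursor struct {
	b   []byte
	pos int
	err error
}

// skip advances n bytes without interpreting them.
func (c *cursor) skip(n uint64) {
	if c.err == nil && n > uint64(len(c.b)-c.pos) {
		c.err = errTruncated
	}
	if c.err == nil {
		c.pos += int(n)
	}
}

// varint decodes a Bitcoin CompactSize integer at the cursor.
func (c *cursor) varint() uint64 {
	start := c.pos
	if c.skip(1); c.err != nil {
		return 0
	}
	d := c.b[start]
	if d < 0xfd {
		return uint64(d)
	}
	width := uint64(2) << (d - 0xfd) // 0xfd: 2, 0xfe: 4, 0xff: 8 bytes
	if c.skip(width); c.err != nil {
		return 0
	}
	var buf [8]byte // zero-padded little-endian
	copy(buf[:], c.b[start+1:c.pos])
	return binary.LittleEndian.Uint64(buf[:])
}

// skipTx advances past one transaction, legacy or segwit.
func (c *cursor) skipTx() {
	c.skip(4) // version
	segwit := c.err == nil && c.pos+2 <= len(c.b) &&
		c.b[c.pos] == 0x00 && c.b[c.pos+1] == 0x01
	if segwit {
		c.skip(2) // marker and flag
	}
	nIn := c.varint()
	for i := uint64(0); i < nIn && c.err == nil; i++ {
		c.skip(36)         // previous outpoint
		c.skip(c.varint()) // signature script
		c.skip(4)          // sequence
	}
	nOut := c.varint()
	for i := uint64(0); i < nOut && c.err == nil; i++ {
		c.skip(8)          // value
		c.skip(c.varint()) // public key script
	}
	for i := uint64(0); segwit && i < nIn && c.err == nil; i++ {
		items := c.varint()
		for j := uint64(0); j < items && c.err == nil; j++ {
			c.skip(c.varint()) // one witness item
		}
	}
	c.skip(4) // lock time
}

// txLocs returns the location of every transaction in a raw block.
func txLocs(raw []byte) ([]TxLoc, error) {
	c := &cursor{b: raw}
	c.skip(80) // block header
	n := c.varint()
	var locs []TxLoc
	for i := uint64(0); i < n && c.err == nil; i++ {
		start := c.pos
		c.skipTx()
		locs = append(locs, TxLoc{start, c.pos - start})
	}
	if c.err != nil {
		return nil, c.err
	}
	return locs, nil
}

func main() {
	tx := append([]byte{1, 0, 0, 0, 1}, make([]byte, 36)...) // version, 1 input, outpoint
	tx = append(tx, 0, 0xff, 0xff, 0xff, 0xff, 1)            // empty script, sequence, 1 output
	tx = append(tx, make([]byte, 8+1+4)...)                  // value, empty script, lock time
	raw := append(append(make([]byte, 80), 1), tx...)        // header, tx count, tx
	fmt.Println(txLocs(raw))
}
```

Recording the error on the cursor keeps every truncation check in one place, which makes the kind of exhaustive truncation testing described above easier to reason about.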

joshuasing · 2026-05-29T10:23:30Z

 	return l.MetadataPut(ctx, versionKey, v)
 }
+
+func (l *ldb) v6(ctx context.Context) error {


How long does this upgrade take?

Store TxLoc (offset + length within raw block) in the t entry value instead of nil. This allows callers to jump directly to a tx's bytes in the raw block without scanning: O(1) instead of O(txs_in_block).

BlockHashByTxId now returns (*chainhash.Hash, wire.TxLoc, error). All callers updated. No separate method needed: callers that only need the hash use bh, _, err := BlockHashByTxId(...).

processTxs calls block.TxLoc() and stores the location via NewTxMappingWithLoc. Errors from TxLoc() are logged at Errorf and the indexer falls back to nil values (legacy format).

BlockTxUpdate uses stack-allocated reusable buffers instead of slicing loop variables. The previous code sliced the range variable and passed the slice to leveldb.Batch.Put. appendRec copies immediately, but the interaction between range variable reuse, map deletion, and GC is not guaranteed safe. Stack buffers are zero-alloc and independent per iteration.

DB version 5 -> 6. Upgrade path wipes the transactions index for rebuild with TxLoc values. The index is fully derived from block data.

Ref: #1050
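The buffer pattern described in this commit message might be sketched as below. The `batch` type is a toy stand-in whose Put copies its arguments, as the message says appendRec does. The key and value sizes and `writeAll` are assumptions for illustration, and whether the buffers really stay on the stack depends on the compiler's escape analysis.

```go
package main

import "fmt"

// batch is a toy write batch. Put copies k and v before returning.
type batch struct{ keys, vals [][]byte }

func (b *batch) Put(k, v []byte) {
	b.keys = append(b.keys, append([]byte(nil), k...))
	b.vals = append(b.vals, append([]byte(nil), v...))
}

// writeAll (hypothetical) drains entries into the batch. Each key and
// value is copied into a fixed-size local buffer first, so the slices
// handed to Put never alias the range variables or the map's storage.
func writeAll(b *batch, entries map[[4]byte][8]byte) {
	var key [4]byte // reused on every iteration
	var val [8]byte
	for k, v := range entries {
		key, val = k, v
		b.Put(key[:], val[:])
		delete(entries, k) // safe during range in Go
	}
}

func main() {
	entries := map[[4]byte][8]byte{{1}: {10}, {2}: {20}, {3}: {30}}
	var b batch
	writeAll(&b, entries)
	fmt.Println(len(b.keys), len(entries))
}
```

Reusing the buffers is only sound because Put copies before the next iteration overwrites them; a batch that retained the slices would need a fresh buffer per entry.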

marcopeereboom requested a review from a team as a code owner May 29, 2026 07:10

marcopeereboom force-pushed the marco/lazy-block branch from 9f1a394 to 99d768f Compare May 29, 2026 07:15

marcopeereboom force-pushed the marco/tx-offset branch from 461b684 to 4a29640 Compare May 29, 2026 07:15

This was referenced May 29, 2026:

tbc: add ordinal indexer #1024 (Closed)

tbc: add ordinal indexer #1053 (Open)

joshuasing reviewed May 29, 2026


marcopeereboom force-pushed the marco/lazy-block branch from 99d768f to e103e96 Compare May 30, 2026 18:21

marcopeereboom force-pushed the marco/tx-offset branch from 4a29640 to 1a04a83 Compare May 30, 2026 18:21

marcopeereboom wants to merge 2 commits into marco/lazy-block from marco/tx-offset
