Add JMH benchmarks comparing read I/O strategies under memory pressure #16279

Open: neoremind wants to merge 9 commits into apache:main from neoremind:16044_readio_pr

@neoremind (Contributor) commented Jun 20, 2026

Adds JMH benchmarks that compare read I/O strategies in memory-constrained scenarios. Related to #16044.

I/O strategies tested:

  • mmap no madvise
  • mmap + MADV_NORMAL + MADV_WILLNEED
  • mmap + MADV_RANDOM
  • mmap + MADV_RANDOM + MADV_WILLNEED
  • FFI pread(2) via Panama
  • FileChannel + DirectByteBuffer (simulates NIOFSDirectory)
  • FileChannel + HeapByteBuffer
  • O_DIRECT

Thread counts: 1, 4, 8, 16.
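
For readers unfamiliar with the two FileChannel variants in the list above, here is a minimal sketch of a positional read into a direct and a heap buffer. It is not the benchmark's code: the class name `PositionalReadSketch`, the helper `readFully`, and the small temp file are hypothetical and only illustrate the access pattern.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration; not taken from RandomReadIOBenchmark.
final class PositionalReadSketch {
    private PositionalReadSketch() {}

    /** Fills dst from the given absolute file position; returns the number of bytes read. */
    static int readFully(FileChannel ch, ByteBuffer dst, long position) throws IOException {
        int total = 0;
        while (dst.hasRemaining()) {
            // Positional read: does not move the channel's own position.
            int n = ch.read(dst, position + total);
            if (n < 0) {
                break; // end of file reached before the buffer was full
            }
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("pread-sketch", ".dat");
        try {
            byte[] data = new byte[64 * 1024];
            for (int i = 0; i < data.length; i++) {
                data[i] = (byte) i;
            }
            Files.write(tmp, data);
            try (FileChannel ch = FileChannel.open(tmp, StandardOpenOption.READ)) {
                ByteBuffer direct = ByteBuffer.allocateDirect(16_384); // off-heap buffer
                ByteBuffer heap = ByteBuffer.allocate(16_384);         // on-heap buffer
                int a = readFully(ch, direct, 4_096);
                int b = readFully(ch, heap, 4_096);
                System.out.println(a + " " + b); // prints "16384 16384"
            }
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

The only difference between the two variants in this sketch is where the destination buffer lives; the read call itself is identical.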

How to run

dd if=/dev/urandom of=/path/to/pread-bench-16G.dat bs=1M count=16384

java -jar lucene/benchmark-jmh/build/benchmarks/lucene-benchmark-jmh-11.0.0-SNAPSHOT.jar RandomReadIOBenchmark \
  -jvmArgs "--enable-native-access=ALL-UNNAMED -Xms2g -Xmx2g -Dbench.file=/path/to/pread-bench-16G.dat -Dbench.fileSizeMB=16384" \
  -p readSize=16384 -p readsPerOp=16

github-actions bot added this to the 10.6.0 milestone Jun 20, 2026
Comment on lines +281 to +282
MemorySegment slice = mmapSegmentNormal.asSlice(offsets[i], readSize);
int rc = (int) POSIX_MADVISE.invokeExact(slice, (long) readSize, MADV_WILLNEED);

madvise needs a page-aligned start address, so passing the raw random offset here makes it return EINVAL and do nothing. Because rc is discarded, the failure is silent. On a c6id.4xlarge, strace shows ~5117/5181 of these calls returning -1 EINVAL, so the prefetch never actually runs and the prefetch rows end up identical to plain mmap. The real Directory avoids this because MemorySegmentIndexInput#advise rounds the start down to the page boundary first.
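
The rounding described above can be checked in isolation. The sketch below is only an illustration: the class `PageAlign`, the method `alignDown`, and the 4096-byte page size are assumptions for the example, and it assumes the mapping's base address is itself page-aligned, as an mmap'd region normally is.

```java
// Hypothetical helper; names and the 4096-byte page size are illustrative assumptions.
final class PageAlign {
    private PageAlign() {}

    /**
     * Widens the range [offset, offset + length) so that its start falls on a page boundary.
     * Returns {alignedOffset, alignedLength}. Assumes baseAddress is page-aligned, so
     * alignedOffset never becomes negative.
     */
    static long[] alignDown(long baseAddress, long offset, long length, long pageSize) {
        long offsetInPage = Math.floorMod(baseAddress + offset, pageSize);
        long alignedOffset = offset - offsetInPage; // move the start back to the page boundary
        long alignedLength = length + offsetInPage; // grow the length so the end is unchanged
        return new long[] {alignedOffset, alignedLength};
    }

    public static void main(String[] args) {
        long[] r = alignDown(0L, 10_000L, 16_384L, 4_096L);
        System.out.println(r[0] + " " + r[1]); // prints "8192 18192"
    }
}
```

The end of the advised range stays at `offset + length`; only the start moves back, so the region the read will touch is still fully covered.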

Suggested fix (mirrors what the Directory does):

Suggested change
- MemorySegment slice = mmapSegmentNormal.asSlice(offsets[i], readSize);
- int rc = (int) POSIX_MADVISE.invokeExact(slice, (long) readSize, MADV_WILLNEED);
+ // madvise needs a page-aligned start address, otherwise it returns EINVAL and is a no-op.
+ long offsetInPage = (mmapSegmentNormal.address() + offsets[i]) % ALIGNMENT;
+ long aoff = offsets[i] - offsetInPage;
+ long alen = readSize + offsetInPage;
+ MemorySegment slice = mmapSegmentNormal.asSlice(aoff, alen);
+ int rc = (int) POSIX_MADVISE.invokeExact(slice, alen, MADV_WILLNEED);
+ assert rc == 0 : "posix_madvise failed: " + rc;

Same change is needed in doMmapMadvRandomBatchedPrefetch (against mmapSegmentMadvRandom). With this, single-threaded mmap+prefetch goes from ~0.15 → ~4.2 ops/ms cold (≈7× pread at T01, ~device saturation). Might be worth asserting on rc at the other madvise call sites too so this can't silently regress again.
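
One caveat on asserting on rc: Java `assert` statements are only evaluated when the JVM runs with `-ea`, so an explicit check may be more robust. A minimal sketch follows, assuming a hypothetical helper `AdviseCheck.requireSuccess` that is not part of the PR; it relies only on the POSIX contract that posix_madvise returns 0 on success and an error number otherwise.

```java
// Hypothetical helper; not part of the PR.
final class AdviseCheck {
    private AdviseCheck() {}

    /**
     * Throws if a posix_madvise return code signals failure.
     * posix_madvise returns 0 on success and an error number on failure.
     */
    static void requireSuccess(int rc, String callSite) {
        if (rc != 0) {
            throw new IllegalStateException(
                "posix_madvise failed at " + callSite + ": rc=" + rc);
        }
    }

    public static void main(String[] args) {
        requireSuccess(0, "example"); // success: no exception
        try {
            requireSuccess(22, "example"); // 22 is EINVAL on Linux
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // prints "posix_madvise failed at example: rc=22"
        }
    }
}
```

Each madvise call site would then pass its rc and a short label, so a regression fails loudly regardless of JVM flags.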
