[WIP] feat: add shared io_uring context pool support#79
Conversation
| ASSERT_TRUE(IOContextPool::InitGlobal(cfg)); | ||
|
|
||
| auto pool = IOContextPool::GetGlobal(); | ||
| ASSERT_NE(pool, nullptr); |
There was a problem hiding this comment.
Tests share the global singleton without any reset mechanism, making results depend on GTest execution order. A ResetGlobalForTest() helper or fork()-based isolation (like the fallback test) is needed to prevent flaky failures.
| static std::shared_ptr<AioContextPool> | ||
| GetGlobalAioPool(); | ||
|
|
||
| static bool |
There was a problem hiding this comment.
Wait predicate only checks ctx_q_.size() and never checks stop_. Threads blocked here will deadlock on destruction when the queue is empty because notify_all() cannot break the wait.
| return true; | ||
| } | ||
|
|
||
| if (global_uring_pool_size != num_ctx || global_uring_max_entries != max_entries) { |
There was a problem hiding this comment.
Same problem as AioContextPool: config mismatch is logged but the function still returns true, silently allowing misconfiguration.
| static void | ||
| ResetGlobalForTest(); | ||
|
|
||
| ~AioContextPool() { |
There was a problem hiding this comment.
stop_ is a non-atomic bool written without holding ctx_mtx_, which is a data race. Also, io_destroy is called before notify_all, so woken threads may access already-destroyed contexts.
|
|
||
| #include "knowhere/io_context_pool.h" | ||
|
|
||
| #if defined(__cpp_lib_span) |
There was a problem hiding this comment.
IOReaderSpan resolves to different types depending on __cpp_lib_span; using it in public function signatures causes ABI incompatibility across C++17 and C++20 consumers.
| size_t num_ctx = default_pool_size; | ||
| #else | ||
| size_t num_ctx = 1; | ||
| #endif |
There was a problem hiding this comment.
num_ctx falls back to 1 when libaio is absent, so io_uring-only builds get a single ring and a severe concurrency bottleneck. Default should be reasonable regardless of libaio availability.
| } | ||
|
|
||
| bool | ||
| UringContextPool::InitGlobalUringPool(size_t num_ctx, size_t max_entries) { |
There was a problem hiding this comment.
Same structural bug as the AIO variant: InitGlobalUringPool(num_ctx, max_entries) internally delegates to IOContextPool::GetGlobal() which uses the default IOContextPoolConfig, so the caller-supplied num_ctx and max_entries are silently ignored. This is a confirmed bug in the new uring API introduced by this PR.
| } | ||
|
|
||
| bool | ||
| AioContextPool::InitGlobalAioPool(size_t num_ctx, size_t max_events) { |
There was a problem hiding this comment.
InitGlobalAioPool() calls GetGlobal() which uses a default IOContextPoolConfig (num_ctx=512) instead of forwarding the caller's global_aio_pool_size parameter. Any non-default num_ctx configuration is silently ignored — 100% regression for customized deployments. Compare with InitGlobalUringPool (uring_context_pool.cc L92-94) which correctly passes parameters. Test L167 will fail in !WITH_IO_URING environments.
Fix: Construct IOContextPoolConfig from the caller's parameters and pass it directly to InitGlobal(cfg).
55d7b49 to
df88d1e
Compare
|
|
||
| bool | ||
| AioContextPool::InitGlobalAioPool(size_t num_ctx, size_t max_events) { | ||
| AioContextPool::InitGlobalAioPoolWithValidation(size_t num_ctx, size_t max_events) { |
There was a problem hiding this comment.
The validation at line 20 only rejects max_events > default_max_events but allows max_events == 0 through. The function returns true (success) and sets global_aio_max_events = 0. Subsequent GetGlobalAioPoolDirect() constructs a pool where every io_setup(0, &ctx) call fails, producing an empty pool. Any pop() call will block indefinitely. Note: the unified entry point IOContextPool::InitGlobal at io_context_pool.cc:60 correctly rejects 0, so this only affects direct callers of the newly-exposed helper API declared in aio_context_pool.h:76.
df88d1e to
83170ae
Compare
| UringContextPool::GetGlobalUringPoolDirect() { | ||
| std::scoped_lock lk(global_uring_pool_mut); | ||
| if (global_uring_pool_size == 0) { | ||
| global_uring_pool_size = 1; |
There was a problem hiding this comment.
The legacy GetGlobalUringPoolDirect() constructs UringContextPool with num_ctx=1 by default, while AIO pool defaults to 512 contexts. Under direct legacy usage, this severely limits uring concurrency.
Fix: Align the default num_ctx with AIO pool defaults or use a reasonable default like default_pool_size.
Use the configured IO pool defaults for direct uring initialization and count interrupted syscalls toward the retry limit so sustained signals cannot loop forever. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Bound interrupted completion waits and reset checked-out rings when submit failures leave prepared io_uring entries behind. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Invalidate dirty rings on failed completion cleanup and use fmt-style log placeholders so review-time diagnostics stay reliable. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Keep already submitted completion-reader requests observable when submit cleanup resets a dirty ring, and ensure CI builds the io_uring path after dependency setup. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Avoid returning dirty io_uring handles after reader failures, make global IO pool initialization explicit, and publish the C++20 requirement exposed by public headers. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ef3c9a2 to
0ad2d8e
Compare
| } | ||
| continue; | ||
| } | ||
| for (size_t i = result.completed; i < result.completed + static_cast<size_t>(completed); ++i) { |
There was a problem hiding this comment.
The retry counter in WaitAioBatch accumulates across iterations that made forward progress. With kNumRetries = 10 and max_events = 128, an AIO batch that the kernel returns piecewise will be cut off after 11 partial completions even when nothing is wrong. The io_uring sibling WaitUringBatch does not have this bug — it just increments result.completed per CQE with no retry counter on the progress path.
| break; | ||
| } | ||
| if (submitted == 0) { | ||
| if (++retry > kNumRetries) { |
There was a problem hiding this comment.
Same progress-penalty bug as WaitAioBatch: retry accumulates even when submitted > 0, causing premature abort of legitimate partial submissions.
| private: | ||
| std::shared_ptr<IOContextPool> pool_; | ||
| IOContextHandle handle_; | ||
| bool active_ = true; |
There was a problem hiding this comment.
The WaitAioBatch function takes const std::vector<struct iocb>& cbs but uses const_cast<struct iocb*>(&cbs[i]) to insert pointers into an unordered_set<struct iocb*>. This strips const-correctness and risks undefined behavior if the underlying data is truly const.
| } | ||
| throw std::runtime_error("io_uring_wait_cqe failed"); | ||
| } | ||
| if (cqe == nullptr) { |
There was a problem hiding this comment.
After io_uring_wait_cqe returns success (ret == 0), if cqe is nullptr the code does continue with no retry cap, no progress tracking, and no logging. If the kernel ever returns this combination, the worker thread hangs in an infinite loop. Compare ProcessAvailableCompletions at line 336 which correctly breaks on null cqe.
|
src/common/io_reader.cc:0 |
|
src/common/io_reader.cc:0 When |
| endif() | ||
|
|
||
| set(MILVUS_COMMON_WITH_IO_URING OFF) | ||
| find_path(URING_INCLUDE_DIR liburing.h) |
There was a problem hiding this comment.
The C++20 standard is linked as PUBLIC, meaning every target that depends on this library is forced to compile with C++20. This is driven by std::span usage in headers. Downstream projects that cannot adopt C++20 will fail to build. The requirement should be PRIVATE, and std::span-using headers should be isolated or guarded.
|
src/common/io_reader.cc:0 When the AIO backend is used with |
|
src/common/aio_context_pool.h:151 The legacy |
|
src/common/io_context_pool.cc:0 |
| @@ -0,0 +1,416 @@ | |||
| #include "knowhere/io_context_pool.h" | |||
There was a problem hiding this comment.
Enabling WITH_IO_URING causes GetGlobalAioPool() to return nullptr, breaking downstream consumers (knowhere/cardinal) that rely on this API. This is an uncoordinated contract change that will cause crashes in callers that don't null-check the return value.
| } | ||
|
|
||
| IOCompletionReader::RequestId | ||
| IOCompletionReader::Submit(IOCompletionReaderSpan<std::byte* const> buffers, size_t size, |
There was a problem hiding this comment.
IOReader::ReadAsync validates O_DIRECT alignment via ValidateDirectIoAlignment (io_reader.cc:607), but IOCompletionReader::Submit (lines 102-187) has no such check. With an O_DIRECT fd, unaligned buffers will produce a confusing kernel-side EINVAL instead of a clear library-level error. This creates inconsistent error semantics between the two reader classes in the same library.
| return false; | ||
| } | ||
|
|
||
| auto pool = UringContextPool::GetGlobalUringPoolDirect(); |
There was a problem hiding this comment.
IOContextPool has Push(IOContextHandle&) lvalue overloads that silently move the handle's ownership. A caller passing an lvalue won't realize ownership transferred. This is a classic implicit-move footgun that can lead to double-free or use-after-move bugs.
| } | ||
| #endif | ||
| #endif | ||
|
|
There was a problem hiding this comment.
The test uses a 100ms sleep as a synchronization mechanism. Under ASan or loaded CI environments, this timing window is unreliable and will produce flaky failures. The test needs proper synchronization (e.g., condition variable or future) instead of a sleep.
|
|
||
| auto owner = owner_; | ||
| if (owner == nullptr) { | ||
| LOG_WARN("IOContextHandle drops context without owner for backend {}", static_cast<int>(backend)); |
There was a problem hiding this comment.
AIO's ResetCheckedOut and uring's reset follow different patterns for returning the new context to the pool. This asymmetry makes the code harder to reason about and creates risk of resource leaks if a maintainer assumes symmetric behavior.
| } | ||
|
|
||
| const size_t first_batch = std::min(max_batch, buffers.size()); | ||
| std::vector<struct iocb> first_cbs; |
There was a problem hiding this comment.
After io_submit, the kernel holds pointers to iocb addresses inside first_cbs. The std::move(first_cbs) into AioReadState::first_cbs_ is safe because vector move is a pointer swap with the default allocator. However, this invariant is fragile — switching to SmallVector or a custom allocator would silently break it by relocating the buffer and corrupting kernel-visible addresses.
| return *this; | ||
| } | ||
|
|
||
| DrainOutstandingNoThrow(); |
There was a problem hiding this comment.
IOCompletionReader::operator=(&&) is not noexcept even though the move constructor is. This asymmetry prevents optimal container behavior (e.g., std::vector reallocation will copy instead of move).
| return io_uring_submit(ring); | ||
| } | ||
|
|
||
| size_t |
There was a problem hiding this comment.
PrepareUringBatch doesn't apply an explicit std::min(buffers.size() - start, max_events_per_ctx) cap like ReadAioAsync does. While io_uring_get_sqe returning null provides a natural cap via the SQ ring size, the asymmetry with the AIO path creates a readability and maintenance concern.
Introduce UringContextPool in milvus-common and gate it behind WITH_IO_URING with liburing detection so downstream knowhere/cardinal can share io_uring contexts safely.