Skip to content

perf: prefetch next ListObjects page while consumer iterates current page#236

Open
jiuker wants to merge 7 commits into
minio:mainfrom
jiuker:fix-issue-218
Open

perf: prefetch next ListObjects page while consumer iterates current page#236
jiuker wants to merge 7 commits into
minio:mainfrom
jiuker:fix-issue-218

Conversation

@jiuker

@jiuker jiuker commented Jun 29, 2026

Copy link
Copy Markdown
Contributor

fix #218

## Problem

\`ListObjectsResult::operator++()\` fetches the next page only after the caller has consumed the current page. See \`src/client.cc:89-127\` (Populate) and \`include/miniocpp/client.h:54-66\` (iterator). For a 1M-object bucket at 1000/page, that's 1000 sequential round-trips with the consumer blocked between each.

## Suggested approach

When the response arrives with \`IsTruncated=true\`, kick off the next page fetch on a background thread / via the async surface (#217). \`ListObjectsResult\` holds the prefetched response and serves it without a round-trip when the iterator crosses the page boundary.

Internal-only change, no API impact for sync callers. Memory: one extra page in flight.

## Impact

**Medium.** Roughly halves wall-clock time on large bucket listings when the per-page network round-trip is comparable to the per-page consumer work.

## Roadmap

Tier 2 item from the C++ modernization audit. Related: #217 (async API surface).

Summary by CodeRabbit

  • New Features
    • Improved object listing responsiveness with background page prefetching.
    • Prefetch now drives subsequent pages when results are truncated.
  • Bug Fixes
    • More consistent pagination marker handling across listing modes and error cases.
    • Enhanced failure propagation, including when bucket/region resolution fails.
  • Breaking Changes
    • Listing response objects are now non-copyable (move supported).
    • Updated ListObjectsResult iterator validity behavior (truthiness and increment timing).

…page

perf: prefetch next ListObjects page while consumer iterates current page
@coderabbitai

coderabbitai Bot commented Jun 29, 2026

Copy link
Copy Markdown

Review Change Stack

Warning

Review limit reached

@jiuker, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 18 minutes. Learn how PR review limits work.

To continue reviewing without waiting, enable usage-based billing in the billing tab.

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

To avoid repeated limits, reduce automatic review volume by pausing incremental auto-reviews earlier, using label-based review opt-in, excluding WIP or generated PR titles, or requesting reviews manually when the PR is ready. If your team needs uninterrupted high-volume reviews, an organization admin can enable usage-based credits.

🚦 How do rate limits work?

CodeRabbit enforces per-developer PR review limits for each organization. Most developers receive the normal plan review availability.

For paid Pro and Pro+ PR reviews, CodeRabbit uses adaptive limits for sustained high-volume activity. When a developer's recent PR review activity reaches the 95th percentile or higher among CodeRabbit users, additional reviews become available more gradually as earlier reviews age out of the rolling window.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro Plus

Run ID: daa24314-9e47-4469-a884-de5d6891c59a

📥 Commits

Reviewing files that changed from the base of the PR and between 353a2aa and 843055a.

📒 Files selected for processing (1)
  • src/client.cc
📝 Walkthrough

Walkthrough

ListObjectsResult now prefetches the next page asynchronously. Construction starts a background fetch, iteration consumes the prefetched response, and truncated pages schedule the next fetch. ListObjectsResponse is made move-only to support the stored response flow.

Changes

Async prefetch for ListObjects pagination

Layer / File(s) Summary
Response and result class contracts
include/miniocpp/response.h, include/miniocpp/client.h
ListObjectsResponse deletes copy operations and defaults move operations. ListObjectsResult adds shared response storage, prefetch state, private helper declarations, prefetch-aware truthiness, and adjusted increment behavior.
Async prefetch and pagination flow
src/client.cc
Constructors create the shared response and start prefetching. StartPrefetch() resolves region, normalizes delimiter settings, selects the listing API, and returns an error response on failures. Populate() consumes the future, updates iteration state, and schedules the next page when truncated.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐇 I hopped through pages, one by one,
Now the next one’s fetched before I’m done.
A little future glows beneath the pine,
And listing hops along just fine.
When one page ends, the next appears to play—
a brisk burrow path through listing day.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 25.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly matches the main change: prefetching the next ListObjects page during iteration for performance.
Linked Issues check ✅ Passed The implementation prefetches truncated pages asynchronously and consumes them at page boundaries without changing sync caller behavior.
Out of Scope Changes check ✅ Passed The changes stay focused on ListObjects iteration prefetching and the response storage updates needed to support it.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@include/miniocpp/client.h`:
- Around line 60-62: The boolean conversion in ListObjectsResult is missing
const qualification, which breaks use on const instances. Update the explicit
operator bool() in the relevant class to be const again, or add a const overload
that preserves the same semantics, while keeping the existing behavior around
Populate() and itr_ intact.
- Line 43: The ListObjectsResult copying logic is unsafe because itr_ remains
tied to the source resp_.contents and the shared one-shot future can be consumed
incorrectly by postfix or user copies. Fix this in ListObjectsResult by either
making the type move-only and adjusting postfix operator++/other copy paths
accordingly, or implement a real copy that rebases the iterator onto the new
resp_.contents and uses a shared_future-style prefetch mechanism so copies do
not lose the prefetch or dereference the wrong container.

In `@include/miniocpp/response.h`:
- Around line 301-304: Avoid default-copying ListObjectsResponse because its
view members can stay bound to another instance’s owned_ storage and dangle
after ListObjectsResult::operator++(int) returns by value. Update
ListObjectsResponse so copy construction and copy assignment are either deleted
or explicitly rebuild owned_ and rebind all dependent views (including
Item::etag and any URL-decoded fields) from the copied data; keep move
operations if they remain safe.

In `@src/client.cc`:
- Around line 197-209: The comments around the prefetch handling in client.cc
are redundant because they merely restate the code in the surrounding block.
Remove the obvious narration comments in the section that uses prefetch_future_
and initializes itr_, keeping the logic in the existing methods unchanged and
leaving only comments that add non-obvious context.
- Around line 171-188: The region resolved in the object listing flow is only
stored in a local variable and never passed along, so the
ListObjectsV1/ListObjectsV2/ListObjectVersions requests keep using an empty or
stale region. Update the request preparation in this branch to assign the
resolved value to next_args.region before constructing the ListObjects*Args
objects, using the existing GetRegion and next_args flow so all subsequent calls
carry the correct region.
- Around line 167-193: The async prefetch lambda in client.cc is capturing the
owning object via this, which can become dangling if the result outlives the
origin. Update the std::async lambda inside the prefetch setup to capture a
Client* (or equivalent stable client handle) instead of this, and use that
pointer for GetRegion, ListObjectVersions, ListObjectsV1, and ListObjectsV2
calls. Keep the ListObjectsResult prefetch logic unchanged otherwise, but ensure
the lambda no longer depends on the lifetime of the parent result object.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro Plus

Run ID: fb5290d1-4478-4ff3-b6da-93f5e324036a

📥 Commits

Reviewing files that changed from the base of the PR and between 1bb7b2c and 06eac50.

📒 Files selected for processing (3)
  • include/miniocpp/client.h
  • include/miniocpp/response.h
  • src/client.cc

Comment thread include/miniocpp/client.h Outdated
Comment thread include/miniocpp/client.h Outdated
Comment thread include/miniocpp/response.h Outdated
Comment thread src/client.cc Outdated
Comment thread src/client.cc Outdated
Comment thread src/client.cc Outdated
apply suggestion

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
src/client.cc (2)

171-192: 🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

Preserve the original GetRegion failure.

The failure path drops the actual GetRegionResponse and replaces it with a generic error, so callers lose the original status/message.

Proposed fix
-            std::string region;
-            if (GetRegionResponse resp =
-                    client->GetRegion(next_args.bucket, next_args.region)) {
-              next_args.region = resp.region;
+            GetRegionResponse region_resp =
+                client->GetRegion(next_args.bucket, next_args.region);
+            if (region_resp) {
+              next_args.region = region_resp.region;
               if (next_args.recursive) {
                 next_args.delimiter = "";
               } else if (next_args.delimiter.empty()) {
                 next_args.delimiter = "/";
               }
@@
                 return client->ListObjectsV2(ListObjectsV2Args(next_args));
               }
             }
-            return ListObjectsResponse(
-                error::make<ListObjectsResponse>("unable to get region"));
+            return ListObjectsResponse(region_resp);
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/client.cc` around lines 171 - 192, The region lookup failure in the
client listing flow is being collapsed into a generic error, which discards the
original GetRegionResponse details. Update the GetRegion handling in client.cc
so the failure path preserves and returns the original response/status from
client->GetRegion(next_args.bucket, next_args.region) instead of constructing a
new generic ListObjectsResponse error. Keep the existing success path intact for
ListObjectVersions, ListObjectsV1, and ListObjectsV2.

167-200: 🩺 Stability & Availability | 🟠 Major | ⚡ Quick win

Handle async failures in ListObjectsResult Launch failures from std::async(std::launch::async, ...) throw std::system_error at the call site, and exceptions from the prefetch task are rethrown by future::get() in Populate(). These bypass the ListObjectsResponse error path; convert both into response errors instead.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/client.cc` around lines 167 - 200, Handle async failures in
ListObjectsResult by converting both std::async launch errors and task
exceptions into ListObjectsResponse errors instead of letting them escape. In
the prefetch setup around prefetch_future_ creation, wrap the std::async call
and the lambda body in error handling that returns a failing ListObjectsResponse
via error::make rather than throwing std::system_error or other exceptions. In
ListObjectsResult::Populate(), catch exceptions from future::get() and convert
them to resp_ error responses so all failures flow through the existing response
error path.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@include/miniocpp/client.h`:
- Around line 54-57: Keep ListObjectsResult source-compatible for synchronous
callers by avoiding the API-breaking removal of copy semantics and the postfix
increment contract. Update ListObjectsResult so callers can still copy or
snapshot results safely, and preserve operator++(int) behavior expected by
existing code that uses result++. If the change must remain breaking, explicitly
document it as an API change and adjust the affected ListObjectsResult methods
accordingly.

---

Outside diff comments:
In `@src/client.cc`:
- Around line 171-192: The region lookup failure in the client listing flow is
being collapsed into a generic error, which discards the original
GetRegionResponse details. Update the GetRegion handling in client.cc so the
failure path preserves and returns the original response/status from
client->GetRegion(next_args.bucket, next_args.region) instead of constructing a
new generic ListObjectsResponse error. Keep the existing success path intact for
ListObjectVersions, ListObjectsV1, and ListObjectsV2.
- Around line 167-200: Handle async failures in ListObjectsResult by converting
both std::async launch errors and task exceptions into ListObjectsResponse
errors instead of letting them escape. In the prefetch setup around
prefetch_future_ creation, wrap the std::async call and the lambda body in error
handling that returns a failing ListObjectsResponse via error::make rather than
throwing std::system_error or other exceptions. In
ListObjectsResult::Populate(), catch exceptions from future::get() and convert
them to resp_ error responses so all failures flow through the existing response
error path.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro Plus

Run ID: 8a90d3eb-c294-4320-add0-ea6dab3aa9a9

📥 Commits

Reviewing files that changed from the base of the PR and between 06eac50 and 53ca784.

📒 Files selected for processing (3)
  • include/miniocpp/client.h
  • include/miniocpp/response.h
  • src/client.cc

Comment thread include/miniocpp/client.h Outdated
jiuker added 2 commits June 29, 2026 10:22
apply suggestoin
apply suggestion

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (1)
include/miniocpp/client.h (1)

43-43: 🎯 Functional Correctness | 🟠 Major | 🏗️ Heavy lift

Do not share a one-shot std::future across copyable results.

ListObjectsResult is copyable again, but copies and postfix operator++ still share the same std::future; the first copy that calls Populate() consumes it, leaving the other copy unable to fetch the prefetched page. Use a copy-safe shared state, such as a future of shared_ptr<ListObjectsResponse> or another mechanism where each copied result can materialize the same prefetched page.

#!/bin/bash
set -euo pipefail
sed -n '36,79p' include/miniocpp/client.h
sed -n '211,231p' src/client.cc

Also applies to: 54-55, 74-78

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@include/miniocpp/client.h` at line 43, ListObjectsResult is copyable again,
but prefetch_future_ still holds a one-shot future that gets consumed by the
first copy calling Populate(), so copied results and postfix operator++ can no
longer fetch the same prefetched page. Update the ListObjectsResult/Populate
flow in client.h and src/client.cc to use copy-safe shared state instead of
std::future<ListObjectsResponse>, such as a shared_ptr-backed prefetched
response or a future of shared_ptr<ListObjectsResponse>, so every copy can
materialize the same page.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/client.cc`:
- Around line 156-164: ListObjectsResult::UpdatePaginationArgs is using
include_versions to decide when to advance version-listing pagination, but
StartPrefetch routes version listing based on version_id_marker, so both paths
must use the same predicate. Update UpdatePaginationArgs to advance key/version
markers whenever the request is in version-listing mode, matching the logic in
StartPrefetch and the ListObjectVersions flow, and apply the same change
anywhere else the pagination update logic is duplicated.
- Around line 142-153: Initialize itr_ before calling StartPrefetch() in both
ListObjectsResult constructors so const operator bool() does not read an
uninitialized iterator. Set itr_ to resp_->contents.end() after resp_ is created
in each constructor, and keep the existing prefetch flow unchanged. Refer to
ListObjectsResult and StartPrefetch when updating the constructor initialization
path.

---

Duplicate comments:
In `@include/miniocpp/client.h`:
- Line 43: ListObjectsResult is copyable again, but prefetch_future_ still holds
a one-shot future that gets consumed by the first copy calling Populate(), so
copied results and postfix operator++ can no longer fetch the same prefetched
page. Update the ListObjectsResult/Populate flow in client.h and src/client.cc
to use copy-safe shared state instead of std::future<ListObjectsResponse>, such
as a shared_ptr-backed prefetched response or a future of
shared_ptr<ListObjectsResponse>, so every copy can materialize the same page.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro Plus

Run ID: 6c110e9d-d654-4bd1-9b9b-e27061e1cee1

📥 Commits

Reviewing files that changed from the base of the PR and between 53ca784 and 226ac52.

📒 Files selected for processing (2)
  • include/miniocpp/client.h
  • src/client.cc

Comment thread src/client.cc
Comment thread src/client.cc

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/client.cc (1)

142-155: 🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

Populate the first page before returning from synchronous ListObjects.

These constructors now return with itr_ pinned to the empty placeholder response. That makes const ListObjectsResult result = client.ListObjects(args); if (result) report false even when the listing has objects, because the const bool overload cannot trigger Populate(). Start the prefetch, then consume the first page before returning; Populate() will still schedule the next-page prefetch when truncated.

Proposed fix
   resp_ = std::make_shared<ListObjectsResponse>();
   itr_ = resp_->contents.end();
   StartPrefetch();
+  Populate();
 }
@@
   resp_ = std::make_shared<ListObjectsResponse>();
   itr_ = resp_->contents.end();
   StartPrefetch();
+  Populate();
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/client.cc` around lines 142 - 155, The synchronous ListObjects path
leaves ListObjectsResult initialized with only the empty placeholder response,
so a const result can evaluate false even when objects exist. Update the
ListObjectsResult constructors in src/client.cc so that after StartPrefetch()
they also populate the first page before returning, using Populate() (or
equivalent first-page consumption) while preserving the next-page prefetch
behavior when the response is truncated. Keep the fix localized to the
ListObjectsResult construction flow and the bool/operator logic it feeds.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@src/client.cc`:
- Around line 142-155: The synchronous ListObjects path leaves ListObjectsResult
initialized with only the empty placeholder response, so a const result can
evaluate false even when objects exist. Update the ListObjectsResult
constructors in src/client.cc so that after StartPrefetch() they also populate
the first page before returning, using Populate() (or equivalent first-page
consumption) while preserving the next-page prefetch behavior when the response
is truncated. Keep the fix localized to the ListObjectsResult construction flow
and the bool/operator logic it feeds.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro Plus

Run ID: 031dc2df-57b8-4a0c-9baa-a10e14e5cbd3

📥 Commits

Reviewing files that changed from the base of the PR and between 226ac52 and 353a2aa.

📒 Files selected for processing (2)
  • include/miniocpp/client.h
  • src/client.cc

apply suggestion
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

perf: prefetch next ListObjects page while consumer iterates current page

1 participant