Skip to content

BBR: Check for app-limited when entering a new iteration of the send loop#2453

Open
antoniovicente wants to merge 3 commits into
masterfrom
antonio/app_limited_v3
Open

BBR: Check for app-limited when entering a new iteration of the send loop#2453
antoniovicente wants to merge 3 commits into
masterfrom
antonio/app_limited_v3

Conversation

@antoniovicente
Copy link
Copy Markdown
Contributor

The new logic matches the app-limited detection algorithm described in
the BBR RFC draft, which involves checking for the previous state of the
connection before timeout and ack processing.

This change detects the end of the current iteration of the send loop by
looking at the result of send_single and notifying the congestion controller
when send stopped early with Err::Done due to either there being no data
left to send or cwnd being fully utilized.

By automatically detecting the end of the current iteration we can avoid
making changes to the quiche Connection's public API.

This change simplifies the event loop and removes the need to bias the
select! to avoid starvation by serializing all the event loop actions
and clarifying the start and end of the work loop iteration.
The new logic matches the app-limited detection algorithm described in
the BBR RFC draft, which involves checking for the previous state of the
connection before timeout and ack processing.
Comment thread quiche/src/lib.rs
Comment on lines +2850 to +2851
/// Returns true if at least 1 stream has headers or body data to
/// write, or there are items in the DATAGRAM send queue.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: QUIC treats stream bytes as opaque data. It doesn't matter what the bytes represent at the transport layer.

Suggested change
/// Returns true if at least 1 stream has headers or body data to
/// write, or there are items in the DATAGRAM send queue.
/// Returns true if at least 1 stream has data to
/// write, or there are items in the DATAGRAM send queue.

// Allow the BBR implementation to compute if the congestion
// controller is app-limited, before processing timeouts, ACKs or
// generating packets to send.
fn bbr_check_if_app_limited(&mut self, now: &Instant);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: its not 100% clear to me what this does at a high level. The &mut indicates it has a side effect, is that an important detail? I.e. should you say something like "if the check determines it is app-limited then the controller marks itself as app-limited" ?

Comment thread quiche/src/tests.rs
fn app_limited_true(
#[values("cubic", "bbr2_gcongestion")] cc_algorithm_name: &str,
#[values(true, false)] discard: bool,
#[values(false, true)] enable_bbr_app_limited_fix: bool,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: for consistency with the prior line the order can be flipped. I don't really care much though

Suggested change
#[values(false, true)] enable_bbr_app_limited_fix: bool,
#[values(true, false)] enable_bbr_app_limited_fix: bool,

// returned Err::Done; this indicates that it is time to yield
// because either there is no cwnd remaining or there is no data
// left to send.
fn send_stopped_early(&mut self, has_flushable_data: bool);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: perhaps worth documenting what has_flushable_data represents here, its just flushable application data right?

Copy link
Copy Markdown
Contributor

@LPardue LPardue left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM modulo some nits.

I didn't go into all the BBR low-level details though

// exit the send loop earlier we should mark the connection in some way so
// the next call to bbr_check_if_app_limited does not mark the
// connection app-limited after yielding when C.pending_transmissions > 0.
if self.cwnd() > self.bytes_in_flight.get() + frame::MAX_STREAM_OVERHEAD &&
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you need to actually check for self.cwnd() > bytes_in_flight + max_udp_payload_len
For datagrams the under-utilization can be up to a full packet. For streams, you still need to account at least for packet_header + MAX_STREAM_OVERHEAD

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're getting into the differences between the BBR RFC and the older/expired delivery rate estimation draft.

https://www.ietf.org/archive/id/draft-cheng-iccrg-delivery-rate-estimation-01.html#section-3.4-3
vs
https://www.ietf.org/archive/id/draft-ietf-ccwg-bbr-05.html#section-4.1.2.4-3

I think we do need the precision in this case: calling the connection not congestion limited when remaining cwnd is less than 1 packet can cause problems when the cwnd is close to the 2 packet minimum; sending significantly less than the full cwnd won't result in a measurable BW increase and the connection will exit due to bandwidth plateau in case where it shouldn't.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I'm not a BBR expert by any means, so take with a grain of salt).
I don't think we really capture NoUnsentData, because send_single() returning Error::Done could be because there is no unsent data, or there is something we wanted to send, but we are out of cwnd. I think we need to distinguish these two cases to get a reliable app_limited signal (*).

I think we do need the precision in this case: calling the connection not congestion limited when remaining cwnd is less than 1 packet can cause problems when the cwnd is close to the 2 packet minimum;

The check I'm proposing is "less than 1 packet remaining".
I'm wondering if the 2 packet minimum will actually work well for MASQUE, since might only be able to send full-sized packets or nothing. Maybe we need a 3 or packet minimum.

(*) in quiche we also need to be careful in what order we call send(), recv(), and the function to write to / read from streams. If these are called in the "wrong" order the send buffer could be empty in every iteration.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There being a required ordering between "the function to write to / read from streams" and send() is a problem.

And I think you're right in that there is an ordering requirement: recv, generate data, send

I think the current ordering is: recv, send, generate data!

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we do recv, generate, send in work_loop (self.conn_stage.on_read(did_recv, qconn, ctx)?; generates data). But I haven't thought through all possible interleavings that are possible with different branches of the tokio::select call finishing and/or timeout handling....

IMHO the fact that we can process a ton of packets (or writes) at once without interleaving network IO and app IO is also a potential concern.

// the next call to bbr_check_if_app_limited does not mark the
// connection app-limited after yielding when C.pending_transmissions > 0.
if self.cwnd() > self.bytes_in_flight.get() + frame::MAX_STREAM_OVERHEAD &&
!pacer_has_pending_transmissions(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we need to check the pacer here?

And is now the right time to use? Should it be the time when send_stopped_early() was last called?
Or rather, should send_stopped_early() be:

  fn send_stopped_early(&mut self, has_flushable_data: bool, now: Instant) {
        if !has_flushable_data && !pacer_has_pending_transmissions(next_release_time, now) {
            self.check_for_app_limited_next_iteration = true;
        }
    }

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This attempts to mimic the C.pending_transmissions == 0 check from https://www.ietf.org/archive/id/draft-ietf-ccwg-bbr-05.html#section-4.1.2.4-3

I think it tries to capture the next time when the connection is expected to be starved of data to send. The connection could end up app limited if the attempt to resume send is delayed for some reason, which could include the machine being busy doing sends or handling events for other active connections.

And yes, I think the comparison is meant to be against the current time, not the time when the previous packet was sent.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sounds good. (Looks like I can start a comment thread, by I can't resolve one :-/)

}
}

fn pacer_has_pending_transmissions(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I think you can replace this function with next_release_time.time(now).is_none().
(Or add a more descriptive function to ReleaseDecision, e.g., ReleaseDecision::can_release(now)`.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants