fix: do not send zero-length non-FIN stream frames#2492
Conversation
| ack_eliciting = true; | ||
| in_flight = true; | ||
| has_data = true; | ||
| } |
There was a problem hiding this comment.
It would be useful to see if this push_frame_to_pkt! can ever fail. It would indicate that we generated a frame that is larger than left despite taking it into account when picking the frame length.
There was a problem hiding this comment.
FWIW, I have local changes (in preparation for in-band-traceroute stuff) where I'm refactoring that part of send_single into a helper struct. I'd prefer to add it there to minimize merge conflicts :-)
There was a problem hiding this comment.
Comment was a note for future me.
send_single could use a lot of refactor. The lost frame detection should be moved elsewhere too.
| // bytes), but we'd need a different iterator, since | ||
| // peek_flushable() returns the same stream on every iteration. | ||
| let max_len = match left.checked_sub(hdr_len + 1) { | ||
| Some(v) => v + 1, |
There was a problem hiding this comment.
Technically the +1 is only needed in cases where the stream has at least 1 byte to send. The case of stream FIN with no data could be transmitted with v == 0.
There was a problem hiding this comment.
+1 I think we need a little more nuance here
| self.streams.remove_flushable(&priority_key); | ||
|
|
||
| continue; | ||
| break; |
There was a problem hiding this comment.
An implication of break instead of continue is that other streams with lower stream id or offset could have written a tiny frame but now they won't.
I need to look into the implications of the remove_flushable
The check for MAX_STREAM_OVERHEAD in the original code made this branch unlikely to be reached and we could run into problems when we do hit it.
I think with the lower MIN this branch will be easier to test correct. This is a very good thing.
There was a problem hiding this comment.
FYI, remove_flushable() means we don't try to emit from this stream again until it's re-inserted into flushable. That only happens when the app writes more data to the stream. If the app already wrote a FIN, the stream will be stranded forever.
I was actually wondering if should remove the MIN_STREAM_OVERHEAD guard altogether and just rely on this check here. As soon as a stream has offset>64 we're past the guard anyways, so I don't think it really does much and just adds clutter / complexity.
There was a problem hiding this comment.
I have no objections to removal of the MIN_STREAM_OVERHEAD guard.
There was a problem hiding this comment.
If we want to continue instead of break, we'd just need to add a different iterator for flushable. I think many of the other streams.<adjective> collection already to.
But with the old MAX_STREAM_OVERHEAD guard, we also skipped many streams that would have fit.
There was a problem hiding this comment.
I'm fine with the break.
There was a problem hiding this comment.
I am confused by the remove_flushable call. Wouldn't this cause a stream hang?
There was a problem hiding this comment.
Oh I see, you removed the call to remove_flushable. Did we just discover another bug here? :)
There was a problem hiding this comment.
I think this change in behaviour needs some more consideration, the break risks a high-priority stream blocking forward progress on others and introducing tail latency. Worst case, a certain combo could completely stall out all other streams.
I think I'd be more comfortable with having a min guard (to prevent looping through the whole set when we really tand no hope of sending anything) and trying to walk the entire flushable queue. We could avoid needing a new fluhable iterator type if we store the removed flushable values and reinsert them after we're done sending. But if its easy enough to add a new iterator that would be ok too.
There was a problem hiding this comment.
I think this is still a net improvement. The vast majority of cases / streams would have never hit this continue at all (only streams with hdr_len > 12 would have, so essentially only a stream_off of 1GB+ would have). The MAX_STREAM_OVERHEAD guard would have skipped even attempting to fit a frame. Now we at least try to see how much space the header would actually need. In many cases (hdr_len < 12), we can actually send a frame now where we previously couldn't. The break here we mimics the behavior of the old MAX_STREAM_OVERHEAD: if we can't fit the frame, we simply skip the attempt of sending a STREAM until cwnd_available increases again.
The impact on lower priority streams should be minimal, I think. In the best case, another stream could have fit
a max 14bytes of payload into a frame before it also has to wait for more available cwnd.
That said, adding the iterator is pretty straightforward and it's certainly a further improvement.
There was a problem hiding this comment.
yes the new approach does benefit from supporting smaller stream frames.
I think the old bug had an upside. If you have a huge STREAM frame to send and left was between 12 and 20 bytes so you couldn't actually send, the stream would be dropped and since checked_sub didn't update left, the continue an then emit one frame in the same round, and never be worried about the huge frame again 👿
Adding an iterator lets us have the best of both worlds
| ) { | ||
| let mut config = test_utils::Pipe::default_config(cc_algorithm_name).unwrap(); | ||
| // stream_id 16384 requires a 4-byte varint. | ||
| let stream_id = 16384_u64; |
There was a problem hiding this comment.
Consider parametrizing this test so it covers stream ids that can be encoded in [1, 2, 4, 8] bytes. Similarly, would it be at all possible to cover offsets that have varint encoding in [1, 2, 4, 8] bytes
There was a problem hiding this comment.
I thought about changing the offset, but I think that requires more code in the test that it's worth. We'd need to first write to the stream, and then drain all the data out it again and then do the test.
1c5eb52 to
5278540
Compare
| self.emit_off = cmp::max(self.emit_off, next_off); | ||
|
|
||
| Ok((out.len() - out_len, fin)) | ||
| let written = out.len() - out_len; |
There was a problem hiding this comment.
what does this change do?
There was a problem hiding this comment.
Looks like a functional no-op
| self.emit_off = cmp::max(self.emit_off, next_off); | ||
|
|
||
| Ok((out.len() - out_len, fin)) | ||
| let written = out.len() - out_len; |
There was a problem hiding this comment.
Looks like a functional no-op
| // this commit in favour of MIN_STREAM_OVERHEAD. See | ||
| // https://github.com/cloudflare/quiche/pull/2453 for more | ||
| // comprehensive app-limited changes/fixes. | ||
| if !has_data && !dgram_emitted && cwnd_available > 12 { |
There was a problem hiding this comment.
I think we discovered that a more correct number is 20. But I'm fine keeping it as it for consistency.
| !dgram_emitted && | ||
| cwnd_available > frame::MAX_STREAM_OVERHEAD | ||
| { | ||
| // TODO: 12 is the old MAX_STREAM_OVERHEAD value, which was removed in |
There was a problem hiding this comment.
Could we add a local constant LEGACY_APP_LIMITED_BYTE_THRESHOLD for this magic number?
While these frames are valid per RFC 9000, it is pointless to send them (and they triggered another bug on the receive side). The MAX_STREAM_OVERHEAD constant we used to guard the path that writes STREAM frames was too low; STREAM frame headers can be up to 19 bytes. However, using the maximum overhead is a bit iffy as well, since it would prevent us from writing the more common, smaller frames. So instead, we guard on MIN_STREAM_OVERHEAD and add a secondary check to ensure the remaining capacity is enough to fit at least one byte of stream payload. While doing this, I also noticed that we were removing the stream from the flushable queue when there was not enough space for its header. That could strand the stream permanently and may also account for connections running out of flow-control send credit. Fixed that as well. The MAX_STREAM_OVERHEAD constant was also used in the recovery on_app_limited() check, so I had to tweak that logic too.
5278540 to
ec4d166
Compare
|
addressed all the comments |
While these frames are valid per RFC 9000, it is pointless to send them (and they triggered another bug on the receive side).
The MAX_STREAM_OVERHEAD constant we used to guard the path that writes STREAM frames was too low; STREAM frame headers can be up to 19 bytes. However, using the maximum overhead is a bit iffy as well, since it would prevent us from writing the more common, smaller frames. So instead, we guard on MIN_STREAM_OVERHEAD and add a secondary check to ensure the remaining capacity is enough to fit at least one byte of stream payload.
While doing this, I also noticed that we were removing the stream from the flushable queue when there was not enough space for its header. That could strand the stream permanently and may also account for connections running out of flow-control send credit. Fixed that as well.
The MAX_STREAM_OVERHEAD constant was also used in the recovery on_app_limited() check, so I had to tweak that logic too.