Skip to content

fix: do not send zero-length non-FIN stream frames#2492

Open
gregor-cf wants to merge 1 commit into
cloudflare:masterfrom
gregor-cf:gregor/dont-send-zero-length-stream-frames
Open

fix: do not send zero-length non-FIN stream frames#2492
gregor-cf wants to merge 1 commit into
cloudflare:masterfrom
gregor-cf:gregor/dont-send-zero-length-stream-frames

Conversation

@gregor-cf
Copy link
Copy Markdown
Contributor

While these frames are valid per RFC 9000, it is pointless to send them (and they triggered another bug on the receive side).

The MAX_STREAM_OVERHEAD constant we used to guard the path that writes STREAM frames was too low; STREAM frame headers can be up to 19 bytes. However, using the maximum overhead is a bit iffy as well, since it would prevent us from writing the more common, smaller frames. So instead, we guard on MIN_STREAM_OVERHEAD and add a secondary check to ensure the remaining capacity is enough to fit at least one byte of stream payload.

While doing this, I also noticed that we were removing the stream from the flushable queue when there was not enough space for its header. That could strand the stream permanently and may also account for connections running out of flow-control send credit. Fixed that as well.

The MAX_STREAM_OVERHEAD constant was also used in the recovery on_app_limited() check, so I had to tweak that logic too.

@gregor-cf gregor-cf requested a review from a team as a code owner May 21, 2026 21:22
Comment thread quiche/src/lib.rs
Comment thread quiche/src/lib.rs
ack_eliciting = true;
in_flight = true;
has_data = true;
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be useful to see if this push_frame_to_pkt! can ever fail. It would indicate that we generated a frame that is larger than left despite taking it into account when picking the frame length.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, I have local changes (in preparation for in-band-traceroute stuff) where I'm refactoring that part of send_single into a helper struct. I'd prefer to add it there to minimize merge conflicts :-)

Copy link
Copy Markdown
Contributor

@antoniovicente antoniovicente May 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment was a note for future me.

send_single could use a lot of refactor. The lost frame detection should be moved elsewhere too.

Comment thread quiche/src/lib.rs
// bytes), but we'd need a different iterator, since
// peek_flushable() returns the same stream on every iteration.
let max_len = match left.checked_sub(hdr_len + 1) {
Some(v) => v + 1,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically the +1 is only needed in cases where the stream has at least 1 byte to send. The case of stream FIN with no data could be transmitted with v == 0.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 I think we need a little more nuance here

Comment thread quiche/src/lib.rs
self.streams.remove_flushable(&priority_key);

continue;
break;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An implication of break instead of continue is that other streams with lower stream id or offset could have written a tiny frame but now they won't.

I need to look into the implications of the remove_flushable

The check for MAX_STREAM_OVERHEAD in the original code made this branch unlikely to be reached and we could run into problems when we do hit it.

I think with the lower MIN this branch will be easier to test correct. This is a very good thing.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI, remove_flushable() means we don't try to emit from this stream again until it's re-inserted into flushable. That only happens when the app writes more data to the stream. If the app already wrote a FIN, the stream will be stranded forever.

I was actually wondering if should remove the MIN_STREAM_OVERHEAD guard altogether and just rely on this check here. As soon as a stream has offset>64 we're past the guard anyways, so I don't think it really does much and just adds clutter / complexity.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have no objections to removal of the MIN_STREAM_OVERHEAD guard.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we want to continue instead of break, we'd just need to add a different iterator for flushable. I think many of the other streams.<adjective> collection already to.
But with the old MAX_STREAM_OVERHEAD guard, we also skipped many streams that would have fit.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine with the break.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am confused by the remove_flushable call. Wouldn't this cause a stream hang?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I see, you removed the call to remove_flushable. Did we just discover another bug here? :)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this change in behaviour needs some more consideration, the break risks a high-priority stream blocking forward progress on others and introducing tail latency. Worst case, a certain combo could completely stall out all other streams.

I think I'd be more comfortable with having a min guard (to prevent looping through the whole set when we really tand no hope of sending anything) and trying to walk the entire flushable queue. We could avoid needing a new fluhable iterator type if we store the removed flushable values and reinsert them after we're done sending. But if its easy enough to add a new iterator that would be ok too.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is still a net improvement. The vast majority of cases / streams would have never hit this continue at all (only streams with hdr_len > 12 would have, so essentially only a stream_off of 1GB+ would have). The MAX_STREAM_OVERHEAD guard would have skipped even attempting to fit a frame. Now we at least try to see how much space the header would actually need. In many cases (hdr_len < 12), we can actually send a frame now where we previously couldn't. The break here we mimics the behavior of the old MAX_STREAM_OVERHEAD: if we can't fit the frame, we simply skip the attempt of sending a STREAM until cwnd_available increases again.

The impact on lower priority streams should be minimal, I think. In the best case, another stream could have fit
a max 14bytes of payload into a frame before it also has to wait for more available cwnd.

That said, adding the iterator is pretty straightforward and it's certainly a further improvement.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes the new approach does benefit from supporting smaller stream frames.

I think the old bug had an upside. If you have a huge STREAM frame to send and left was between 12 and 20 bytes so you couldn't actually send, the stream would be dropped and since checked_sub didn't update left, the continue an then emit one frame in the same round, and never be worried about the huge frame again 👿

Adding an iterator lets us have the best of both worlds

Comment thread quiche/src/lib.rs Outdated
Comment thread quiche/src/lib.rs Outdated
Comment thread quiche/src/tests.rs Outdated
) {
let mut config = test_utils::Pipe::default_config(cc_algorithm_name).unwrap();
// stream_id 16384 requires a 4-byte varint.
let stream_id = 16384_u64;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider parametrizing this test so it covers stream ids that can be encoded in [1, 2, 4, 8] bytes. Similarly, would it be at all possible to cover offsets that have varint encoding in [1, 2, 4, 8] bytes

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about changing the offset, but I think that requires more code in the test that it's worth. We'd need to first write to the stream, and then drain all the data out it again and then do the test.

@gregor-cf gregor-cf force-pushed the gregor/dont-send-zero-length-stream-frames branch from 1c5eb52 to 5278540 Compare May 21, 2026 22:05
self.emit_off = cmp::max(self.emit_off, next_off);

Ok((out.len() - out_len, fin))
let written = out.len() - out_len;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what does this change do?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like a functional no-op

self.emit_off = cmp::max(self.emit_off, next_off);

Ok((out.len() - out_len, fin))
let written = out.len() - out_len;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like a functional no-op

Comment thread quiche/src/frame.rs
Comment thread quiche/src/lib.rs
// this commit in favour of MIN_STREAM_OVERHEAD. See
// https://github.com/cloudflare/quiche/pull/2453 for more
// comprehensive app-limited changes/fixes.
if !has_data && !dgram_emitted && cwnd_available > 12 {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we discovered that a more correct number is 20. But I'm fine keeping it as it for consistency.

Comment thread quiche/src/lib.rs Outdated
!dgram_emitted &&
cwnd_available > frame::MAX_STREAM_OVERHEAD
{
// TODO: 12 is the old MAX_STREAM_OVERHEAD value, which was removed in
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we add a local constant LEGACY_APP_LIMITED_BYTE_THRESHOLD for this magic number?

While these frames are valid per RFC 9000, it is pointless to send them
(and they triggered another bug on the receive side).

The MAX_STREAM_OVERHEAD constant we used to guard the path that writes
STREAM frames was too low; STREAM frame headers can be up to 19 bytes.
However, using the maximum overhead is a bit iffy as well, since it would
prevent us from writing the more common, smaller frames. So instead, we
guard on MIN_STREAM_OVERHEAD and add a secondary check to ensure the
remaining capacity is enough to fit at least one byte of stream payload.

While doing this, I also noticed that we were removing the stream from the
flushable queue when there was not enough space for its header. That could
strand the stream permanently and may also account for connections running
out of flow-control send credit. Fixed that as well.

The MAX_STREAM_OVERHEAD constant was also used in the recovery
on_app_limited() check, so I had to tweak that logic too.
@gregor-cf gregor-cf force-pushed the gregor/dont-send-zero-length-stream-frames branch from 5278540 to ec4d166 Compare May 21, 2026 23:09
@gregor-cf
Copy link
Copy Markdown
Contributor Author

addressed all the comments

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants