Re: [PATCH net] ipv4, ipv6: Fix handling of transhdrlen in __ip{,6}_append_data()

From: David Howells
Date: Wed Sep 20 2023 - 04:36:49 EST


Willem de Bruijn <willemdebruijn.kernel@xxxxxxxxx> wrote:

> The proposed fix is non-trivial, and changes not just the new path
> that observes the issue (MSG_SPLICE_PAGES), but also the other more
> common paths that exercise __ip6_append_data.

I realise that. I broke ping/ping6 briefly, but I corrected that (I
subtracted the ICMP header len from length after copying it out, but forgot
that it needed adding back on for the return value of sendmsg()). I don't
think there are that many callers, but you might be right that this is too
big for a fix.
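
As a rough userspace illustration of the ping length accounting described
above (this is not the kernel code; toy_ping_sendmsg() and TOY_ICMP_HDR_LEN
are hypothetical names invented for the sketch): the header is copied out,
the payload length excludes it, but the value returned to the caller has to
cover the full buffer again.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* An ICMP echo header is 8 bytes; the macro name is made up for this sketch. */
#define TOY_ICMP_HDR_LEN 8

/*
 * Hypothetical model of a ping-style sendmsg(): the caller's buffer starts
 * with the ICMP header, so 'len' counts header + payload.
 * Returns the number of bytes consumed from the caller, or -1 if too short.
 */
static long toy_ping_sendmsg(const unsigned char *buf, size_t len,
			     unsigned char *hdr_out, size_t *payload_len_out)
{
	size_t payload_len;

	assert(buf && hdr_out && payload_len_out);

	if (len < TOY_ICMP_HDR_LEN)
		return -1;

	/* Copy the header out and stop counting it as payload. */
	memcpy(hdr_out, buf, TOY_ICMP_HDR_LEN);
	payload_len = len - TOY_ICMP_HDR_LEN;
	*payload_len_out = payload_len;

	/*
	 * The caller handed over 'len' bytes, so the header length must be
	 * added back for the return value; returning payload_len alone is
	 * the kind of slip described above.
	 */
	return (long)(payload_len + TOY_ICMP_HDR_LEN);
}
```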

> There is significant risk to introduce an unintended side effect
> requiring a follow-up fix. Because this function is notoriously
> complex, multiplexing a lot of behavior: with and without transport
> headers, edge cases like fragmentation, MSG_MORE, absence of
> scatter-gather, ....

I think the bug isn't actually in __ip{,6}_append_data(), but higher up in
ip{,6}_append_data(). I think I see *why* length has transhdrlen included in
it: ping and raw sockets come with that pre-added-in by userspace.

I would actually like to eliminate the length argument entirely and use the
length in the iterator - but that doesn't work in all cases as sometimes there
isn't a msghdr struct. (And, besides, that's too big a change for a fix).

I think the simplest fix, then, is just to make ip{,6}_append_data() subtract
transhdrlen from length before clearing transhdrlen when there's already a
packet in the queue from MSG_MORE/cork that will be appended to.
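
A minimal sketch of that adjustment, written as a self-contained userspace
model rather than as a patch: struct toy_cork_queue, struct toy_lengths and
toy_adjust_lengths() are hypothetical stand-ins, and the real function takes
many more arguments. It assumes length always includes transhdrlen on entry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for "is there already a corked packet queued?" */
struct toy_cork_queue {
	bool empty;
};

/* The two values that would be passed down to the inner append routine. */
struct toy_lengths {
	size_t length;       /* bytes the inner routine is asked to append */
	size_t transhdrlen;  /* transport header bytes still to be reserved */
};

static struct toy_lengths toy_adjust_lengths(const struct toy_cork_queue *q,
					     size_t length, size_t transhdrlen)
{
	struct toy_lengths out = { length, transhdrlen };

	/* Assumption of this sketch: length includes transhdrlen on entry. */
	assert(length >= transhdrlen);

	if (!q->empty) {
		/*
		 * Appending to an existing packet: the transport header is
		 * already accounted for, so drop it from length *before*
		 * clearing transhdrlen.
		 */
		out.length -= transhdrlen;
		out.transhdrlen = 0;
	}
	/* First call on an empty queue: both values pass through unchanged. */
	return out;
}
```

For instance, with a 100-byte payload and an 8-byte transport header, the
first call in this model would pass down length 108 and transhdrlen 8, while
a later MSG_MORE call would pass down length 100 and transhdrlen 0.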

> Does the issue discovered only affect MSG_SPLICE_PAGES or can it
> affect other paths too? If the first, is it possible to create a more
> targeted fix that can trivially be seen to not affect code prior to
> introduction of splice pages?

It may also affect MSG_ZEROCOPY in interesting ways. msg_zerocopy_realloc()
looks suspicious as it does things with 'size' bytes from a buffer that
doesn't have 'size' bytes of data in it (because size, aka length, includes
transhdrlen).

I would guess that we don't notice issues with ping sockets because people
don't often use MSG_MORE/corking with them.

Raw sockets shouldn't exhibit this bug as they set transhdrlen to 0 up front,
but I can't help but wonder what the consequences are as some bits of
__ip*_append_data() change behaviour if they see transhdrlen==0 :-/

David