[PATCH net v2] ipv4, ipv6: Fix handling of transhdrlen in __ip{,6}_append_data()

From: David Howells
Date: Wed Sep 20 2023 - 04:40:04 EST


Including the transhdrlen in length is a problem when appending to an IPv4
or IPv6 packet that is already partially filled (e.g. a previous
send(MSG_MORE) left data queued), as we don't want to repeat the transport
header or account for it twice. This can happen under some circumstances,
such as splicing into an L2TP socket.

The symptom observed is a warning in __ip6_append_data():

WARNING: CPU: 1 PID: 5042 at net/ipv6/ip6_output.c:1800 __ip6_append_data.isra.0+0x1be8/0x47f0 net/ipv6/ip6_output.c:1800

that occurs when MSG_SPLICE_PAGES is used to append more data to an already
partially occupied skbuff. The warning occurs when 'copy' is larger than
the amount of data in the message iterator. This is because the requested
length includes the transport header length when it shouldn't. This can be
triggered by, for example:

	sfd = socket(AF_INET6, SOCK_DGRAM, IPPROTO_L2TP);
	bind(sfd, ...); // ::1
	connect(sfd, ...); // ::1 port 7
	send(sfd, buffer, 4100, MSG_MORE);
	sendfile(sfd, dfd, NULL, 1024);

Fix this by deducting transhdrlen from length in ip{,6}_append_data(), right
before transhdrlen is cleared, in the case where there is already a queued
packet that we're going to try appending to.
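
The accounting can be illustrated with a small user-space model. This is
only a sketch, not kernel code: bytes_requested() and its queue_empty and
apply_fix parameters are hypothetical names, and it assumes the caller always
passes length as payload plus transport header, as described above.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for the branch touched by this patch.  Returns how
 * many payload bytes would subsequently be expected from the message
 * iterator.  Not the kernel implementation.
 */
static size_t bytes_requested(size_t length, size_t transhdrlen,
			      int queue_empty, int apply_fix)
{
	/* Assumption: the caller included the transport header in length. */
	assert(length >= transhdrlen);

	if (!queue_empty) {
		/* Appending: the header is already in the queued packet. */
		if (apply_fix)
			length -= transhdrlen;	/* the one-line change */
		transhdrlen = 0;
	}

	/* Whatever is not transport header has to come from the iterator. */
	return length - transhdrlen;
}
```

With an illustrative 1024-byte payload and a 4-byte transport header (so
length is 1028), the model asks for 1024 bytes on the first send, 1028 bytes
when appending without the fix, and 1024 bytes when appending with it. The
1028 case corresponds to 'copy' exceeding what the iterator holds.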

Reported-by: syzbot+62cbf263225ae13ff153@xxxxxxxxxxxxxxxxxxxxxxxxx
Link: https://lore.kernel.org/r/0000000000001c12b30605378ce8@xxxxxxxxxx/
Signed-off-by: David Howells <dhowells@xxxxxxxxxx>
cc: Eric Dumazet <edumazet@xxxxxxxxxx>
cc: Willem de Bruijn <willemdebruijn.kernel@xxxxxxxxx>
cc: "David S. Miller" <davem@xxxxxxxxxxxxx>
cc: David Ahern <dsahern@xxxxxxxxxx>
cc: Paolo Abeni <pabeni@xxxxxxxxxx>
cc: Jakub Kicinski <kuba@xxxxxxxxxx>
cc: netdev@xxxxxxxxxxxxxxx
cc: bpf@xxxxxxxxxxxxxxx
cc: syzkaller-bugs@xxxxxxxxxxxxxxxx
Link: https://lore.kernel.org/r/75315.1695139973@xxxxxxxxxxxxxxxxxxxxxx/ # v1
---
net/ipv4/ip_output.c | 1 +
net/ipv6/ip6_output.c | 1 +
2 files changed, 2 insertions(+)

diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index 4ab877cf6d35..9646f2d9afcf 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -1354,6 +1354,7 @@ int ip_append_data(struct sock *sk, struct flowi4 *fl4,
 		if (err)
 			return err;
 	} else {
+		length -= transhdrlen;
 		transhdrlen = 0;
 	}

diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index 54fc4c711f2c..6a4ce7f622e9 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -1888,6 +1888,7 @@ int ip6_append_data(struct sock *sk,
 		length += exthdrlen;
 		transhdrlen += exthdrlen;
 	} else {
+		length -= transhdrlen;
 		transhdrlen = 0;
 	}