Re: [PATCH net-next 00/11] UDP/IPv6 refactoring

From: Pavel Begunkov
Date: Thu Apr 28 2022 - 11:03:45 EST


On 4/28/22 15:04, Paolo Abeni wrote:
> On Thu, 2022-04-28 at 11:56 +0100, Pavel Begunkov wrote:
> > Refactor UDP/IPv6 and especially udpv6_sendmsg() paths. The end result looks
> > cleaner than it was before, and the series also removes a bunch of instructions
> > and other overhead from the hot path, positively affecting performance.
> >
> > It was a part of a larger series, there were some perf numbers for it, see
> > https://lore.kernel.org/netdev/cover.1648981570.git.asml.silence@xxxxxxxxx/
> >
> > Pavel Begunkov (11):
> >   ipv6: optimise ipcm6 cookie init
> >   udp/ipv6: refactor udpv6_sendmsg udplite checks
> >   udp/ipv6: move pending section of udpv6_sendmsg
> >   udp/ipv6: prioritise the ip6 path over ip4 checks
> >   udp/ipv6: optimise udpv6_sendmsg() daddr checks
> >   udp/ipv6: optimise out daddr reassignment
> >   udp/ipv6: clean up udpv6_sendmsg's saddr init
> >   ipv6: partially inline fl6_update_dst()
> >   ipv6: refactor opts push in __ip6_make_skb()
> >   ipv6: improve opt-less __ip6_make_skb()
> >   ipv6: clean up ip6_setup_cork
> >
> >  include/net/ipv6.h    |  24 +++----
> >  net/ipv6/datagram.c   |   4 +-
> >  net/ipv6/exthdrs.c    |  15 ++--
> >  net/ipv6/ip6_output.c |  53 +++++++-------
> >  net/ipv6/raw.c        |   8 +--
> >  net/ipv6/udp.c        | 158 ++++++++++++++++++++----------------------
> >  net/l2tp/l2tp_ip6.c   |   8 +--
> >  7 files changed, 122 insertions(+), 148 deletions(-)

> Just a general comment here: IMHO the above diffstat is quite
> significant and some patches look completely non-trivial to me.
>
> I think we need quite a significant performance gain to justify the
> above. Could you please share your performance data, including the
> testing scenario?

As mentioned, I benchmarked it with a UDP/IPv6 max-throughput kind of
test, and only as part of a larger series [1]. The result was "2090K vs
2229K tx/s, +6.6%". Taking into account the +3% from the split-out
sock_wfree optimisations, half if not most of the rest should be
attributed to this series, so, a bit hand-wavingly, +1-3%. I can spend
some extra time retesting this particular series if strongly required...
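
(A rough breakdown of those numbers: 2229K / 2090K is +6.6% overall;
taking out the ~3% credited to sock_wfree leaves roughly +3.5%, half to
most of which should be this series, hence the hand-waved +1-3%.)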

I was using [2], which is basically an io_uring copy of the send paths of
selftests/net/msg_zerocopy. The effect should be visible with other tools
as well; this one just alleviates context-switch and similar overhead by
going through io_uring.

./send-zc -6 udp -D <address> -t <time> -s16 -z0

It sends a number of 16-byte UDP/IPv6 (non-zerocopy) send requests over
io_uring, then waits for them and repeats. That was with the default of
8 requests per iteration (i.e. per syscall). I was using a dummy netdev,
so there is no actual receiver, but it correlates well with my server
setup with mlx cards, which just takes more effort for me to test. And
all of it with mitigations=off.
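
To give an idea of the shape of the test, here is a minimal liburing
sketch of such a send loop. It is not the actual send-zc code; the "::1"
destination, the port number, and the dropped error handling are
placeholders, and it only mirrors the "-s16 -z0" (16-byte, non-zerocopy)
mode above:

#include <liburing.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

#define BATCH	8
#define PAYLOAD	16

int main(void)
{
	struct sockaddr_in6 dst = {
		.sin6_family	= AF_INET6,
		.sin6_port	= htons(9999),	/* placeholder port */
	};
	char buf[PAYLOAD] = {0};		/* 16-byte payload */
	struct io_uring ring;
	int fd, i;

	inet_pton(AF_INET6, "::1", &dst.sin6_addr);
	fd = socket(AF_INET6, SOCK_DGRAM, 0);
	connect(fd, (struct sockaddr *)&dst, sizeof(dst));
	io_uring_queue_init(BATCH, &ring, 0);

	for (;;) {
		struct io_uring_cqe *cqe;
		unsigned int head, seen = 0;

		/* queue up a batch of small non-zerocopy sends */
		for (i = 0; i < BATCH; i++) {
			struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);

			io_uring_prep_send(sqe, fd, buf, PAYLOAD, 0);
		}
		/* one syscall submits the batch and waits for completions */
		io_uring_submit_and_wait(&ring, BATCH);

		io_uring_for_each_cqe(&ring, head, cqe)
			seen++;
		io_uring_cq_advance(&ring, seen);
	}
	return 0;
}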

There might be some fatter targets to optimise, but udpv6_sendmsg()
and the functions around it take a good chunk of cycles as well, though
without any particular hotspots. If we want better justification than
1-3%, I'd need to add more work on top, adding even more to the
diffstat... a vicious cycle.


[1] https://lore.kernel.org/netdev/cover.1648981570.git.asml.silence@xxxxxxxxx/
[2] https://github.com/isilence/liburing/blob/zc_v3/test/send-zc.c

--
Pavel Begunkov