Re: [PATCH 4.9 50/71] inet: frags: use rhashtables for reassembly units

From: Stefan Schmidt
Date: Fri Oct 26 2018 - 09:39:56 EST


Hello Greg.

[Hope I am not too late for this]

On 16/10/2018 19:09, Greg Kroah-Hartman wrote:
> 4.9-stable review patch. If anyone has any objections, please let me know.
>
> ------------------
>
> From: Eric Dumazet <edumazet@xxxxxxxxxx>
>
> Some applications still rely on IP fragmentation, and to be fair linux
> reassembly unit is not working under any serious load.
>
> It uses static hash tables of 1024 buckets, and up to 128 items per bucket (!!!)
>
> A work queue is supposed to garbage collect items when host is under memory
> pressure, and doing a hash rebuild, changing seed used in hash computations.
>
> This work queue blocks softirqs for up to 25 ms when doing a hash rebuild,
> occurring every 5 seconds if host is under fire.
>
> Then there is the problem of sharing this hash table for all netns.
>
> It is time to switch to rhashtables, and allocate one of them per netns
> to speedup netns dismantle, since this is a critical metric these days.
>
> Lookup is now using RCU. A followup patch will even remove
> the refcount hold/release left from prior implementation and save
> a couple of atomic operations.
>
> Before this patch, 16 cpus (16 RX queue NIC) could not handle more
> than 1 Mpps frags DDOS.
>
> After the patch, I reach 9 Mpps without any tuning, and can use up to 2GB
> of storage for the fragments (exact number depends on frags being evicted
> after timeout)
>
> $ grep FRAG /proc/net/sockstat
> FRAG: inuse 1966916 memory 2140004608
>
> A followup patch will change the limits for 64bit arches.
>
> Signed-off-by: Eric Dumazet <edumazet@xxxxxxxxxx>
> Cc: Kirill Tkhai <ktkhai@xxxxxxxxxxxxx>
> Cc: Herbert Xu <herbert@xxxxxxxxxxxxxxxxxxx>
> Cc: Florian Westphal <fw@xxxxxxxxx>
> Cc: Jesper Dangaard Brouer <brouer@xxxxxxxxxx>
> Cc: Alexander Aring <alex.aring@xxxxxxxxx>
> Cc: Stefan Schmidt <stefan@xxxxxxxxxxxxxxx>
> Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx>
> (cherry picked from commit 648700f76b03b7e8149d13cc2bdb3355035258a9)
> Signed-off-by: Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx>
> ---
> Documentation/networking/ip-sysctl.txt | 7
> include/net/inet_frag.h | 81 +++----
> include/net/ipv6.h | 16 -
> net/ieee802154/6lowpan/6lowpan_i.h | 26 --
> net/ieee802154/6lowpan/reassembly.c | 91 +++-----
> net/ipv4/inet_fragment.c | 349 ++++++--------------------------
> net/ipv4/ip_fragment.c | 112 ++++------
> net/ipv6/netfilter/nf_conntrack_reasm.c | 51 +---
> net/ipv6/reassembly.c | 110 ++++------
> 9 files changed, 267 insertions(+), 576 deletions(-)
>
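For scale, the figures in the quoted commit message can be cross-checked with a small C sketch. It assumes that the old design's hard ceiling is simply 1024 buckets times the 128-item per-bucket limit, and that the FRAG `memory` field in /proc/net/sockstat is counted in bytes. The helper names are made up for this illustration and are not kernel APIs.

```c
#include <stdint.h>

/* Old design, per the commit message: one static table shared by all
 * netns, 1024 buckets, at most 128 reassembly units per bucket. */
#define OLD_FRAG_HASH_BUCKETS 1024u
#define OLD_FRAG_MAX_DEPTH     128u

/* Hypothetical helper: upper bound on queued reassembly units in the
 * old table if every bucket were filled to its depth limit. */
static uint64_t old_frag_capacity(void)
{
	return (uint64_t)OLD_FRAG_HASH_BUCKETS * OLD_FRAG_MAX_DEPTH;
}

/* Hypothetical helper: average bytes accounted per in-use reassembly
 * unit, from the "FRAG: inuse N memory M" line of /proc/net/sockstat.
 * Returns 0 when nothing is in use, to avoid dividing by zero. */
static uint64_t frag_bytes_per_unit(uint64_t memory, uint64_t inuse)
{
	return inuse ? memory / inuse : 0;
}
```

With the quoted sockstat line (inuse 1966916, memory 2140004608), old_frag_capacity() gives 131072, so the post-patch test held roughly 15 times as many units as the old shared table could ever hold, at an average of 1088 bytes each.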

When this patch hit master a while back, we had to address a regression
in the ieee802154 6lowpan layer. It seems the fix is missing from this
backport series (I am only looking at your patchset here, not the full tree).

https://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git/commit/?id=f18fa5de5ba7f1d6650951502bb96a6e4715a948

I would appreciate it if you could pull this fix into the series as well.

regards
Stefan Schmidt