Kernel 5.0-rc5 regression with NAT, bisected to: netfilter: nat: remove l4proto->manip_pkt

From: Sander Eikelenboom
Date: Thu Feb 07 2019 - 17:39:27 EST


L.S.,

While trying out a 5.0-rc5 kernel I seem to have stumbled over a regression with NAT
(using an nftables firewall with NAT and connection tracking).

Unfortunately it isn't too obvious since no errors are logged, but on clients it
causes symptoms like Firefox intermittently failing to load pages with:
Network Protocol Error
An error occurred during a connection to www.example.com
The page you are trying to view cannot be shown because an error in the network protocol was detected.
Please contact the website owners to inform them of this problem.

But it's only intermittent, so I can still visit some webpages with clients;
it could be that packet size and/or fragmentation is at play?
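
A minimal sketch of how the packet-size theory could be checked from a client
behind the NAT box, assuming a TCP echo service is reachable on the far side.
The function name probe_sizes, the host, the port and the list of sizes are all
placeholders for illustration; none of them come from the setup described here.

```python
import socket


def probe_sizes(host, port, sizes, timeout=5.0):
    """Send one payload per size through a TCP echo service.

    Returns a dict mapping each size to True if the payload came back
    byte-for-byte, or False on a mismatch, timeout or connection error.
    Assumes the peer echoes everything it receives (hypothetical service).
    """
    results = {}
    for size in sizes:
        # Non-repeating-ish pattern so truncated or shifted data is detected.
        payload = bytes(i % 251 for i in range(size))
        try:
            with socket.create_connection((host, port), timeout=timeout) as sock:
                sock.sendall(payload)
                # Signal end of data so the echo side can finish and close.
                sock.shutdown(socket.SHUT_WR)
                received = bytearray()
                while True:
                    chunk = sock.recv(65536)
                    if not chunk:
                        break
                    received.extend(chunk)
            results[size] = bytes(received) == payload
        except OSError:
            # Covers refused connections, resets and timeouts alike.
            results[size] = False
    return results
```

Calling it with sizes on either side of the path MTU (for example 512, 1400,
1500 and 9000 bytes) on a good and a bad kernel would show whether the failures
track payload size or are unrelated to it.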

So I tried testing with git://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git with
e8c32c32b48c2e889704d8ca0872f92eb027838e as the last commit, to be sure to have the latest netdev has to offer,
but to no avail: the problem is still there.

After that I tried to git bisect and ended up with:

faec18dbb0405c7d4dda025054511dc3a6696918 is the first bad commit
commit faec18dbb0405c7d4dda025054511dc3a6696918
Author: Florian Westphal <fw@xxxxxxxxx>
Date: Thu Dec 13 16:01:33 2018 +0100

netfilter: nat: remove l4proto->manip_pkt

This removes the last l4proto indirection, the two callers, the l3proto
packet mangling helpers for ipv4 and ipv6, now call the
nf_nat_l4proto_manip_pkt() helper.

nf_nat_proto_{dccp,tcp,sctp,gre,icmp,icmpv6} are left behind, even though
they contain no functionality anymore to not clutter this patch.

Next patch will remove the empty files and the nf_nat_l4proto
struct.

nf_nat_proto_udp.c is renamed to nf_nat_proto.c, as it now contains the
other nat manip functionality as well, not just udp and udplite.

Signed-off-by: Florian Westphal <fw@xxxxxxxxx>
Signed-off-by: Pablo Neira Ayuso <pablo@xxxxxxxxxxxxx>

:040000 040000 22d8706921e03cbd6d78a6ebcc5f253ccfd2bf0c b6f8ab2779215b4495dfe641f50e798da73859ac M include
:040000 040000 af212a756f1acf00cbe45c3be5b71f38f01f1d34 165c440f9e6f2e05738628a19b51f7603f95752a M net

Any ideas or debugging hints?

--
Sander