Re: AW: [PATCH] amd64: Fix csum_partial_copy_generic()

From: Al Viro
Date: Thu Oct 19 2023 - 02:39:34 EST


On Thu, Oct 19, 2023 at 07:14:27AM +0100, Al Viro wrote:
> On Thu, Oct 19, 2023 at 06:02:50AM +0100, Al Viro wrote:
> > On Thu, Oct 19, 2023 at 04:44:04AM +0000, gus Gusenleitner Klaus wrote:
> > > > On Wed, Oct 18, 2023 at 06:18:05AM +0000, gus Gusenleitner Klaus wrote:
> > > > > The checksum calculation is wrong in case of an source buffer
> > > > > containing zero bytes only. The expected return value is 0, the
> > > > > actual return value is 0xfffffff.
> > > >
> > > > Expected where? The actual checksum is defined modulo 0xffff, so
> > > > 0 and 0xffffffff represent the same final value.
> > > >
> > > > The only twist is that in some situations we internally use 0 for
> > > > "not calculated yet".
> > > >
> > > > > This problem occurs when a ICMP echo reply is sent that has set
> > > > > zero identifier, sequence number and data.
> > > >
> > > > What problem? Could you please either point to specific RFC or
> > > > show that packets are rejected by some existing system, or...?
> > >
> > > Here's our situation:
> > > Our device gets pinged by a third party manufacturer robot controller.
> > > We have updated the kernel in our device to 5.15 from 4.9, the robot
> > > controller is kept unchanged. At 4.9, our device's ping reply is accepted
> > > by the robot controller, at 5.15 it's not.
> > >
> > > Wireshark shows a bad checksum warning:
> > > 'Checksum: 0x0000 incorrect, should be 0xffff'
> > >
> >
> > Lovely. I think I see what's going on, give me a few to think about it...
>
> The real source of trouble was switching csum_and_copy_{to,from}_user()
> to reporting faults as 0. And yes, it's broken. Bugger...

I really hate the idea of bringing back the old horrors and splitting
_nocheck and _user variants ;-/ Especially since we don't care (and
never had, really) where in the EFAULT case had the damn thing faulted
and what csum had it managed to accumulate prior to that point.

The only callers are csum_and_copy_..._iter() and they discard
the entire iovec segment if fault happens; all users of
csum_and_copy_from_iter() are actually discarding everything in
that case (reverting iterator to the point where it had been
prior to the call).

And they are all thread-synchronous. Hell, it's tempting to steal
a thread flag, clear it before the call of those suckers, set it in
exception handlers in those and check in csum_and_copy_..._iter()
after the call... Let me see how ugly something of that sort would
be...