Re: arm64: allmodconfig: BUG: KCSAN: data-race in p9_client_cb / p9_client_rpc

From: Dominique Martinet
Date: Sat Dec 03 2022 - 18:10:24 EST


Marco Elver wrote on Sat, Dec 03, 2022 at 05:46:46PM +0100:
> > But I can't really find a problem with what KCSAN complains about --
> > we are indeed accessing status from two threads without any locks.
> > Instead of a lock, we're using a barrier so that:
> > - recv thread/cb: writes to req stuff || write to req status
> > - p9_client_rpc: reads req status || reads other fields from req
> >
> > Which has been working well enough (at least, without the barrier things
> > blow up quite fast).
> >
> > So can I'll just consider this a false positive, but if someone knows
> > how much one can read into this that'd be appreciated.
>
> The barriers only ensure ordering, but not atomicity of the accesses
> themselves (for one, the compiler is well in its right to transform
> plain accesses in ways that the concurrent algorithm wasn't designed
> for). In this case it looks like it's just missing
> READ_ONCE()/WRITE_ONCE().

Aha! Thanks for this!

I've always believed plain int types accesses are always atomic and the
only thing to watch for would be compilers reordering instrucions, which
would be ensured by the barrier in this case, but I guess there are some
architectures or places where this isn't true?


I'm a bit confused though, I can only see five places where wait_event*
functions use READ_ONCE and I believe they more or less all would
require such a marker -- I guess non-equality checks might be safe
(waiting for a value to change from a known value) but if non-atomic
updates are on the table equality and comparisons checks all would need
to be decorated with READ_ONCE; afaiu, unlike usespace loops with
pthread_cond_wait there is nothing protecting the condition itself.

Should I just update the wrapped condition, as below?

- err = wait_event_killable(req->wq, req->status >= REQ_STATUS_RCVD);
+ err = wait_event_killable(req->wq,
+ READ_ONCE(req->status) >= REQ_STATUS_RCVD);

The writes all are straightforward, there's all the error paths to
convert to WRITE_ONCE too but that's not difficult (leaving only the
init without such a marker); I'll send a patch when you've confirmed the
read looks good.
(the other reads are a bit less obvious as some are protected by a lock
in trans_fd, which should cover all cases of possible concurrent updates
there as far as I can see, but this mixed model is definitely hard to
reason with... Well, that's how it was written and I won't ever have time
to rewrite any of this. Enough ranting.)


> A (relatively) quick primer on the kernel's memory model and
> where/what/how we need to "mark" accesses:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/memory-model/Documentation/access-marking.txt

I read Documentation/memory-barriers.txt ages ago but wasn't aware of
this memory-model directory; I've skimmed through and will have a proper
read as time permits.

Thank you,
--
Dominique