Re: [PATCH] 9p/client: fix data race on req->status

From: Marco Elver
Date: Mon Dec 05 2022 - 08:07:06 EST


On Mon, 5 Dec 2022 at 13:50, Dominique Martinet <asmadeus@xxxxxxxxxxxxx> wrote:
>
> KCSAN reported a race between writing req->status in p9_client_cb and
> accessing it in p9_client_rpc's wait_event.
>
> Accesses to req itself are protected by the data barriers (writing req
> fields, write barrier, writing status // reading status, read barrier,
> reading other req fields), but the status accesses themselves apparently
> must also be annotated with WRITE_ONCE/READ_ONCE when we access status
> without locks.
>
> It follows that:
> - error paths writing status in various threads can all notify
> p9_client_rpc, so these all need WRITE_ONCE as well
> - there's a similar read loop in trans_virtio for the zc case that also
> needs READ_ONCE
> - other reads in trans_fd should be protected by the trans_fd lock and
> the lists state machine, as the corresponding writers are all within
> trans_fd and should be under the same lock. If KCSAN complains about
> them we likely have something else to fix as well, so it's better to
> leave them unmarked and look again if required.
>
> Reported-by: Naresh Kamboju <naresh.kamboju@xxxxxxxxxx>
> Suggested-by: Marco Elver <elver@xxxxxxxxxx>
> Signed-off-by: Dominique Martinet <asmadeus@xxxxxxxxxxxxx>

Acked-by: Marco Elver <elver@xxxxxxxxxx>

In case you're interested, KCSAN has a strict mode that is more
aggressive about which data races it reports (by default we're hiding
several classes of data races). One such class is data races due to
missing memory barriers, where e.g. an unmarked operation can be
reordered in such a way (by compiler or CPU) that a concurrent racy
access occurs. This mode can be enabled with CONFIG_KCSAN_STRICT=y.
It's most effective with some good stress tests for the subsystem of
interest. See https://docs.kernel.org/dev-tools/kcsan.html#modeling-weak-memory
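
For illustration only, here is a rough userspace sketch of the publish
pattern described in the quoted patch (write fields, barrier, write status
// read status, barrier, read fields). It is not the 9p code: struct
demo_req, the DEMO_REQ_* values and the demo_* functions are hypothetical
names, and C11 release/acquire atomics are used as an approximate stand-in
for smp_wmb() + WRITE_ONCE() and READ_ONCE() + smp_rmb().

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical status values, loosely modeled on request states. */
enum { DEMO_REQ_SENT = 0, DEMO_REQ_RCVD = 1 };

/* Hypothetical stand-in for a request; not the real struct p9_req_t. */
struct demo_req {
	int payload;        /* plain field, published via status */
	atomic_int status;  /* accessed concurrently without a lock */
};

/* Completion side: write the fields, then publish the status. */
static void *demo_complete(void *arg)
{
	struct demo_req *req = arg;

	req->payload = 42;
	/* Release store: roughly smp_wmb() followed by WRITE_ONCE(). */
	atomic_store_explicit(&req->status, DEMO_REQ_RCVD,
			      memory_order_release);
	return NULL;
}

/* Waiting side: poll the status, then read the other fields. */
static int demo_wait(struct demo_req *req)
{
	/* Acquire load: roughly READ_ONCE() followed by smp_rmb(). */
	while (atomic_load_explicit(&req->status,
				    memory_order_acquire) != DEMO_REQ_RCVD)
		;
	return req->payload;
}

/* Run one completion thread against one waiter; return the payload seen. */
static int demo_run(void)
{
	struct demo_req req = { .payload = 0 };
	pthread_t thread;
	int rc, seen;

	atomic_init(&req.status, DEMO_REQ_SENT);
	rc = pthread_create(&thread, NULL, demo_complete, &req);
	assert(rc == 0);
	seen = demo_wait(&req);
	rc = pthread_join(thread, NULL);
	assert(rc == 0);
	return seen;
}
```

In this sketch, weakening both orderings to memory_order_relaxed would
keep the status accesses marked but drop the ordering with the payload
accesses, which is roughly the missing-barrier class described above.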

Thanks,
-- Marco