Re: Lockups due to "locking/rwsem: Make handoff bit handling more consistent"

From: Waiman Long
Date: Tue Jun 21 2022 - 21:32:32 EST


On 6/20/22 10:09, Mel Gorman wrote:
On Fri, Jun 17, 2022 at 10:29:20AM -0400, Waiman Long wrote:
The C file and shell script to run it are attached.

Thanks for the reproducer and I will try to reproduce it locally.

This is a known issue; I have received a similar report from an Oracle
engineer. That is the reason I posted commit 1ee326196c66 ("locking/rwsem:
Always try to wake waiters in out_nolock path") that was merged in v5.19. I
believe it helps but it may not be able to eliminate all possible race
conditions. To make rwsem behave more like before commit d257cc8cb8d5
("locking/rwsem: Make handoff bit handling more consistent"), I posted a
follow-up patch

https://lore.kernel.org/lkml/20220427173124.1428050-1-longman@xxxxxxxxxx/

But it hasn't been reviewed yet.

FWIW, the patch passed the test case when applied to both 5.18 and
5.19-rc3.

Thanks for running the test. Do you mean that both 5.18 and 5.19-rc3 fail the test and they pass only after applying the patch?

Anyway, I am not able to reproduce the failure on either 5.18 or 5.19-rc3. Perhaps it is due to differences in the running environment, e.g. gcc and glibc versions. What operating environment (SuSE version) do you use to reproduce the failure? I used RHEL8, which is the most convenient one for me.

BTW, do you mind if I add your name with a "Tested-by:" tag to the patch?

Thanks,
Longman