Re: [GIT PULL] dlm fixes for 6.9

From: David Teigland
Date: Fri Mar 15 2024 - 14:43:12 EST


On Fri, Mar 15, 2024 at 10:10:00AM -0700, Linus Torvalds wrote:
> Now, if the issue is that you want to clean up something that is never
> getting cleaned up by anybody else, and this is a fatal error, and
> you're just trying to fix things up (badly), and you know that this is
> all racy but the code is trying to kill a dead data structure, then
> you should
>
> (a) need a damn big comment (bigger than the comment is already)
>
> (b) should *NOT* pretend to do some stupid "atomic decrement and test" loop

Yes, that looks pretty messed up; the counter should not be an atomic_t. I was
a bit wary of making that atomic when it wasn't necessary, but didn't push back
enough on that change:

commit 75a7d60134ce84209f2c61ec4619ee543aa8f466
Author: Alexander Aring <aahringo@xxxxxxxxxx>
Date: Mon May 29 17:44:38 2023 -0400

    Currently the lkb_wait_count is locked by the rsb lock and it should be
    fine to handle lkb_wait_count as non atomic_t value. However for the
    overall process of reducing locking this patch converts it to an
    atomic_t value.

... and the result is that the primitives get abused and the code becomes crazy.
My initial plan is to go back to a non-atomic counter there. It is indeed a
recovery situation that involves a forced reset of state, but I'll need to go
back and study that case further before I can say what it should finally look
like. Whatever that looks like, it'll have a very good comment :) Dropping
the pull is fine. There's a chance I may resend with the other patch and a new
fix; we'll see.
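
For illustration only, here is a minimal userspace sketch of the general
pattern discussed above: a plain (non-atomic) counter that is only touched
with a lock held, plus a forced reset for the recovery case. This is not the
dlm code and not the eventual fix. The names struct waiter, waiter_get(),
waiter_put() and waiter_force_reset() are hypothetical, and a pthread mutex
stands in for the rsb lock as an assumption.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for an object whose counter is guarded by a lock. */
struct waiter {
	pthread_mutex_t lock;	/* assumption: plays the role of the rsb lock */
	int wait_count;		/* plain int, only accessed with lock held */
};

static void waiter_init(struct waiter *w)
{
	pthread_mutex_init(&w->lock, NULL);
	w->wait_count = 0;
}

/* Take one reference on the wait state. */
static void waiter_get(struct waiter *w)
{
	pthread_mutex_lock(&w->lock);
	w->wait_count++;
	pthread_mutex_unlock(&w->lock);
}

/* Drop one reference; returns true if this was the last one. */
static bool waiter_put(struct waiter *w)
{
	bool last;

	pthread_mutex_lock(&w->lock);
	assert(w->wait_count > 0);
	last = (--w->wait_count == 0);
	pthread_mutex_unlock(&w->lock);
	return last;
}

/*
 * Recovery path: forcibly reset the counter in one step under the lock,
 * instead of looping on an atomic decrement-and-test. Returns how many
 * outstanding references were discarded so the caller can release
 * whatever each of them was pinning.
 */
static int waiter_force_reset(struct waiter *w)
{
	int dropped;

	pthread_mutex_lock(&w->lock);
	dropped = w->wait_count;
	w->wait_count = 0;
	pthread_mutex_unlock(&w->lock);
	return dropped;
}
```

Because every access happens under the same lock, the reset is a single
assignment and the number of discarded references is known exactly, which is
the property a decrement-and-test loop on an atomic_t only pretends to give.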

Thanks,
Dave