Re: [PATCH] kexec: change locking mechanism to a mutex

From: Valentin Schneider
Date: Fri Sep 22 2023 - 12:29:10 EST


On 21/09/23 17:59, Eric DeVolder wrote:
> The design decision to use the atomic lock is described in the comment
> from kexec_internal.h, cited above. However, examining the code of
> __crash_kexec():
>
> if (kexec_trylock()) {
> if (kexec_crash_image) {
> ...
> }
> kexec_unlock();
> }
>
> reveals that the use of kexec_trylock() here is actually a "best effort"
> due to the atomic lock. This atomic lock, prior to crash hotplug,
> would almost always be assured (another kexec syscall could hold the lock
> and prevent this, but that is about it).
>
> So at the point where the capture kernel would be invoked, if the lock
> is not obtained, then kdump doesn't occur.
>
> It is possible to instead use a mutex with proper waiting, and utilize
> mutex_trylock() as the "best effort" in __crash_kexec(). The use of a
> mutex then avoids all the lock acquisition problems that were revealed
> by the crash hotplug activity.
>

@Dave thanks for the Cc, I'd have missed this otherwise.


Prior to the atomic thingie, we actually had a mutex and did
mutex_trylock() in __crash_kexec(). I'm a bit confused as this looks like a
revert of
05c6257433b7 ("panic, kexec: make __crash_kexec() NMI safe")
with just the helpers kept in - this doesn't seem to address any of the
original issues regarding NMIs?

Sebastian raised some good points in [1] regarding these issues.
The main hurdle pointed out there is, if we end up in the slowpath during
the unlock, then we can can up acquiring the ->wait_lock which isn't NMI
safe.

This is even worse on PREEMPT_RT, as both trylock and the unlock can end up
acquiring the ->wait_lock.

[1]: https://lore.kernel.org/all/YqyZ%2FUf14qkYtMDX@xxxxxxxxxxxxx/