Re: [PATCH] kexec: change locking mechanism to a mutex

From: Eric DeVolder
Date: Fri Sep 22 2023 - 13:35:51 EST




On 9/22/23 11:28, Valentin Schneider wrote:
On 21/09/23 17:59, Eric DeVolder wrote:
The design decision to use the atomic lock is described in the comment
from kexec_internal.h, cited above. However, examining the code of
__crash_kexec():

if (kexec_trylock()) {
if (kexec_crash_image) {
...
}
kexec_unlock();
}

reveals that the use of kexec_trylock() here is actually a "best effort"
due to the atomic lock. This atomic lock, prior to crash hotplug,
would almost always be assured (another kexec syscall could hold the lock
and prevent this, but that is about it).

So at the point where the capture kernel would be invoked, if the lock
is not obtained, then kdump doesn't occur.

It is possible to instead use a mutex with proper waiting, and utilize
mutex_trylock() as the "best effort" in __crash_kexec(). The use of a
mutex then avoids all the lock acquisition problems that were revealed
by the crash hotplug activity.


@Dave thanks for the Cc, I'd have missed this otherwise.


Prior to the atomic thingie, we actually had a mutex and did
mutex_trylock() in __crash_kexec(). I'm a bit confused as this looks like a
revert of
05c6257433b7 ("panic, kexec: make __crash_kexec() NMI safe")
with just the helpers kept in - this doesn't seem to address any of the
original issues regarding NMIs?

Sebastian raised some good points in [1] regarding these issues.
The main hurdle pointed out there is, if we end up in the slowpath during
the unlock, then we can can up acquiring the ->wait_lock which isn't NMI
safe.

This is even worse on PREEMPT_RT, as both trylock and the unlock can end up
acquiring the ->wait_lock.

[1]: https://lore.kernel.org/all/YqyZ%2FUf14qkYtMDX@xxxxxxxxxxxxx/

Having reviewed the references, it would seem that Baoquan's approach of a new
lock to handle the hotplug activity is the way to go?
Eric