Re: [PATCH] firmware_loader: Add reboot_in_progress for user helper path

From: Mukesh Ojha
Date: Tue Sep 19 2023 - 03:58:38 EST




On 9/17/2023 3:42 PM, Greg KH wrote:
On Fri, Sep 15, 2023 at 03:51:28PM +0530, Mukesh Ojha wrote:
There could be following scenario where there is a ongoing reboot
is going from processA which tries to call all the reboot notifier
callback and one of them is firmware reboot call which tries to
abort all the ongoing firmware userspace request under fw_lock
but there could be another processB which tries to do request
firmware, which came just after abort done from ProcessA and
ask for userspace to load the firmware and this can stop the
ongoing reboot ProcessA to stall for next 60s(default timeout)
which may be expected behaviour everyone like to see, instead
we should abort every request which came after once firmware
marks reboot notification.

ProcessA ProcessB

kernel_restart_prepare
blocking_notifier_call_chain
fw_shutdown_notify
kill_pending_fw_fallback_reqs
__fw_load_abort
fw_state_aborted request_firmware
__fw_state_set firmware_fallback_sysfs
... fw_load_from_user_helper
.. ...
. ..
usermodehelper_read_trylock
fw_load_sysfs_fallback
fw_sysfs_wait_timeout
usermodehelper_disable
__usermodehelper_disable
down_write()

Signed-off-by: Mukesh Ojha <quic_mojha@xxxxxxxxxxx>
---
drivers/base/firmware_loader/fallback.c | 2 +-
drivers/base/firmware_loader/firmware.h | 1 +
drivers/base/firmware_loader/main.c | 2 ++
3 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/base/firmware_loader/fallback.c b/drivers/base/firmware_loader/fallback.c
index bf68e3947814..a5546aeea91f 100644
--- a/drivers/base/firmware_loader/fallback.c
+++ b/drivers/base/firmware_loader/fallback.c
@@ -86,7 +86,7 @@ static int fw_load_sysfs_fallback(struct fw_sysfs *fw_sysfs, long timeout)
}
mutex_lock(&fw_lock);
- if (fw_state_is_aborted(fw_priv)) {
+ if (reboot_in_progress || fw_state_is_aborted(fw_priv)) {
mutex_unlock(&fw_lock);
retval = -EINTR;
goto out;

What prevents reboot_in_progress to change right after you check it
here?

e.g, reboot_in_progress was false, it gets added to the pending list
under fw_lock

list_add(&fw_priv->pending_list, &pending_fw_head);

reboot_in_progress = true, when all the outstanding fw request
are aborted during from reboot thread from below path. However,
I realize my mistake, reboot_in_progress should be modified
under fw_lock, will fix in v2.

>> fw_shutdown_notify
>> kill_pending_fw_fallback_reqs

So, idea is to revoke any fw request once firmware knows about ongoing
reboot and not delay the reboot process further for next default
60s .


And what kernel driver is trying to call the reboot notifier that gets
mixed up in this? Why not fix that driver to not need the reboot
notifier at all (hint, I really doubt it needs it...)

'drivers/base/firmware_loader/main.c' has reboot notifier which aborts
the ongoing firmware requests and but can race with other parallel
ongoing request which are not yet added to pending list.

fw_load_sysfs_fallback-> stuck waiting for fw_lock which was held by

kill_pending_fw_fallback_reqs=>
mutex_lock(&fw_lock);
__fw_load_abort



--- a/drivers/base/firmware_loader/main.c
+++ b/drivers/base/firmware_loader/main.c
@@ -93,6 +93,7 @@ static inline struct fw_priv *to_fw_priv(struct kref *ref)
DEFINE_MUTEX(fw_lock);
struct firmware_cache fw_cache;
+bool reboot_in_progress;

Bad global name for a variable in the firmware_loader core.

bool abort_fw_load_req ??

-Mukesh

thanks,

greg k-h