[PATCH 0/2] PM / Usermodehelper: Fix freezer failures due to racyusermodehelper_is_disabled()

From: Srivatsa S. Bhat
Date: Mon Dec 05 2011 - 16:25:31 EST


Hi Tejun, Boris and Rafael,

I am revisiting the microcode vs freezer issue with this patchset, with
a much better solution this time (hopefully), since now I am attacking
the root cause of the problem (a race in usermodehelpers), in order
to solve the issue I reported earlier in:
http://thread.gmane.org/gmane.linux.kernel/1198291/focus=1200591

:-)

Btw, Rafael, using PM notifiers for mutual exclusion from freezer (like we
did in CPU hotplug case), for x86 microcode driver wouldn't work out, since
there is no guarantee that our callback will get registered before freezing
starts, since the module init code itself needs to be mutexed from the freezer.
And the callback registration would also have to happen from the init code
itself! (we didn't have this problem in CPU hotplug case).

And since Boris pointed out several times that in real world, microcode won't
be applied all that often, I decided to focus on the root-cause and leave
the microcode driver alone :-)

---
Commit a144c6a (PM: Print a warning if firmware is requested when tasks
are frozen) introduced usermodehelper_is_disabled() to warn and exit
immediately if firmware is requested when usermodehelpers are disabled.

However, it is racy. Consider the following scenario, currently used in
drivers/base/firmware_class.c:

...
if (usermodehelper_is_disabled())
goto out;

/* Do actual work */
...

out:
return err;

Nothing prevents someone from disabling usermodehelpers just after the
check in the 'if' condition, which means that it is quite possible to try
doing the "actual work" with usermodehelpers disabled, leading to undesirable
consequences.

In particular, this race condition in _request_firmware() causes task
freezing failures whenever suspend/hibernation is in progress because, it
wrongly waits to get the firmware/microcode image from userspace when actually
the usermodehelpers are disabled or userspace has been frozen. Some of the
example scenarios that cause freezing failures due to this race are those
that depend on userspace via request_firmware(), such as x86 microcode module
initialization and microcode image reload.

Previous discussions about this issue can be found at:
http://thread.gmane.org/gmane.linux.kernel/1198291/focus=1200591

This patchset adds proper synchronization to fix this issue.

It is to be noted that this patchset fixes the freezing failures but doesn't
remove the warnings. IOW, it does not attempt to add explicit synchronization
to x86 microcode driver to avoid requesting microcode image at inopportune
moments. Because, the warnings were introduced to highlight such cases, in
the first place. And we need not silence the warnings, since we take care
of the *real* problem (freezing failure) and hence, after that, the warnings
are pretty harmless anyway.

--
Srivatsa S. Bhat (2):
PM / Usermodehelper: Introduce reference counting to solve usermodehelper_disabled race
PM / request_firmware(): Use the refcounting solution to fix the race


drivers/base/firmware_class.c | 3 ++
include/linux/kmod.h | 2 +
kernel/kmod.c | 74 ++++++++++++++++++++++++++++++++++++++++-
3 files changed, 78 insertions(+), 1 deletions(-)


Regards,
Srivatsa S. Bhat
IBM Linux Technology Center

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/