Re: Suspend/resume - slow resume

From: Linus Torvalds
Date: Mon Apr 18 2011 - 14:49:58 EST


On Mon, Apr 18, 2011 at 11:08 AM, Francois Romieu <romieu@xxxxxxxxxxxxx> wrote:
> [...]
>>  - unload not on close, but on device unregister (iow not when you do
>> "ifconfig eth0 down", but when the "eth0" device really goes away)
>
> Without further action, the firmware(s) will thus be locked in until the
> driver is removed.

I do agree. It's a downside. Maybe doing it in "close()" is the right
thing, as long as we don't have that crazy "every four timer ticks"
situation with rtl8169_reinit_task.

As mentioned, the only real reason for me to be worried about the
close thing is that I don't have a good feel for what happens at boot
time. Are the setup scripts going to look at the interface lots of
times? On my desktop, I couldn't care less, but I try to keep boot
time in mind.

Maybe in practice there's just a single open at boot-time (for dhcp or
whatever), and I'm just worried for no good reason.

Without having looked at that whole rtl8169_reinit_task thing, I
probably wouldn't even worry about anything else doing something
similar ;)

> As long as it can be fixed... If the 60s delay is removed and the firmware
> loading emits some messages for programmer barbie, I am more than happy.

So the firmware loading timeout used to be ten seconds (which I
already think is excessive), but then commit
2f65168de7d68a5795e945e781d85b313bdc97b9 increased it to 60s because

https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=174589

The ipw driver sometimes takes a long time to load its firmware.
Whilst the ipw driver should be using the async interface of
the firmware loader to make this a non-issue, this is a minimal fix.

although when I actually look at that bugzilla entry, the _timestamps_
for the failed case do not seem to support this being a timeout.

Very odd.

But the real problem is that we do that timeout even in cases where it
cannot help, ie when people load firmware during early boot or during
suspend. So I think drivers/base/firmware_class.c should be made a bit
smarter.

We have a few cases where call_usermodehelper() fails immediately:

- khelper_wq hasn't been set up yet
- usermodehelper_disabled is set.

and in particular, during suspend/resume, that
"usermodehelper_disabled" flag will be set.

I don't think it is sensible to do a user request for firmware during
that time either, and that 60-second timeout is just silly. It's not
going to help.

Why doesn't the firmware loader class check the error return from the
kobject_uevent()? I'd expect that if that fails, we should just warn
and abort, rather than wait 60 seconds to time out. Greg?

TOTALLY UNTESTED PATCH ATTACHED!

Ciprian - does this get rid of the 60-second wait? Do you get a nice
kernel traceback in your dmesg instead?

> If someone can tell me where Realtek's firmware should be sent to as David
> W. seems to be busy, it will be perfect.

Hmm. Dunno about that.

Linus
drivers/base/firmware_class.c | 3 ++-
1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/drivers/base/firmware_class.c b/drivers/base/firmware_class.c
index 8c798ef7f13f..956dd34e59da 100644
--- a/drivers/base/firmware_class.c
+++ b/drivers/base/firmware_class.c
@@ -549,7 +549,8 @@ static int _request_firmware(const struct firmware **firmware_p,
round_jiffies_up(jiffies +
loading_timeout * HZ));

- kobject_uevent(&fw_priv->dev.kobj, KOBJ_ADD);
+ if (WARN_ON(kobject_uevent(&fw_priv->dev.kobj, KOBJ_ADD)))
+ fw_load_abort(fw_priv);
}

wait_for_completion(&fw_priv->completion);