Bug in disk event polling

From: Alan Stern
Date: Fri Feb 10 2012 - 15:31:20 EST


Tejun:

Don't ask me why this hasn't shown up earlier... There's a big fat bug
in the implementation of disk event polling.

The polling is done using the system_nrt_wq work queue, which isn't
freezable. As a result, polling continues while the system is
preparing for suspend or hibernation.

Obviously, I/O to suspended devices doesn't work well. Somewhat less
obviously, error recovery for the failed I/O attempts can interfere
with normal system resume.

You can see this for yourself easily enough by suspending or
hibernating while a USB flash drive is plugged in. You don't even need
to go through the full suspend procedure; the first two stages are
enough (echo devices >/sys/power/pm_test). Check the system log
afterward; most likely you'll find the flash drive got errors and had
to be unregistered and re-enumerated.
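
For concreteness, the partial-suspend test could be scripted roughly as
follows. This is only a sketch: it assumes a kernel built with
CONFIG_PM_DEBUG (otherwise /sys/power/pm_test doesn't exist) and that it
is run as root. The function name and the POWER_DIR variable are made up
for illustration; POWER_DIR exists only so the steps can be dry-run
against a scratch directory.

```shell
#!/bin/sh
# Sketch of the partial-suspend test, assuming CONFIG_PM_DEBUG and root.
# POWER_DIR and pm_test_devices are illustrative names, not kernel interfaces.
POWER_DIR="${POWER_DIR:-/sys/power}"

pm_test_devices() {
    # Limit the suspend to its first two stages:
    # freezing tasks and suspending devices.
    echo devices > "$POWER_DIR/pm_test" || return 1

    # Start the suspend. In pm_test mode the kernel stops after the
    # selected stage, waits a few seconds, and then resumes.
    echo mem > "$POWER_DIR/state" || return 1

    # Switch test mode off again so later suspends run in full.
    echo none > "$POWER_DIR/pm_test"
}
```

With a USB flash drive plugged in, run "pm_test_devices" and then look at
the end of the log with something like "dmesg | tail -n 50".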

I have verified that changing all occurrences of system_nrt_wq in
block/genhd.c to system_freezable_wq fixes the bug. However, this may
not be the way you want to solve it; you may prefer to have a freezable
non-reentrant work queue.

Alan Stern
