[Regression] Suspend failure on NForce4-based boards due to changes in stop_machine

From: Rafael J. Wysocki
Date: Sun Nov 02 2008 - 19:25:42 EST


Hi,

Current mainline (2.6.28-rc3 at the moment) fails to suspend on at least some
machines with dual-core AMD CPUs and NForce4-based mainboards. The affected
boxes just hang solid during suspend/hibernation, after suspending devices,
probably while the non-boot CPUs are being stopped (I'm able to reproduce this
on two different machines).

Although this is not reproducible 100% of the time, it is reproducible enough
to allow me to carry out bisection, which turned up the following commit as the
source of the problem:

commit c9583e55fa2b08a230c549bd1e3c0bde6c50d9cc
Author: Heiko Carstens <heiko.carstens@xxxxxxxxxx>
Date: Mon Oct 13 23:50:10 2008 +0200

stop_machine: use workqueues instead of kernel threads

Convert stop_machine to a workqueue based approach. Instead of using kernel
threads for stop_machine we now use an rt workqueue to synchronize all
cpus.
This has the advantage that all needed per cpu threads are already created
when stop_machine gets called. And therefore a call to stop_machine won't
fail anymore. This is needed for s390 which needs a mechanism to synchronize
all cpus without allocating any memory.
As Rusty pointed out free_module() needs a non-failing stop_machine interface
as well.

As a side effect the stop_machine code gets simplified.

Signed-off-by: Heiko Carstens <heiko.carstens@xxxxxxxxxx>
Signed-off-by: Rusty Russell <rusty@xxxxxxxxxxxxxxx>

With this commit reverted, suspend/hibernation works on the affected machines
without any problems.

Thanks,
Rafael