[RFC PATCH 0/2] Improve reliability of CPU hotplug

From: Mel Gorman
Date: Wed Jan 11 2012 - 05:11:14 EST


Recent stress tests doing CPU online/offline in a loop revealed at
least two separate bugs. They result in CPUs either being deadlocked on
a spinlock or the machine halting entirely. My reproduction case ran
a large number of simultaneous kernel compiles on an 8-core machine
while CPUs were continually being brought online and offline.
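
For illustration only, the hotplug half of such a stress test might be
sketched as below. This is not the script used for the tests above, which
is not shown here. It assumes the standard sysfs control file
/sys/devices/system/cpu/cpuN/online, root privileges, and hypothetical
helper names (hp_cpu_online_path, hp_set_cpu_online, hp_hotplug_stress).

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Build the sysfs control path for one CPU.
 * Returns the path length, or -1 if the buffer is too small.
 */
int hp_cpu_online_path(char *buf, size_t len, unsigned int cpu)
{
	int n;

	assert(buf != NULL && len > 0);
	n = snprintf(buf, len, "/sys/devices/system/cpu/cpu%u/online", cpu);
	return (n < 0 || (size_t)n >= len) ? -1 : n;
}

/* Write "1" (online) or "0" (offline) to the control file. Needs root. */
int hp_set_cpu_online(unsigned int cpu, int online)
{
	char path[64];
	FILE *f;
	int ok;

	if (hp_cpu_online_path(path, sizeof(path), cpu) < 0)
		return -1;
	f = fopen(path, "w");
	if (!f)
		return -1;
	ok = fputs(online ? "1" : "0", f) >= 0;
	/* A rejected write is usually only reported when the stream is flushed. */
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}

/*
 * Take CPUs first..last offline and back online, `rounds` times.
 * Returns the number of failed transitions. The boot CPU often cannot
 * be offlined, so a caller would normally start at CPU 1.
 */
int hp_hotplug_stress(unsigned int first, unsigned int last, int rounds)
{
	int failures = 0;

	for (int r = 0; r < rounds; r++) {
		for (unsigned int cpu = first; cpu <= last; cpu++) {
			failures += hp_set_cpu_online(cpu, 0) != 0;
			failures += hp_set_cpu_online(cpu, 1) != 0;
		}
	}
	return failures;
}
```

On an 8-core machine a caller could run hp_hotplug_stress(1, 7, rounds)
while the kernel compiles are in progress.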

This small series includes two patches that allow hotplug stress tests
to pass for me when applied to 3.2. It does not claim to solve all CPU
hotplug problems. For example, the test configuration did not have
PREEMPT enabled, so bugs specific to preemptible kernels would not have
been hit. Still, there is no harm in eliminating these two bugs at
least.

Patch 1 looks at a sysfs dirent problem whereby, under stress, a dentry
lock is taken twice. This is a sysfs-specific fix, but a dcache-related
fix also exists as an RFC.

Patch 2 notes that the page allocator is sending IPIs without calling
get_online_cpus() to protect cpu_online_mask from changes. In low
memory situations, this allows an IPI to be sent to a CPU that is
going offline. This patch fixes drain_all_pages() and then changes
the page allocator to drain only the local lists, with preemption
disabled, instead of sending an IPI, on the grounds that the IPI is
costly while having only a marginal benefit.
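
The locking pattern behind patch 2 can be modelled in userspace as below.
This is a single-threaded sketch, not the patch and not kernel code: every
model_* name is hypothetical, the online mask is a plain bitmask, and a
hotplug attempt made while the mask is pinned simply fails, whereas the
real hotplug path would wait for the readers to finish.

```c
#include <assert.h>

#define MODEL_NR_CPUS 8

static unsigned int model_online_mask = 0xffu;	/* bit n set: CPU n online */
static int model_hotplug_refs;			/* readers pinning the mask */
static int model_drained[MODEL_NR_CPUS];	/* drain count per CPU */

/* Stand-ins for get_online_cpus()/put_online_cpus(): a plain refcount. */
void model_get_online_cpus(void)
{
	model_hotplug_refs++;
}

void model_put_online_cpus(void)
{
	assert(model_hotplug_refs > 0);
	model_hotplug_refs--;
}

/* Hotplug side of the model: refuse to change the mask while it is pinned. */
int model_cpu_down(int cpu)
{
	if (model_hotplug_refs > 0)
		return -1;
	model_online_mask &= ~(1u << cpu);
	return 0;
}

/* Stand-in for the per-CPU drain; the target must still be online. */
void model_drain_local_pages(int cpu)
{
	assert(model_online_mask & (1u << cpu));
	model_drained[cpu]++;
}

/* Drain every online CPU with the mask pinned for the whole walk. */
void model_drain_all_pages(void)
{
	model_get_online_cpus();
	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
		if (model_online_mask & (1u << cpu))
			model_drain_local_pages(cpu);
	}
	model_put_online_cpus();
}

/*
 * Allocator-side alternative: touch only the calling CPU's lists. In the
 * kernel this would run with preemption disabled so the task cannot
 * migrate mid-drain; the model has nothing to disable.
 */
void model_drain_this_cpu(int this_cpu)
{
	model_drain_local_pages(this_cpu);
}
```

Without the get/put pair around the loop, model_cpu_down() could clear a
bit between the mask test and the drain, which is the model's equivalent
of sending an IPI to a CPU that is going offline.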

 fs/sysfs/dir.c  |    4 ++--
 mm/page_alloc.c |   16 ++++++++++++----
 2 files changed, 14 insertions(+), 6 deletions(-)

--
1.7.3.4
