Re: [PATCH 0/3] patches for stop_machine

From: Pavel Machek
Date: Fri May 02 2008 - 16:35:12 EST


Hi!

> Hi Rusty and all,
>
> This is a proposal for minor improvements to kernel/stop_machine.c
>
> [PATCH 1/3] stop_machine: short exit path for if we cannot create enough threads
> [PATCH 2/3] stop_machine: add timeout for child thread deployment
> [PATCH 3/3] stop_machine: add stopmachine_timeout sysctl entry
>
> The main topic is "how about adding a timeout for stop_machine?"
> I think it will act as a safety net.
>
> For example (a silly situation), the system can hang in the following way:
>
> # ./silly.sh
> run an evil loop task on AP
> pid 6138's current affinity mask: ff
> pid 6138's new affinity mask: fe
> to pretend lock up, chrt -f -p 99 6138
> loop[6138] is on CPU #4
> to do stopmachine, try to off #7
> echo 0 > /sys/devices/system/cpu/cpu7/online
> (never return)
>
> After applying this patch set, the hang can be prevented.
>
> # ./silly.sh
> :
> echo 0 > /sys/devices/system/cpu/cpu7/online
> stopmachine: Failed to stop machine in time(5s). Are there any CPUs on file?
> ./silly.sh: line 22: echo: write error: Device or resource busy
> offline is failed

I'd expect at least a WARN_ON here. -EBUSY is not a good enough indication
that one of your CPUs is now dead.
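
To illustrate what I mean, here is a rough userspace sketch, not the
actual stop_machine code. The names wait_for_acks() and now_ms() are
made up for this example, and the fprintf() to stderr only stands in for
a kernel WARN_ON. The point is that the timeout path should complain
loudly before handing -EBUSY back to the caller.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical userspace analogy; none of this is kernel API. */

/* Monotonic clock in milliseconds. */
static long long now_ms(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (long long)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/*
 * Wait until `needed` workers have acknowledged, or until timeout_ms
 * has elapsed. Returns 0 on success. On timeout, prints a warning
 * (standing in for WARN_ON) and returns -EBUSY.
 */
static int wait_for_acks(atomic_int *acked, int needed, long timeout_ms)
{
    struct timespec nap = { 0, 1000000 }; /* poll every 1 ms */
    long long deadline;

    assert(needed >= 0 && timeout_ms >= 0);
    deadline = now_ms() + timeout_ms;

    while (atomic_load(acked) < needed) {
        if (now_ms() >= deadline) {
            fprintf(stderr,
                    "WARNING: %d of %d workers never responded\n",
                    needed - atomic_load(acked), needed);
            return -EBUSY;
        }
        nanosleep(&nap, NULL);
    }
    return 0;
}
```

With something like this, a caller that only looks at the return value
still sees -EBUSY, but the log also records that some worker never
showed up.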

Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html