Re: [PATCH 3/8] Use process freezer for cpu-hotplug

From: Oleg Nesterov
Date: Thu Apr 05 2007 - 06:54:43 EST


On 04/02, Gautham R Shenoy wrote:
>
> + if (freeze_processes(FE_HOTPLUG_CPU)) {
> + thaw_processes(FE_HOTPLUG_CPU);
> + return -EBUSY;
> + }

Off-topic. This is a common pattern. Perhaps it makes sense to
introduce a try_to_freeze_or_thaw_and_return_an_error() helper.

> @@ -161,10 +141,13 @@ static int _cpu_down(unsigned int cpu)
> hcpu) == NOTIFY_BAD)
> BUG();
>
> - if (IS_ERR(p)) {
> + set_cpus_allowed(current, old_allowed);
> +
> + if (IS_ERR(p))
> err = PTR_ERR(p);
> - goto out_allowed;
> - }
> + else
> + err = kthread_stop(p);
> +
> goto out_thread;
> }

Why this change? We are doing kthread_stop() + set_cpus_allowed() on
return. Imho,

if (IS_ERR(p))
goto out_allowed;
goto out_thread;

looks a bit better. Yes we need a couple of error labels at the end.

> --- linux-2.6.21-rc5.orig/kernel/softlockup.c
> +++ linux-2.6.21-rc5/kernel/softlockup.c
> @@ -147,6 +147,7 @@ cpu_callback(struct notifier_block *nfb,
> case CPU_DEAD:
> p = per_cpu(watchdog_task, hotcpu);
> per_cpu(watchdog_task, hotcpu) = NULL;
> + thaw_process(p);
> kthread_stop(p);

As it was already discussed, this is racy. As Srivatsa (imho rightly)
suggested, kthread_stop(p) should thaw process itself. This also allows
us to kill at least some of wait_for_die loops.

However, the change in kthread_stop(p) in not enough to close the race.
We can check kthread_should_stop() in refrigerator(), this looks like
a most simple approach for now.

Alternatively, Srivatsa suggests to introduce a new task_lock() protected
task_struct->freezer_state (so we can reliably set FE_ALL). Surely this is
more poweful, but needs more changes. I am not sure. Perhaps we can do
this later.

In any case, imho "try3" should add thaw_process() to kthread_stop().
Gautham, Srivatsa, do you agree?

Oleg.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/