Re: mm: deadlock between get_online_cpus/pcpu_alloc

From: Michal Hocko
Date: Thu Feb 09 2017 - 14:17:52 EST


On Thu 09-02-17 11:22:49, Christopher Lameter wrote:
> On Thu, 9 Feb 2017, Thomas Gleixner wrote:
>
> > You are just not getting it, really.
> >
> > The problem is that this for_each_online_cpu() is racy against a concurrent
> > hot unplug and therefore can queue stuff for a CPU that is no longer online. That's
> > what the mm folks tried to avoid by preventing a CPU hotplug operation
> > before entering that loop.
>
> With a stop machine action it is NOT racy because the machine goes into a
> special kernel state that guarantees that key operating system structures
> are not touched. See mm/page_alloc.c's use of that characteristic to build
> zonelists. Thus it cannot be executing for_each_online_cpu and related
> tasks (unless one does not disable preemption... but that is a given if a
> spinlock has been taken).

Christoph, you are completely ignoring the reality and the code. There
is no need for stop_machine, nor does it help anything. As a matter
of fact, synchronization with CPU hotplug is needed if you want to
perform per-cpu specific operations. get_online_cpus is the most
straightforward and heaviest-weight way to do this synchronization,
but not the only one. As the patch [1] describes, we do not really need
get_online_cpus in drain_all_pages because we can do _better_. But
this is not in any way a generic thing applicable to other code paths.

If you disagree, then you are free to post patches, but the hand waving
you are doing here is just wasting everybody's time. So please cut it
here unless you have specific proposals to improve the current situation.

Thanks!

[1] http://lkml.kernel.org/r/20170207201950.20482-1-mhocko@xxxxxxxxxx
--
Michal Hocko
SUSE Labs