Re: mm: deadlock between get_online_cpus/pcpu_alloc

From: Thomas Gleixner
Date: Thu Feb 09 2017 - 10:34:21 EST


On Thu, 9 Feb 2017, Christoph Lameter wrote:

> On Thu, 9 Feb 2017, Thomas Gleixner wrote:
>
> > And how does that solve the problem at hand? Not at all:
> >
> > CPU 0                               CPU 1
> >
> > for_each_online_cpu(cpu)
> >  ==> cpu = 1
> >                                     stop_machine()
> >                                     set_cpu_online(1, false)
> > queue_work(cpu1)
> >
> > Thanks,
>
> Well thats not how I remember stop_machine does work. Doesnt it stop the
> processing on all cpus otherwise its not a real usable stop.
>
> The stop_machine would need to ensure that all cpus cease processing
> before proceeding.

Ok. Let me try again:

CPU 0                                   CPU 1
for_each_online_cpu(cpu)
 ==> cpu = 1
                                        stop_machine()

Stops processing on all CPUs by preempting the current execution and
forcing them into a high priority busy loop with interrupts disabled.

context_switch()
  stomper_thread()
    busyloop()

                                        set_cpu_online(1, false)

                                        stop_machine end()
                                          release busy looping CPUs

context_switch

Resumes operation at the preemption point. cpu is still 1

queue_work(cpu == 1)

It does exactly what you describe. It stops processing on all other cpus
until release, but that does not invalidate any data on those cpus.

It's been that way forever.
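
For illustration only, the interleaving above can be replayed in a small
single-threaded user-space sketch. Nothing here is kernel code: every sim_*
name is a hypothetical stand-in (sim_online for the online mask,
sim_set_cpu_online() for set_cpu_online(), sim_queue_work() for
queue_work()), and the preemption by stop_machine() is modelled simply as the
order of statements. The point it tries to show is that the local variable
'cpu' keeps its value across the stop.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_SIM_CPUS 2

/* Hypothetical stand-in for the kernel's online mask. */
static bool sim_online[NR_SIM_CPUS];

/* Mark every simulated CPU online. */
static void sim_reset(void)
{
	for (int i = 0; i < NR_SIM_CPUS; i++)
		sim_online[i] = true;
}

/* Hypothetical stand-in for set_cpu_online(). */
static void sim_set_cpu_online(int cpu, bool online)
{
	sim_online[cpu] = online;
}

/* Return the highest numbered online CPU, or -1 if none is online. */
static int sim_pick_online_cpu(void)
{
	for (int cpu = NR_SIM_CPUS - 1; cpu >= 0; cpu--)
		if (sim_online[cpu])
			return cpu;
	return -1;
}

/*
 * Hypothetical stand-in for queue_work(): reports whether the target
 * CPU is still online at the moment the work is queued.
 */
static bool sim_queue_work(int cpu)
{
	return cpu >= 0 && sim_online[cpu];
}

/* Replay the interleaving from the mail in program order. */
static bool sim_race(void)
{
	sim_reset();

	/* CPU 0: the iteration hands out cpu = 1. */
	int cpu = sim_pick_online_cpu();

	/*
	 * CPU 1: stop_machine() parks CPU 0 in a busy loop. CPU 0's
	 * local 'cpu' is untouched while CPU 1 takes CPU 1 offline.
	 */
	sim_set_cpu_online(1, false);

	/* CPU 0: resumes at the preemption point, cpu is still 1. */
	return sim_queue_work(cpu);
}
```

Calling sim_race() returns false in this model: the work is aimed at CPU 1,
which went offline between the pick and the queueing.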

Thanks,

tglx