Re: + mm-fix-panic-in-__alloc_pages.patch added to -mm tree

From: Dennis Zhou
Date: Fri Nov 12 2021 - 13:20:34 EST


Hello,

On Tue, Nov 09, 2021 at 12:00:46PM +0100, Michal Hocko wrote:
> On Tue 09-11-21 09:42:56, David Hildenbrand wrote:
> > On 09.11.21 09:37, Michal Hocko wrote:
> > > I have opposed this patch http://lkml.kernel.org/r/YYj91Mkt4m8ySIWt@xxxxxxxxxxxxxx
> > > There was no response to that feedback. I will not go as far as to nack
> > > it explicitly because pcp allocator is not an area I would nack patches
> > > but seriously, this issue needs a deeper look rather than a paper over
> > > patch. I hope we do not want to do a similar thing to all callers of
> > > cpu_to_mem.
> >
> > While we could move it into the !HOLES version of cpu_to_mem(), calling
> > cpu_to_mem() on an offline (and eventually not even present) CPU (with
> > an offline node) is really a corner case.
> >
> > Instead of additional runtime overhead for all cpu_to_mem(), my take
> > would be to just do it for the random special cases. Sure, we can
> > document that people should be careful when calling cpu_to_mem() on
> > offline CPUs. But IMHO it's really a corner case.
>
> I suspect I haven't made myself clear enough. I do not think we should
> be touching cpu_to_mem/cpu_to_node and handle this corner case. We
> should be looking at the underlying problem instead. We cannot really
> rely on cpu to be onlined to have a proper node association. We should
> really look at the initialization code and handle this situation
> properly. Memoryless nodes are something we have been dealing with
> already. This particular instance of the problem is new and we should
> understand why.
> --
> Michal Hocko
> SUSE Labs

So I think we still don't have a real solution here. This patch addresses
the side effect but not the underlying problem in CPU hotplug.
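
To make that distinction concrete, here is a rough user-space model of the
kind of caller-side guard being discussed. It is not the patch itself and
not kernel code: every model_* name, the topology tables and the
MODEL_NO_NODE sentinel are made up for illustration, and the assumption is
simply that a CPU can be mapped to a node that was never brought online.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_NR_CPUS   4
#define MODEL_NR_NODES  2
#define MODEL_NO_NODE   (-1)	/* hypothetical "no node preference" value */

/*
 * Hypothetical topology: CPUs 2 and 3 are associated with node 1,
 * which has not been brought online.
 */
static const int model_cpu_node[MODEL_NR_CPUS] = { 0, 0, 1, 1 };
static const bool model_node_is_online[MODEL_NR_NODES] = { true, false };

/* Raw lookup: returns the recorded node whether or not it is online. */
static int model_cpu_to_node(int cpu)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	return model_cpu_node[cpu];
}

/* Caller-side guard: pass the node on only if it is online. */
static int model_pick_alloc_node(int cpu)
{
	int nid = model_cpu_to_node(cpu);

	return model_node_is_online[nid] ? nid : MODEL_NO_NODE;
}
```

In this model, model_pick_alloc_node(2) returns MODEL_NO_NODE, so this one
caller is safe, but model_cpu_to_node(2) still hands node 1 to every other
caller. That second part is what remains unfixed.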

I'm fine with this going in as a stopgap because I imagine the fixes to
hotplug are a lot more intrusive, but do we have someone who can own
the work to fix hotplug? I think that should be a requirement for
taking this, because clearly it's hotplug that's broken and not percpu.

Acked-by: Dennis Zhou <dennis@xxxxxxxxxx>

Thanks,
Dennis