Re: kernel panic due to https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=2830bf6f05fb3e05bc4743274b806c821807a684

From: robert shteynfeld
Date: Fri Jan 25 2019 - 10:52:21 EST


The person who pointed to the mm/page_alloc.c commits as the likely
cause of the issue did not have time to build a patched/reverted kernel
to confirm his hypothesis. I tried backing out the two separate commits
he suggested. Backing out the first commit (i.e. the one in the subject)
fixed the boot issue. Reverting the second one had no effect.

On Fri, Jan 25, 2019 at 3:29 AM Michal Hocko <mhocko@xxxxxxxxxx> wrote:
>
> On Fri 25-01-19 09:19:24, Michal Hocko wrote:
> > On Fri 25-01-19 08:37:04, Michal Hocko wrote:
> > > On Fri 25-01-19 17:48:32, Linus Torvalds wrote:
> > > > [ Just adding a lot of other people to the cc ]
> > > >
> > > > Robert, could you add a dmesg of a successful boot to that bugzilla,
> > > > or just as an attachment in email to this group of people?
> > > >
> > > > This looks to be with the Fedora kernel config. Two people reporting
> > > > it, it looks like similar machines.
> > > >
> > > > I assume it's some odd memory sizing detail that happens to trigger a
> > > > particular case.
> > >
> > > Quite possible.
> >
> > Forgot to ask. Can we get a dmesg with 2830bf6f05fb ("mm,
> > memory_hotplug: initialize struct pages for the full memory section")
> > reverted and memblock=debug kernel command line parameter?
>
> And one more thing which I have overlooked until now and which is not
> really clear to me. One of the comments says
> : The relevant part was:
> : kernel bug at mm/page_alloc.c=790
>
> I suppose this is 4.19 stable kernel because that would be
> VM_BUG_ON_PAGE(pfn & ((1 << order) - 1), page);
>
> in __free_one_page. I do not really see how 2830bf6f05fb could make any
> difference here. It simply zeroes out the rest of the mem section and
> that is guaranteed to be allocated because we do not do subsections. The
> above VM_BUG_ON says that we start allocating an unaligned pfn for its
> order.
>
> Or are there two issues reported in that bug?
> --
> Michal Hocko
> SUSE Labs
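
For readers following the VM_BUG_ON_PAGE discussion above: the quoted
condition, pfn & ((1 << order) - 1), is non-zero whenever the pfn is not
a multiple of 2^order pages. Below is a minimal userspace sketch of that
same test. It is not kernel code; the helper name pfn_aligned_for_order
and the example values are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper (not a kernel function): returns true when pfn is
 * aligned for a block of 2^order pages, i.e. when the low 'order' bits
 * of pfn are all zero. The VM_BUG_ON_PAGE quoted above fires in the
 * opposite case, when the masked value is non-zero.
 */
static bool pfn_aligned_for_order(unsigned long pfn, unsigned int order)
{
    return (pfn & ((1UL << order) - 1)) == 0;
}

/* Illustrative values only: order 3 means 8 pages, so pfn must be a multiple of 8. */
static void alignment_examples(void)
{
    assert(pfn_aligned_for_order(0x1000, 3));   /* 0x1000 is a multiple of 8 */
    assert(!pfn_aligned_for_order(0x1004, 3));  /* 0x1004 has bit 2 set */
}
```

With these example values, a pfn of 0x1004 at order 3 is the kind of
input that would trip the check, which matches the "unaligned pfn for
its order" reading given above.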