Re: [mm 4.15-rc7] Random oopses under memory pressure.

From: Linus Torvalds
Date: Mon Jan 15 2018 - 18:05:27 EST


On Sun, Jan 14, 2018 at 3:54 AM, Tetsuo Handa
<penguin-kernel@xxxxxxxxxxxxxxxxxxx> wrote:
> This memory corruption bug occurs even on a CONFIG_SMP=n CONFIG_PREEMPT_NONE=y
> kernel. It depends heavily on timing and is thus too difficult to bisect.
> It seems to have existed at least since Linux 4.8 (judging from the traces, though
> the cause might be different). None of the debugging configurations gives me a clue.
> So far only a CONFIG_HIGHMEM=y CONFIG_DEBUG_PAGEALLOC=y kernel (with enough RAM to
> use the HighMem zone) seems to hit this bug, but that might be just chance caused
> by timing. Thus, there is no evidence that 64-bit kernels are not affected by
> this bug. But I can't narrow it down any further. Thus, I call for developers who can
> narrow down / identify where the memory corruption bug is.

Hmm.

I guess I'm still hung up on the "it does not look like a valid
'struct page *'" thing.

Can you reproduce this with CONFIG_FLATMEM=y instead of CONFIG_SPARSEMEM?

Because if you can, I think we can easily add a few more pfn and
'struct page' validation debug statements. With SPARSEMEM, it gets
pretty complicated because the whole struct page setup is much more
complex.
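
To illustrate why FLATMEM makes this kind of validation easy, here is a
rough userspace sketch, not kernel code. With FLATMEM the pfn <-> 'struct
page' mapping is plain array arithmetic on a single mem_map, so a pointer
can be sanity-checked by range and alignment alone. Everything prefixed
model_/MODEL_ below is made up for the illustration (the toy struct page,
the offset, the array size); it only mimics the shape of the real
mem_map / ARCH_PFN_OFFSET / max_mapnr arithmetic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for the kernel's struct page (the real one is much larger). */
struct page {
	unsigned long flags;
	int refcount;
};

/* Hypothetical values, chosen only for this model. */
#define MODEL_PFN_OFFSET 16UL	/* first pfn covered by the map */
#define MODEL_MAX_MAPNR  64UL	/* number of page frames in the map */

/* One flat array of struct page, as in a FLATMEM-style layout. */
struct page model_mem_map[MODEL_MAX_MAPNR];

/* A pfn is valid if it falls inside the range covered by the flat map. */
bool model_pfn_valid(unsigned long pfn)
{
	return pfn >= MODEL_PFN_OFFSET &&
	       pfn - MODEL_PFN_OFFSET < MODEL_MAX_MAPNR;
}

/*
 * A pointer "looks like a valid struct page *" if it lies inside the
 * array and sits exactly on an element boundary.
 */
bool model_page_valid(const void *p)
{
	uintptr_t addr = (uintptr_t)p;
	uintptr_t base = (uintptr_t)model_mem_map;
	uintptr_t end = base + sizeof(model_mem_map);

	if (addr < base || addr >= end)
		return false;
	return (addr - base) % sizeof(struct page) == 0;
}

/* pfn -> page: simple indexing, guarded by the pfn check. */
struct page *model_pfn_to_page(unsigned long pfn)
{
	assert(model_pfn_valid(pfn));
	return model_mem_map + (pfn - MODEL_PFN_OFFSET);
}

/* page -> pfn: pointer subtraction, guarded by the pointer check. */
unsigned long model_page_to_pfn(const struct page *page)
{
	assert(model_page_valid(page));
	return (unsigned long)(page - model_mem_map) + MODEL_PFN_OFFSET;
}
```

With SPARSEMEM there is no single array to range-check against: each
memory section has its own map, so an equivalent check would first have
to find and validate the section, which is where the extra complexity
comes from.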

Linus