Re: Regression on ARMs in next-20170531

From: Johannes Weiner
Date: Sun Jun 04 2017 - 07:33:42 EST


On Wed, May 31, 2017 at 06:43:33PM +0100, Russell King - ARM Linux wrote:
> On Wed, May 31, 2017 at 09:45:45AM -0700, Tony Lindgren wrote:
> > Mark Brown noticed that so far the only booting
> > ARMs are all with CONFIG_SMP disabled and I just
> > confirmed that's the case.
>
> > 8< --------------------
> > Unable to handle kernel paging request at virtual address 2e116007
> > pgd = c0004000
> > [2e116007] *pgd=00000000
> > Internal error: Oops: 5 [#1] SMP ARM
> > Modules linked in:
> > CPU: 0 PID: 0 Comm: swapper Not tainted 4.12.0-rc3-00153-gb6bc6724488a #200
> > Hardware name: Generic DRA74X (Flattened Device Tree)
> > task: c0d0adc0 task.stack: c0d00000
> > PC is at __mod_node_page_state+0x2c/0xc8
> > LR is at __per_cpu_offset+0x0/0x8
> > pc : [<c0271de8>] lr : [<c0d07da4>] psr: 600000d3
> > sp : c0d01eec ip : 00000000 fp : c15782f4
> > r10: 00000000 r9 : c1591280 r8 : 00004000
> > r7 : 00000001 r6 : 00000006 r5 : 2e116000 r4 : 00000007
> > r3 : 00000007 r2 : 00000001 r1 : 00000006 r0 : c0dc27c0
> > Flags: nZCv IRQs off FIQs off Mode SVC_32 ISA ARM Segment none
> ...
> > Code: e79e5103 e28c3001 e0833001 e1a04003 (e19440d5)
>
> This disassembles to:
>
> 0: e79e5103 ldr r5, [lr, r3, lsl #2]
> 4: e28c3001 add r3, ip, #1
> 8: e0833001 add r3, r3, r1
> c: e1a04003 mov r4, r3
> 10: e19440d5 ldrsb r4, [r4, r5]
>
> I don't have a similarly configured kernel, but here I have for the
> start of this function:
>
> 00000680 <__mod_node_page_state>:
> 680: e1a0c00d mov ip, sp
> 684: e92dd870 push {r4, r5, r6, fp, ip, lr, pc}
> 688: e24cb004 sub fp, ip, #4
> 68c: e590cc00 ldr ip, [r0, #3072] ; 0xc00
> 690: e1a0400d mov r4, sp
> 694: ee1d6f90 mrc 15, 0, r6, cr13, cr0, {4}
> 698: e08c5001 add r5, ip, r1
> 69c: e2855001 add r5, r5, #1
> 6a0: e1a03005 mov r3, r5
> 6a4: e196c0dc ldrsb ip, [r6, ip]
> 6a8: e19630d3 ldrsb r3, [r6, r3]
>
> r5 in your code is the equivalent of r6, r4 => r3, r3 -> r5.
> lr is the __per_cpu_offset array, so the first instruction is
> trying to load the percpu offset.
>
> The faulting code is:
>
> x = delta + __this_cpu_read(*p);
>
> specifically "__this_cpu_read(*p)".
>
> "ip" holds "pcp" from:
>
> struct per_cpu_nodestat __percpu *pcp = pgdat->per_cpu_nodestats;
>
> and you may notice that it's zero in the register dump. So,
> pgdat->per_cpu_nodestats is NULL here.
>
> This seems to be set up in setup_per_cpu_pageset(), which in the init
> order, happens way after mm_init() (which contains kmem_cache_init()).

Thanks for the analysis, Russell.

I think it's NULL because the slab allocation happens even before
root_mem_cgroup is set up, and so root_mem_cgroup -> lruvec -> pgdat
gives us garbage.
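
As a cross-check, the register dump is consistent with a NULL pcp: the
faulting address is just this CPU's percpu offset (r5) plus the byte
offset 1 + item computed by the two adds in the disassembly. A minimal
sketch of that arithmetic follows. The function and macro names are
made up for illustration and are not kernel symbols; the values are
copied from the oops, and 32-bit addresses are assumed.

```c
#include <assert.h>
#include <stdint.h>

/* Values copied from the register dump in the oops. */
#define PERCPU_OFFSET 0x2e116000u /* r5: __per_cpu_offset[] entry that was loaded */
#define PCP_BASE      0x00000000u /* ip: pgdat->per_cpu_nodestats, i.e. NULL */
#define ITEM          6u          /* r1: node stat item index */

/* Hypothetical helper: address the faulting ldrsb tries to read. */
static uint32_t fault_address(uint32_t percpu_off, uint32_t pcp, uint32_t item)
{
	/* add r3, ip, #1 ; add r3, r3, r1 -> byte offset of the per-item diff */
	uint32_t p = pcp + 1u + item;

	/* ldrsb r4, [r4, r5] -> percpu pointer plus this CPU's offset */
	return p + percpu_off;
}

/* Hypothetical check against the address reported by the oops. */
static void check_against_oops(void)
{
	assert(fault_address(PERCPU_OFFSET, PCP_BASE, ITEM) == 0x2e116007u);
}
```

With pcp == 0 this gives 2e116007, the address in the "Unable to handle
kernel paging request" line, so nothing but the percpu offset and the
field offset contributes to the pointer.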

Tony, Josef, since the patches are dropped from -next, could you test
the -mm tree at git://git.cmpxchg.org/linux-mmots.git and verify that
this patch below fixes the issue?

---