Re: qemu sparc64 runtime crashes in -next

From: Guenter Roeck
Date: Wed Jun 14 2017 - 16:42:45 EST


On Wed, Jun 14, 2017 at 03:31:08PM -0400, David Miller wrote:
> From: Guenter Roeck <linux@xxxxxxxxxxxx>
> Date: Wed, 14 Jun 2017 03:13:54 -0700
>
> > Hi,
> >
> > my sparc qemu tests started failing with next-20170613.
> > Log output is not very helpful:
> >
> > Unhandled Exception 0x0000000000000028
> > PC = 0x00000000004620f4 NPC = 0x00000000004620f8
> > Stopping execution
> >
> > It looks like 0x00000000004620f4 is in init_tick_ops().
> >
> > Bisect points to commit 'sparc64: improve modularity tick options'.
> > Bisect log is attached.
> >
> > No idea if this is a qemu problem. If you think it is, anything to
> > help tracking it down would be appreciated.
>
> Pavel, please look into this.
>
> It looks weird that the commit it bisects to would cause a problem.
> Maybe the change from __read_mostly to __cacheline_aligned causes the
> issue?
>
> Really weird...

Turns out tick_get_frequency() returns 0. That value is used as the divisor
in clocksource_hz2mult(), hence the exception.
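
For illustration, here is a minimal userspace sketch of that computation. It
is modeled on what clocksource_hz2mult() does (scale NSEC_PER_SEC by the shift,
round, divide by the frequency). The name hz2mult_sketch() and the zero check
are hypothetical and only there to show the failure mode; they are not kernel
code. On SPARC V9, trap type 0x28 is division_by_zero, which matches the
exception reported above.

```c
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/*
 * Userspace model of the mult calculation:
 *   mult = round((NSEC_PER_SEC << shift) / hz)
 *
 * Returns 0 on success, or -1 if hz is 0. The guard is a hypothetical
 * addition for this sketch; without it, hz == 0 divides by zero.
 */
static int hz2mult_sketch(uint32_t hz, uint32_t shift, uint32_t *mult)
{
	uint64_t tmp;

	if (hz == 0)
		return -1;	/* the case hit here: frequency not set yet */

	tmp = NSEC_PER_SEC << shift;
	tmp += hz / 2;		/* round to nearest */
	tmp /= hz;
	*mult = (uint32_t)tmp;
	return 0;
}
```

With hz = 100000000 (the value set later by fill_in_one_cpu()) the sketch
returns a valid multiplier; with hz = 0 it returns -1 instead of trapping.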

Looking into it further, clock_tick is initialized much later.

[ 0.000000] clock_tick is 0
-> tick_get_frequency()
[ 0.039361] PROMLIB: Sun IEEE Boot Prom 'OBP 3.10.24 1999/01/01 01:01'
[ 0.041646] PROMLIB: Root node compatible: sun4u
[ 0.060500] Linux version 4.12.0-rc5-next-20170614+ (groeck@mars) (gcc version 4.6.3 (GCC) ) #5 SMP Wed Jun 14 13:40:01 PDT 2017
[ 0.893475] bootconsole [earlyprom0] enabled
[ 0.958658] ARCH: SUN4U
[ 1.265007] Ethernet address: 52:54:00:12:34:56
[ 1.340458] MM: PAGE_OFFSET is 0xfffff80000000000 (max_phys_bits == 40)
[ 1.405302] MM: VMALLOC [0x0000000100000000 --> 0x0000060000000000]
[ 1.468992] MM: VMEMMAP [0x0000060000000000 --> 0x00000c0000000000]
[ 3.349070] Kernel: Using 5 locked TLB entries for main kernel image.
[ 3.422093] Remapping the kernel...
[ 4.342159] done.
[ 136.231664] OF stdout device is: /pci@1fe,0/ebus@3/su
[ 136.298896] PROM: Built device tree with 60466 bytes of memory.
[ 136.458520] Top of RAM: 0x1fe80000, Total RAM: 0x1fe80000
[ 136.520487] Memory hole size: 0MB
[ 143.705871] Allocated 16384 bytes for kernel page tables.
[ 143.972916] Zone ranges:
[ 144.039046] Normal [mem 0x0000000000000000-0x000000001fe7ffff]
[ 144.118654] Movable zone start for each node
[ 144.180797] Early memory node ranges
[ 144.240870] node 0: [mem 0x0000000000000000-0x000000001fe7ffff]
[ 144.333686] Initmem setup node 0 [mem 0x0000000000000000-0x000000001fe7ffff]
[ 144.943918] Booting Linux...
[ 145.010966] CPU CAPS: [flush,stbar,swap,muldiv,v9,mul32,div32,v8plus]
[ 145.082225] CPU CAPS: [vis]
[ 145.581394] percpu: Embedded 12 pages/cpu @fffff8001f800000 s57024 r8192 d33088 u4194304
[ 145.949412] ###################### fill_in_one_cpu(): CPU 0 clock tick set to 100000000

That doesn't really take 145 seconds, though :-).

Guenter