Re: qemu sparc64 runtime crashes in -next

From: Pasha Tatashin
Date: Wed Jun 14 2017 - 16:54:19 EST


I think I know the problem and am working on a fix. I will send it out soon.

Thank you,
Pasha

On 06/14/2017 04:42 PM, Guenter Roeck wrote:
On Wed, Jun 14, 2017 at 03:31:08PM -0400, David Miller wrote:
From: Guenter Roeck <linux@xxxxxxxxxxxx>
Date: Wed, 14 Jun 2017 03:13:54 -0700

Hi,

my sparc qemu tests started failing with next-20170613.
Log output is not very helpful:

Unhandled Exception 0x0000000000000028
PC = 0x00000000004620f4 NPC = 0x00000000004620f8
Stopping execution

It looks like 0x00000000004620f4 is in init_tick_ops().

Bisect points to commit 'sparc64: improve modularity tick options'.
Bisect log is attached.

No idea if this is a qemu problem. If you think it is, anything to help
tracking it down would be appreciated.

Pavel, please look into this.

It looks weird that the commit it bisects to would cause a problem.
Maybe the change from __read_mostly to __cacheline_aligned causes the
issue?

Really weird...

Turns out tick_get_frequency() returns 0. The value is used as divisor
in clocksource_hz2mult().
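
To illustrate why a zero frequency is fatal here, below is a minimal
userspace sketch modeled on the arithmetic clocksource_hz2mult() performs
(scale one second in nanoseconds by the shift, round, divide by the
frequency). The helper name hz2mult_checked(), the zero guard, and the
shift value of 10 used in the note underneath are assumptions made for
illustration only; they are not taken from the kernel or from the commit
under discussion. A zero divisor would also be consistent with trap type
0x28, which, if I read the SPARC V9 trap table correctly, is
division_by_zero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: computes mult = round((10^9 << shift) / hz),
 * the same shape of calculation as clocksource_hz2mult(), but refuses
 * a zero frequency instead of dividing by it.
 *
 * Returns true and stores the multiplier in *mult on success,
 * false (leaving *mult untouched) when hz is 0.
 */
static bool hz2mult_checked(uint32_t hz, uint32_t shift, uint32_t *mult)
{
	uint64_t tmp;

	if (hz == 0)
		return false;	/* frequency not initialized yet */

	tmp = (uint64_t)1000000000 << shift;	/* nanoseconds per second, scaled */
	tmp += hz / 2;				/* round to nearest */
	*mult = (uint32_t)(tmp / hz);
	return true;
}
```

With the 100000000 Hz value reported for CPU 0 in the log and an assumed
shift of 10, the helper yields a multiplier of 10240; with a frequency of
0 it returns false instead of trapping.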

Looking into it further, clock_tick is initialized much later.

[ 0.000000] clock_tick is 0
-> tick_get_frequency()
[ 0.039361] PROMLIB: Sun IEEE Boot Prom 'OBP 3.10.24 1999/01/01 01:01'
[ 0.041646] PROMLIB: Root node compatible: sun4u
[ 0.060500] Linux version 4.12.0-rc5-next-20170614+ (groeck@mars) (gcc version 4.6.3 (GCC) ) #5 SMP Wed Jun 14 13:40:01 PDT 2017
[ 0.893475] bootconsole [earlyprom0] enabled
[ 0.958658] ARCH: SUN4U
[ 1.265007] Ethernet address: 52:54:00:12:34:56
[ 1.340458] MM: PAGE_OFFSET is 0xfffff80000000000 (max_phys_bits == 40)
[ 1.405302] MM: VMALLOC [0x0000000100000000 --> 0x0000060000000000]
[ 1.468992] MM: VMEMMAP [0x0000060000000000 --> 0x00000c0000000000]
[ 3.349070] Kernel: Using 5 locked TLB entries for main kernel image.
[ 3.422093] Remapping the kernel...
[ 4.342159] done.
[ 136.231664] OF stdout device is: /pci@1fe,0/ebus@3/su
[ 136.298896] PROM: Built device tree with 60466 bytes of memory.
[ 136.458520] Top of RAM: 0x1fe80000, Total RAM: 0x1fe80000
[ 136.520487] Memory hole size: 0MB
[ 143.705871] Allocated 16384 bytes for kernel page tables.
[ 143.972916] Zone ranges:
[ 144.039046] Normal [mem 0x0000000000000000-0x000000001fe7ffff]
[ 144.118654] Movable zone start for each node
[ 144.180797] Early memory node ranges
[ 144.240870] node 0: [mem 0x0000000000000000-0x000000001fe7ffff]
[ 144.333686] Initmem setup node 0 [mem 0x0000000000000000-0x000000001fe7ffff]
[ 144.943918] Booting Linux...
[ 145.010966] CPU CAPS: [flush,stbar,swap,muldiv,v9,mul32,div32,v8plus]
[ 145.082225] CPU CAPS: [vis]
[ 145.581394] percpu: Embedded 12 pages/cpu @fffff8001f800000 s57024 r8192 d33088 u4194304
[ 145.949412] ###################### fill_in_one_cpu(): CPU 0 clock tick set to 100000000

That doesn't really take 145 seconds, though :-).

Guenter