Fail to boot KCSAN-enabled kernel (Kernel panic - not syncing: Fatal exception, Unrecoverable FP Unavailable Exception 800 at c0000000022cafe0) on a PowerMac G5, kernel 6.6.1

From: Erhard Furtner
Date: Mon Nov 13 2023 - 18:38:01 EST


Greetings!

Both my PowerMac G5 and my Talos II (running a BE kernel+system) fail to boot a KCSAN-enabled kernel. The same kernel without KCSAN enabled boots just fine.

I tried to dig a little deeper with a stripped-down .config on the G5 with CONFIG_KCSAN=y and CONFIG_KCSAN_STRICT=y, and finally got some output via CONFIG_PPC_EARLY_DEBUG_G5=y. The machine freezes before output via serial console or netconsole is available, so I had to take screenshots and transcribe them.
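For reference, the options named above amount to roughly the following .config fragment. This is only a sketch of the relevant lines, not the full attached config; CONFIG_PPC_EARLY_DEBUG=y is my assumption here, since the G5 early debug choice sits under that option.

```
# KCSAN options used for the failing boot
CONFIG_KCSAN=y
CONFIG_KCSAN_STRICT=y

# Early debug output on the G5 screen
# (CONFIG_PPC_EARLY_DEBUG=y is assumed as the parent option)
CONFIG_PPC_EARLY_DEBUG=y
CONFIG_PPC_EARLY_DEBUG_G5=y
```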

On EARLY_DEBUG_G5 the last output (there's some before but it gets overwritten) shown on the screen is:

[c0000000022ebd90] [c0000000000f95c4] __cpuhp_setup_state_cpuslocked+0x1b4/0x590
[c0000000022ebd60] [c0000000000f9ab8] __cpuhp_setup_state+0x118/0x2e8
[c0000000022ebef0] [c000000002023dc8] poking_init+0x3c/0x90
[c0000000022ebf10] [c0000000020059a8] start_kernel+0x6a0/0x99c
[c0000000022ebfe0] [c00000000000cb48] start_here_common+0x1c/0x20
Code: 7fff9830 7ad6f082 3bffffff 7936f00e 7fffa838 7bff1f24 7e96fa15 41820218 7e83a378 482eed09 60000000 <7eb6f82a> 2c350000 41820230 39200001
---[ end trace 0000000000000000 ]---

Kernel panic - not syncing: Fatal exception
Unrecoverable FP Unavailable Exception 800 at c0000000022cafe0
Oops: Unrecoverable FP Unavailable Exception, sig: 6 [#2]
BE PAGE_SIZE=4K MMU=Hash SMP NR_CPUS=2 PowerMac
Modules linked in:
CPU: 0 PID: 0 Comm: swapper Tainted: G D W
Hardware name: PowerMac11,2 PPC970MP 0x440101 PowerMac
NIP: c0000000022cafe0 LR: c000000000046178 CTR: c0000000022cafe0
REGS: c0000000022eb290 TRAP: 0800 Tainted: G D W ()
MSR: 9000000000001032 <SF,HV,ME,IR,DR,RI> CR: 84048882 XER: 00000000
IRQMASK: 3
GPR00: 0000000000000000 c0000000022eb530 c0000000016cb100 fffffffffffffffe
GPR04: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR08: 0000000000000000 0000000000000000 0000000000000000 00000000023404df
GPR12: c0000000022cafe0 c000000002388000 000000000014aa88 0000000000000000
GPR16: 00000000ff9aac70 c0000000003593e0 c0000000003584a0 0000000000000009
GPR20: f886000f7ce74a00 0000000000000001 0000000000000000 0000000000000000
GPR24: c0000000022f8ab8 c0000000023404d0 c0000000022cbac0 fffffffffffffffe
GPR28: c0000000023484d8 00000000000f4240 c0000000023404cc c0000000023404c0
NIP [c0000000022cafe0] init_task+0x9e0/0xf00
LR [0000000000046170] __smp_send_nmi_ipi+0x4c0/0x610
Call Trace:
[c0000000022eb530] [0000000000046170] __smp_send_nmi_ipi+0x480/0x610 (unreliable)
[c0000000022eb5c0] [0000000000046b60] smp_send_stop+0x30/0x60
[c0000000022eb5e0] [00000000000f5bb0] panic+0x274/0x554
[c0000000022eb6a0] [000000000002313c] die+0x4bc/0x4c0
[c0000000022eb760] [0000000000062e98] bad_page_fault+0x200/0x2c0
[c0000000022eb7f0] [0000000000063148] do_bad_segment_interrupt+0x58/0xe0
[c0000000022eb828] [0000000000007afc] data_access_slb_common_virt+0x19c/0x1a0
--- interrupt: 380 at hash__map_kernel_page+0x178/0x460
NIP: c000000000069c48 LR: c000000000069c44 CTR: 0000000000000000
REGS: c0000000022eb850 TRAP: 0380 Tainted: G D W ()
MSR: 9000000000001032 <SF,HV,ME,IR,DR,RI> CR: 44042882 XER: 00000000
IRQMASK: 1
GPR00: 0000000000000000 c0000000022ebaf0 c0000000016cb100 0000000000000000
GPR04: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR08: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR12: 0000000000000000 c000000002388000 000000000014aa88 0000000000000000
GPR16: 00000000ff9aac70 c0000000003593e0 c0000000003584a0 0000000000000009
GPR20: f886000f7ce74a00 0000000c000cd000 f886000f7ce74a88 0000000000000007
GPR24: 0000000000000009 c0000000023429e0 c0000000023429e8 c0000000022d8d80
GPR28: 800000000000018e 0000000002322000 c0000000023404cc 0000000000000000
NIP [c000000000069c48] hash__map_kernel_page+0x178/0x460
LR [c000000000069c44] hash__map_kernel_page+0x174/0x460
--- interrupt: 380
[c0000000022ebaf0] [c0000000022ebb80] init_stack+0x3b80/0x4000 (unreliable)
[c0000000022ebbd0] [c000000000077d08] text_area_cpu_up+0x78/0x490
[c0000000022ebc80] [c0000000000f6608] cpuhp_invoke_callback+0x218/0x490
[c0000000022ebcf0] [c0000000000f8e28] cpuhp_issue_call+0x4c8/0x4f0
[c0000000022ebd90] [c0000000000f95c4] __cpuhp_setup_state_cpuslocked+0x1b4/0x590
[c0000000022ebd60] [c0000000000f9ab8] __cpuhp_setup_state+0x118/0x2e8
[c0000000022ebef0] [c000000002023dc8] poking_init+0x3c/0x90
[c0000000022ebf10] [c0000000020059a8] start_kernel+0x6a0/0x99c
[c0000000022ebfe0] [c00000000000cb48] start_here_common+0x1c/0x20
Code: c0000000 0006f230 00000000 00000000 00000000 c0000004 7fe06608 c0000000 023432a8 00000000 00000001 <c0000000> 022cb020 00000000 00000000
---[ end trace 0000000000000000 ]---

Kernel panic - not syncing: Fatal exception
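
As a cross-check of the transcribed register dumps, here is a minimal Python sketch that decodes the MSR value 9000000000001032 shown in both of them. The bit positions are the 64-bit Book3S MSR bits as I understand them from the kernel headers, and the helper name decode_msr is hypothetical. The decoded flags match the <SF,HV,ME,IR,DR,RI> printed by the kernel, and the FP bit is clear, which is at least consistent with an FP Unavailable exception being raised.

```python
# MSR bit positions for 64-bit Book3S PowerPC (assumed from the kernel's
# MSR_*_LG definitions; only a subset is listed here).
MSR_BITS = {
    "SF": 63,   # 64-bit mode
    "HV": 60,   # hypervisor state
    "VEC": 25,  # AltiVec available
    "VSX": 23,  # VSX available
    "EE": 15,   # external interrupts enabled
    "PR": 14,   # problem (user) state
    "FP": 13,   # floating point available
    "ME": 12,   # machine check enabled
    "IR": 5,    # instruction relocation
    "DR": 4,    # data relocation
    "RI": 1,    # recoverable interrupt
    "LE": 0,    # little-endian mode
}


def decode_msr(value):
    """Return the names of the MSR bits set in value, highest bit first."""
    ordered = sorted(MSR_BITS.items(), key=lambda item: item[1], reverse=True)
    return [name for name, bit in ordered if value & (1 << bit)]


if __name__ == "__main__":
    msr = 0x9000000000001032  # value from the register dumps above
    print(",".join(decode_msr(msr)))
    print("FP available:", "FP" in decode_msr(msr))
```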


When I additionally enable CONFIG_KCSAN_SELFTEST=y the machine freezes even earlier and I only get this:

ioremap() called early from iommu_init_early_dart+0x29c/0xb90. Use early_ioremap() instead
DART table allocated at: (____ptrval____)
DART IOMMU initialized for U4 type chipset
Hardware name: PowerMac11,2 PPC970MP 0x440101 PowerMac
CPU maps initialized for 1 thread per core
(thread shift is 0)
Allocated 2320 bytes for 2 pacas
-----------------------------------------------------
phys_mem_size = 0x400000000
dcache_bsize = 0x80
icache_bsize = 0x80
cpu_features = 0x00000100900c218a
possible = 0x001ffbebfbffb18f
always = 0x0000000000000180
cpu_user_features = 0xdc080000 0x00000000
mmu_features = 0x0c008001
firmware_features = 0x0000000000000000
vmalloc start = 0xc0003d0000000000
IO start = 0xc0003e0000000000
vmemmap start = 0xc0003f0000000000
hash-mmu: ppc64_pft_size = 0x0
hash-mmu: htab_hash_mask = 0x1fffff
-----------------------------------------------------
SMU: Driver 0.7 (c) 2005 Benjamin Herrenschmidt, IBM Corp.
ioremap() called early from smu_init+0x5dc/0x840. Use early_ioremap() instead
ioremap() called early from pmac_nvram_init+0x358/0xa3c. Use early_ioremap() instead
nvram: Checking bank 0...
nvram: gen0=1642, gen1=1641
nvram: Active bank is: 0
nvram: OF partition at 0x410
nvram: XP partition at 0x1020
nvram: NR partition at 0x1120
barrier-nospec: using ORI speculation barrier
barrier-nospec: patched 269 locations
Top of RAM: 0x480000000, Total RAM: 0x400000000
Memory hole size: 2048MB
Zone ranges:
Normal [mem 0x0000000000000000-0x000000047fffffff]
Movable zone start for each node
Early memory node ranges
node 0: [mem 0x0000000000000000-0x000000007fffffff]
node 0: [mem 0x0000000100000000-0x000000047fffffff]
Initmem setup node 0 [mem 0x0000000000000000-0x000000047fffffff]
On node 0, zone Normal: 524288 pages in unavailable ranges
percpu: Embedded 20 pages/cpu s43176 r0 d38744 u524288
pcpu-alloc: s43176 r0 d38744 u524288 alloc=1*1048576
pcpu-alloc: [0] 0 1


Kernel .config attached.

In contrast to the G5 and the Talos II, KCSAN works OK on my PowerMac G4 DP. So this is probably ppc64-specific?

Regards,
Erhard

Attachment: config_661-van_g5
Description: Binary data