Re: [PATCH v2 5/12] percpu: Add {raw,this}_cpu_try_cmpxchg()

From: Nathan Chancellor
Date: Fri Jun 09 2023 - 12:13:55 EST


Hi Konrad,

On Fri, Jun 09, 2023 at 06:10:38PM +0200, Konrad Dybcio wrote:
>
>
> On 31.05.2023 15:08, Peter Zijlstra wrote:
> > Add the try_cmpxchg() form to the per-cpu ops.
> >
> > Signed-off-by: Peter Zijlstra (Intel) <peterz@xxxxxxxxxxxxx>
> > ---
> +CC Nathan, llvm list
>
> Hi all, this patch seems to break booting on Qualcomm ARM64 platforms
> when compiled with clang, for some reason (GCC works fine):
>
> next-20230605 - works
> next-20230606 - doesn't
>
> git revert -m 1 dc4e51fd9846 on next-20230606 - works again
> b4 shazam <this_msgid> -P 1-4 - still works
> b4 shazam <this_msgid> -P 5 - breaks
>
> Confirmed on at least Qualcomm QCM2290, SM8250.
>
> Checking the serial console, it hits a BUG_ON:
>
> [ 0.000000] ------------[ cut here ]------------
> [ 0.000000] kernel BUG at mm/vmalloc.c:1638!
> [ 0.000000] Internal error: Oops - BUG: 00000000f2000800 [#1] SMP
> [ 0.000000] Modules linked in:
> [ 0.000000] CPU: 0 PID: 0 Comm: swapper/0 Not tainted [snip]
> [ 0.000000] Hardware name: Qualcomm Technologies, Inc. Robotics RB1 (DT)
> [ 0.000000] pstate: 000000c5 (nzcv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> [ 0.000000] pc : alloc_vmap_area+0xafc/0xb08
> [ 0.000000] lr : alloc_vmap_area+0x9e4/0xb08
> [ 0.000000] sp : ffffa50137f53c20
> [ 0.000000] x29: ffffa50137f53c60 x28: ffffa50137f30c18 x27: 0000000000000000
> [ 0.000000] x26: 0000000000007fff x25: ffff800080000000 x24: 000000000000cfff
> [ 0.000000] x23: ffffffffffff8000 x22: ffffa50137fef970 x21: fffffbfff0000000
> [ 0.000000] x20: ffff022982003208 x19: ffff0229820031f8 x18: ffffa50137f64f70
> [ 0.000000] x17: ffffa50137fef980 x16: ffffa501375e6d08 x15: 0000000000000001
> [ 0.000000] x14: ffffa5013831e1a0 x13: ffffa50137f30c18 x12: 0000000000402dc2
> [ 0.000000] x11: 0000000000000000 x10: ffff022982003018 x9 : ffffa5013831e188
> [ 0.000000] x8 : ffffcb55ff003228 x7 : 0000000000000000 x6 : 0000000000000048
> [ 0.000000] x5 : 0000000000000000 x4 : ffffa50137f53bd0 x3 : ffffa50136490000
> [ 0.000000] x2 : 0000000000000001 x1 : ffffa5013831e190 x0 : ffff022982003208
> [ 0.000000] Call trace:
> [ 0.000000] alloc_vmap_area+0xafc/0xb08
> [ 0.000000] __get_vm_area_node+0x108/0x1e8
> [ 0.000000] __vmalloc_node_range+0x1fc/0x728
> [ 0.000000] __vmalloc_node+0x5c/0x70
> [ 0.000000] init_IRQ+0x90/0x11c
> [ 0.000000] start_kernel+0x1ac/0x3bc
> [ 0.000000] __primary_switched+0xc4/0xcc
> [ 0.000000] Code: f000e300 91062000 943bd9ba 17ffff8f (d4210000)
> [ 0.000000] ---[ end trace 0000000000000000 ]---
> [ 0.000000] Kernel panic - not syncing: Attempted to kill the idle task!
>
> Compiled with clang 15.0.7 from Arch repos, with
> make ARCH=arm64 LLVM=1

Thanks a lot for testing with LLVM, submitting this report, and doing a
bisect. I sent a patch to fix this a couple of days ago and Peter pushed
it to -tip today, so it should be in the next -next release:

https://git.kernel.org/tip/093d9b240a1fa261ff8aeb7c7cc484dedacfda53

Cheers,
Nathan