Re: [PATCH v2 5/12] percpu: Add {raw,this}_cpu_try_cmpxchg()

From: Konrad Dybcio
Date: Fri Jun 09 2023 - 12:20:32 EST

On 9.06.2023 18:13, Nathan Chancellor wrote:
> Hi Konrad,
>
> On Fri, Jun 09, 2023 at 06:10:38PM +0200, Konrad Dybcio wrote:
>>
>>
>> On 31.05.2023 15:08, Peter Zijlstra wrote:
>>> Add the try_cmpxchg() form to the per-cpu ops.
>>>
>>> Signed-off-by: Peter Zijlstra (Intel) <peterz@xxxxxxxxxxxxx>
>>> ---
>> +CC Nathan, llvm list
>>
>> Hi all, this patch seems to break booting on Qualcomm ARM64 platforms
>> when compiled with clang (GCC works fine) for some reason..:
>>
>> next-20230605 - works
>> next-20230606 - doesn't
>>
>> git revert -m 1 dc4e51fd9846 on next-20230606 - works again
>> b4 shazam <this_msgid> -P 1-4 - still works
>> b4 shazam <this_msgid> -P 5 - breaks
>>
>> Confirmed on at least Qualcomm QCM2290, SM8250.
>>
>> Checking the serial console, it hits a BUG_ON:
>>
>> [ 0.000000] ------------[ cut here ]------------
>> [ 0.000000] kernel BUG at mm/vmalloc.c:1638!
>> [ 0.000000] Internal error: Oops - BUG: 00000000f2000800 [#1] SMP
>> [ 0.000000] Modules linked in:
>> [ 0.000000] CPU: 0 PID: 0 Comm: swapper/0 Not tainted [snip]
>> [ 0.000000] Hardware name: Qualcomm Technologies, Inc. Robotics RB1 (DT)
>> [ 0.000000] pstate: 000000c5 (nzcv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
>> [ 0.000000] pc : alloc_vmap_area+0xafc/0xb08
>> [ 0.000000] lr : alloc_vmap_area+0x9e4/0xb08
>> [ 0.000000] sp : ffffa50137f53c20
>> [ 0.000000] x29: ffffa50137f53c60 x28: ffffa50137f30c18 x27: 0000000000000000
>> [ 0.000000] x26: 0000000000007fff x25: ffff800080000000 x24: 000000000000cfff
>> [ 0.000000] x23: ffffffffffff8000 x22: ffffa50137fef970 x21: fffffbfff0000000
>> [ 0.000000] x20: ffff022982003208 x19: ffff0229820031f8 x18: ffffa50137f64f70
>> [ 0.000000] x17: ffffa50137fef980 x16: ffffa501375e6d08 x15: 0000000000000001
>> [ 0.000000] x14: ffffa5013831e1a0 x13: ffffa50137f30c18 x12: 0000000000402dc2
>> [ 0.000000] x11: 0000000000000000 x10: ffff022982003018 x9 : ffffa5013831e188
>> [ 0.000000] x8 : ffffcb55ff003228 x7 : 0000000000000000 x6 : 0000000000000048
>> [ 0.000000] x5 : 0000000000000000 x4 : ffffa50137f53bd0 x3 : ffffa50136490000
>> [ 0.000000] x2 : 0000000000000001 x1 : ffffa5013831e190 x0 : ffff022982003208
>> [ 0.000000] Call trace:
>> [ 0.000000] alloc_vmap_area+0xafc/0xb08
>> [ 0.000000] __get_vm_area_node+0x108/0x1e8
>> [ 0.000000] __vmalloc_node_range+0x1fc/0x728
>> [ 0.000000] __vmalloc_node+0x5c/0x70
>> [ 0.000000] init_IRQ+0x90/0x11c
>> [ 0.000000] start_kernel+0x1ac/0x3bc
>> [ 0.000000] __primary_switched+0xc4/0xcc
>> [ 0.000000] Code: f000e300 91062000 943bd9ba 17ffff8f (d4210000)
>> [ 0.000000] ---[ end trace 0000000000000000 ]---
>> [ 0.000000] Kernel panic - not syncing: Attempted to kill the idle task!
>>
>> Compiled with clang 15.0.7 from Arch repos, with
>> make ARCH=arm64 LLVM=1
>
> Thanks a lot for testing with LLVM, submitting this report, and doing a
> bisect.
No, thank *you* for making it even possible ;)

> I sent a patch to fix this a couple of days ago and Peter pushed
> it to -tip today, so it should be in the next -next release:
>
> https://git.kernel.org/tip/093d9b240a1fa261ff8aeb7c7cc484dedacfda53
Amazing, I can boot the most recent next-20230609 with it again!

Konrad
>
> Cheers,
> Nathan
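
For readers unfamiliar with the try_cmpxchg() form that the patch description refers to, here is a minimal userspace sketch of the difference from the classic cmpxchg() form. It uses C11 atomics rather than the kernel's per-cpu accessors, and the helper names (cmpxchg_int, try_cmpxchg_int, add_capped) are hypothetical, chosen only for illustration. It is not the code from the patch or from the fix.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Classic form (illustrative): returns the value that was found in *ptr.
 * The caller has to compare the return value against `old` to learn
 * whether the exchange actually happened.
 */
static int cmpxchg_int(atomic_int *ptr, int old, int new_val)
{
	/* On failure, `old` is overwritten with the current value of *ptr. */
	atomic_compare_exchange_strong(ptr, &old, new_val);
	return old;
}

/*
 * try_ form (illustrative): returns true on success. On failure the
 * value observed in *ptr is written back through `oldp`, so a retry
 * loop does not need to reload it.
 */
static bool try_cmpxchg_int(atomic_int *ptr, int *oldp, int new_val)
{
	return atomic_compare_exchange_strong(ptr, oldp, new_val);
}

/* Typical retry loop: add `delta` to *ptr, saturating at `cap`. */
static int add_capped(atomic_int *ptr, int delta, int cap)
{
	int old = atomic_load(ptr);
	int new_val;

	do {
		new_val = old + delta;
		if (new_val > cap)
			new_val = cap;
		/* A failed attempt refreshes `old`; new_val is recomputed. */
	} while (!try_cmpxchg_int(ptr, &old, new_val));

	return new_val;
}
```

The point of the try_ form is that the success flag and the refreshed expected value both come out of a single call, which tends to produce tighter retry loops than comparing the returned value by hand.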