Re: [PATCH] kernfs: convert kernfs_idr_lock to an irq safe raw spinlock

From: Andrea Righi
Date: Tue Jan 09 2024 - 12:05:30 EST


On Tue, Jan 09, 2024 at 05:35:36PM +0100, Geert Uytterhoeven wrote:
..
> Thanks for your patch, which is now commit c312828c37a72fe2
> ("kernfs: convert kernfs_idr_lock to an irq safe raw spinlock")
> in driver-core/driver-core-next.
>
> Unfortunately this interacts badly with commit 4eff7d62abdeb293 ("Revert
> "mm/kmemleak: move the initialisation of object to __link_object"")
> in v6.7-rc5.
>
> driver-core/driver-core-next is still at v6.7-rc3, so it does not
> yet have commit 4eff7d62abdeb293, and thus still triggers:
>
> =============================
> [ BUG: Invalid wait context ]
> 6.7.0-rc3-kzm9g-00052-gc312828c37a7 #576 Not tainted
> -----------------------------
> swapper/0 is trying to lock:
> c0c6e3c4 (&zone->lock){....}-{3:3}, at: __rmqueue_pcplist+0x358/0x3c8
> other info that might help us debug this:
> context-{5:5}
> 3 locks held by swapper/0:
> #0: c0bf35a0 (slab_mutex){....}-{4:4}, at:
> kmem_cache_create_usercopy+0xc8/0x2d0
> #1: c0bfab0c (kmemleak_lock){....}-{2:2}, at: __create_object+0x2c/0x7c
> #2: dfbc8c90 (&pcp->lock){....}-{3:3}, at:
> get_page_from_freelist+0x1a0/0x684
> stack backtrace:
> CPU: 0 PID: 0 Comm: swapper Not tainted
> 6.7.0-rc3-kzm9g-00052-gc312828c37a7 #576
> Hardware name: Generic SH73A0 (Flattened Device Tree)
> unwind_backtrace from show_stack+0x10/0x14
> show_stack from dump_stack_lvl+0x68/0x90
> dump_stack_lvl from __lock_acquire+0x3cc/0x168c
> __lock_acquire from lock_acquire+0x274/0x30c
> lock_acquire from _raw_spin_lock_irqsave+0x50/0x64
> _raw_spin_lock_irqsave from __rmqueue_pcplist+0x358/0x3c8
> __rmqueue_pcplist from get_page_from_freelist+0x3bc/0x684
> get_page_from_freelist from __alloc_pages+0xe8/0xad8
> __alloc_pages from __stack_depot_save+0x160/0x398
> __stack_depot_save from set_track_prepare+0x48/0x74
> set_track_prepare from __link_object+0xac/0x204
> __link_object from __create_object+0x48/0x7c
> __create_object from kmemleak_alloc+0x2c/0x38
> kmemleak_alloc from slab_post_alloc_hook.constprop.0+0x9c/0xac
> slab_post_alloc_hook.constprop.0 from kmem_cache_alloc+0xcc/0x148
> kmem_cache_alloc from kmem_cache_create_usercopy+0x1c4/0x2d0
> kmem_cache_create_usercopy from kmem_cache_create+0x1c/0x24
> kmem_cache_create from kmemleak_init+0x58/0xfc
> kmemleak_init from mm_core_init+0x244/0x2c8
> mm_core_init from start_kernel+0x274/0x528
> start_kernel from 0x0
>
> After merging driver-core/driver-core-next into a tree based on
> v6.7-rc5, or after cherry-picking commit 4eff7d62abdeb293 into
> driver-core/driver-core-next, the above BUG is gone, but a different
> one appears:
>
> =============================
> [ BUG: Invalid wait context ]
> 6.7.0-rc5-kzm9g-00251-g655022a45b1c #578 Not tainted
> -----------------------------
> swapper/0/0 is trying to lock:
> dfbcd488 (&c->lock){....}-{3:3}, at: local_lock_acquire+0x0/0xa4
> other info that might help us debug this:
> context-{5:5}
> 2 locks held by swapper/0/0:
> #0: dfbc9c60 (lock){+.+.}-{3:3}, at: local_lock_acquire+0x0/0xa4
> #1: c0c012a8 (kernfs_idr_lock){....}-{2:2}, at:
> __kernfs_new_node.constprop.0+0x68/0x258
> stack backtrace:
> CPU: 0 PID: 0 Comm: swapper/0 Not tainted
> 6.7.0-rc5-kzm9g-00251-g655022a45b1c #578
> Hardware name: Generic SH73A0 (Flattened Device Tree)
> unwind_backtrace from show_stack+0x10/0x14
> show_stack from dump_stack_lvl+0x68/0x90
> dump_stack_lvl from __lock_acquire+0x3cc/0x168c
> __lock_acquire from lock_acquire+0x274/0x30c
> lock_acquire from local_lock_acquire+0x28/0xa4
> local_lock_acquire from ___slab_alloc+0x234/0x8a8
> ___slab_alloc from __slab_alloc.constprop.0+0x30/0x44
> __slab_alloc.constprop.0 from kmem_cache_alloc+0x7c/0x148
> kmem_cache_alloc from radix_tree_node_alloc.constprop.0+0x44/0xdc
> radix_tree_node_alloc.constprop.0 from idr_get_free+0x110/0x2b8
> idr_get_free from idr_alloc_u32+0x9c/0x108
> idr_alloc_u32 from idr_alloc_cyclic+0x50/0xb8
> idr_alloc_cyclic from __kernfs_new_node.constprop.0+0x88/0x258
> __kernfs_new_node.constprop.0 from kernfs_create_root+0xbc/0x154
> kernfs_create_root from sysfs_init+0x18/0x5c
> sysfs_init from mnt_init+0xc4/0x220
> mnt_init from vfs_caches_init+0x6c/0x88
> vfs_caches_init from start_kernel+0x474/0x528
> start_kernel from 0x0
>
> Reverting commit c312828c37a72fe2 fixes that.
> I have seen this issue on several Renesas arm32 and arm64 platforms.
>
> Also, I am wondering if the issue fixed by commit c312828c37a72fe2
> can still be reproduced on v6.7-rc5 or later?

Yep, I can still reproduce it (this is with v6.7):

[ 3.082273]
[ 3.082822] =====================================================
[ 3.084543] WARNING: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected
[ 3.086252] 6.7.0-virtme #4 Not tainted
[ 3.087002] -----------------------------------------------------
[ 3.087385] swapper/5/0 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire:
[ 3.087768] ffffffff8f9c5378 (kernfs_idr_lock){+.+.}-{2:2}, at: kernfs_find_and_get_node_by_id+0x1d/0x80
[ 3.088335]
[ 3.088335] and this task is already holding:
[ 3.088685] ffff8a83becbf758 (&rq->__lock){-.-.}-{2:2}, at: __schedule+0xda/0xef0
[ 3.089128] which would create a new lock dependency:
[ 3.089435] (&rq->__lock){-.-.}-{2:2} -> (kernfs_idr_lock){+.+.}-{2:2}
[ 3.089827]
[ 3.089827] but this new dependency connects a HARDIRQ-irq-safe lock:
[ 3.090296] (&rq->__lock){-.-.}-{2:2}
[ 3.090297]
[ 3.090297] ... which became HARDIRQ-irq-safe at:
[ 3.090885] lock_acquire+0xcb/0x2c0
[ 3.091108] _raw_spin_lock_nested+0x2e/0x40
[ 3.091374] scheduler_tick+0x5b/0x3d0
[ 3.091607] update_process_times+0x9c/0xb0
[ 3.091867] tick_periodic+0x27/0xe0
[ 3.092089] tick_handle_periodic+0x24/0x70
[ 3.092351] timer_interrupt+0x18/0x30
[ 3.092585] __handle_irq_event_percpu+0x8b/0x240
[ 3.092865] handle_irq_event+0x38/0x80
[ 3.093095] handle_level_irq+0x90/0x170
[ 3.093340] __common_interrupt+0x4a/0xf0
[ 3.093586] common_interrupt+0x83/0xa0
[ 3.093820] asm_common_interrupt+0x26/0x40
[ 3.094080] _raw_spin_unlock_irqrestore+0x36/0x70
[ 3.094381] __setup_irq+0x441/0x6a0
[ 3.094602] request_threaded_irq+0xe5/0x190
[ 3.094862] hpet_time_init+0x3a/0x60
[ 3.095090] x86_late_time_init+0x1b/0x40
[ 3.095344] start_kernel+0x53a/0x6a0
[ 3.095569] x86_64_start_reservations+0x18/0x30
[ 3.095849] x86_64_start_kernel+0xc5/0xe0
[ 3.096097] secondary_startup_64_no_verify+0x178/0x17b
[ 3.096426]
[ 3.096426] to a HARDIRQ-irq-unsafe lock:
[ 3.096749] (kernfs_idr_lock){+.+.}-{2:2}
[ 3.096751]
[ 3.096751] ... which became HARDIRQ-irq-unsafe at:
[ 3.097372] ...
[ 3.097372] lock_acquire+0xcb/0x2c0
[ 3.097701] _raw_spin_lock+0x30/0x40
[ 3.097925] __kernfs_new_node.isra.0+0x83/0x280
[ 3.098205] kernfs_create_root+0xf6/0x1d0
[ 3.098463] sysfs_init+0x1b/0x70
[ 3.098670] mnt_init+0xd9/0x2a0
[ 3.098872] vfs_caches_init+0xcf/0xe0
[ 3.099105] start_kernel+0x58a/0x6a0
[ 3.099334] x86_64_start_reservations+0x18/0x30
[ 3.099613] x86_64_start_kernel+0xc5/0xe0
[ 3.099862] secondary_startup_64_no_verify+0x178/0x17b
[ 3.100175]
[ 3.100175] other info that might help us debug this:
[ 3.100175]
[ 3.100652] Possible interrupt unsafe locking scenario:
[ 3.100652]
[ 3.101049] CPU0 CPU1
[ 3.101323] ---- ----
[ 3.101641] lock(kernfs_idr_lock);
[ 3.101909] local_irq_disable();
[ 3.102473] lock(&rq->__lock);
[ 3.102854] lock(kernfs_idr_lock);
[ 3.103171] <Interrupt>
[ 3.103308] lock(&rq->__lock);
[ 3.103492]
[ 3.103492] *** DEADLOCK ***

I'm wondering if using a regular spinlock instead of a raw spinlock
could be a reasonable compromise.

We have a GFP_ATOMIC allocation in __kernfs_new_node():

raw_spin_lock_irqsave(&kernfs_idr_lock, irqflags);
ret = idr_alloc_cyclic(&root->ino_idr, kn, 1, 0, GFP_ATOMIC);
...
raw_spin_unlock_irqrestore(&kernfs_idr_lock, irqflags);

That should become valid using a
spin_lock_irqsave/spin_unlock_irqrestore(), right?

Thanks,
-Andrea

>
> Thanks!
>
> Gr{oetje,eeting}s,
>
> Geert
>
> --
> Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@xxxxxxxxxxxxxx
>
> In personal conversations with technical people, I call myself a hacker. But
> when I'm talking to journalists I just say "programmer" or something like that.
> -- Linus Torvalds