Re: [PATCH v2] mm: slab, with same context require fs_reclaim lock

From: Steven Rostedt
Date: Wed Oct 12 2022 - 07:23:28 EST


On Tue, 27 Sep 2022 15:11:34 +0800
eadavis@xxxxxxxx wrote:

> From: Edward Adam Davis <eadavis@xxxxxxxx>
>
> 1. fs_reclaim was acquired with softirqs enabled:
> {SOFTIRQ-ON-W} state was registered at:
> lock_acquire kernel/locking/lockdep.c:5666 [inline]
> lock_acquire+0x1ab/0x570 kernel/locking/lockdep.c:5631
> __fs_reclaim_acquire mm/page_alloc.c:4674 [inline]
> fs_reclaim_acquire+0x115/0x160 mm/page_alloc.c:4688
> might_alloc include/linux/sched/mm.h:271 [inline]
> slab_pre_alloc_hook mm/slab.h:700 [inline]
> slab_alloc mm/slab.c:3278 [inline]
> kmem_cache_alloc_trace+0x38/0x460 mm/slab.c:3557
> kmalloc include/linux/slab.h:600 [inline]
> kzalloc include/linux/slab.h:733 [inline]
> alloc_workqueue_attrs+0x39/0xc0 kernel/workqueue.c:3394
> wq_numa_init kernel/workqueue.c:5964 [inline]
> workqueue_init+0x12f/0x8ae kernel/workqueue.c:6091
> kernel_init_freeable+0x3fb/0x73a init/main.c:1607
> kernel_init+0x1a/0x1d0 init/main.c:1512
> ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
>
> 2. fs_reclaim is then required in softirq context (IN-SOFTIRQ):
> __dump_stack lib/dump_stack.c:88 [inline]
> dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
> print_usage_bug kernel/locking/lockdep.c:3961 [inline]
> valid_state kernel/locking/lockdep.c:3973 [inline]
> mark_lock_irq kernel/locking/lockdep.c:4176 [inline]
> mark_lock.part.0.cold+0x18/0xd8 kernel/locking/lockdep.c:4632
> mark_lock kernel/locking/lockdep.c:4596 [inline]
> mark_usage kernel/locking/lockdep.c:4527 [inline]
> __lock_acquire+0x11d9/0x56d0 kernel/locking/lockdep.c:5007
> lock_acquire kernel/locking/lockdep.c:5666 [inline]
> lock_acquire+0x1ab/0x570 kernel/locking/lockdep.c:5631
> __fs_reclaim_acquire mm/page_alloc.c:4674 [inline]
> fs_reclaim_acquire+0x115/0x160 mm/page_alloc.c:4688
> might_alloc include/linux/sched/mm.h:271 [inline]
> slab_pre_alloc_hook mm/slab.h:700 [inline]
> slab_alloc mm/slab.c:3278 [inline]
>
> Move slab_pre_alloc_hook() into the interrupt-disabled section so that it runs in a context consistent with IN_SOFTIRQ.
>
> Link: https://syzkaller.appspot.com/bug?extid=dfcc5f4da15868df7d4d
> Reported-by: syzbot+dfcc5f4da15868df7d4d@xxxxxxxxxxxxxxxxxxxxxxxxx
> Signed-off-by: Edward Adam Davis <eadavis@xxxxxxxx>
> Changes in v2:
> Updated comments.
> ---
> mm/slab.c | 10 +++++++---
> 1 file changed, 7 insertions(+), 3 deletions(-)
>
> diff --git a/mm/slab.c b/mm/slab.c
> index 10e96137b44f..29d49d1b1e96 100644
> --- a/mm/slab.c
> +++ b/mm/slab.c
> @@ -3275,15 +3275,19 @@ slab_alloc(struct kmem_cache *cachep, struct list_lru *lru, gfp_t flags,
> bool init = false;
>
> flags &= gfp_allowed_mask;
> + local_irq_save(save_flags);

Please do not do this. Open-coding interrupt disabling to work around
locking issues is not the solution; you need to make the locks
themselves disable interrupts if need be. Doing it this way breaks
PREEMPT_RT and creates a "big kernel lock" situation where interrupts
end up disabled at random for no apparent reason.
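
As a rough sketch of the difference (struct foo, prepare_foo(), foo_lock
and foo_list below are made up for illustration, not taken from this
code path):

#include <linux/list.h>
#include <linux/spinlock.h>

struct foo {
	struct list_head node;
};

static LIST_HEAD(foo_list);
static DEFINE_SPINLOCK(foo_lock);

static void prepare_foo(struct foo *obj)
{
	/* setup that needs no irq protection at all */
}

/*
 * The pattern to avoid: open-coded irq disabling around unrelated work.
 * Everything below runs with interrupts hard-disabled, even the parts
 * that never needed it, and PREEMPT_RT has no way to convert this.
 */
static void add_foo_open_coded(struct foo *obj)
{
	unsigned long flags;

	local_irq_save(flags);
	prepare_foo(obj);
	list_add(&obj->node, &foo_list);
	local_irq_restore(flags);
}

/*
 * Preferred: the lock that actually needs protection disables interrupts
 * itself, and only around its own critical section.  On PREEMPT_RT the
 * spinlock becomes a sleeping lock that does not hard-disable interrupts,
 * which is exactly why the open-coded variant above breaks RT.
 */
static void add_foo(struct foo *obj)
{
	unsigned long flags;

	prepare_foo(obj);

	spin_lock_irqsave(&foo_lock, flags);
	list_add(&obj->node, &foo_list);
	spin_unlock_irqrestore(&foo_lock, flags);
}

The point is that the irq-off region should be owned by the lock that
needs it, not sprinkled around the allocation path.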

-- Steve


> cachep = slab_pre_alloc_hook(cachep, lru, &objcg, 1, flags);
> - if (unlikely(!cachep))
> + if (unlikely(!cachep)) {
> + local_irq_restore(save_flags);
> return NULL;
> + }
>
> objp = kfence_alloc(cachep, orig_size, flags);
> - if (unlikely(objp))
> + if (unlikely(objp)) {
> + local_irq_restore(save_flags);
> goto out;
> + }
>
> - local_irq_save(save_flags);
> objp = __do_cache_alloc(cachep, flags);
> local_irq_restore(save_flags);
> objp = cache_alloc_debugcheck_after(cachep, flags, objp, caller);