Re: [syzbot] [xfs?] WARNING in __queue_delayed_work

From: Darrick J. Wong
Date: Mon Apr 10 2023 - 21:17:06 EST


On Mon, Apr 10, 2023 at 08:20:22PM +0800, Hillf Danton wrote:
> On 07 Apr 2023 10:04:49 -0700
> > Hello,
> >
> > syzbot found the following issue on:
> >
> > HEAD commit: 7e364e56293b Linux 6.3-rc5
> > git tree: upstream
> > console output: https://syzkaller.appspot.com/x/log.txt?x=13241195c80000
> > kernel config: https://syzkaller.appspot.com/x/.config?x=e3b9dc6616d797bb
> > dashboard link: https://syzkaller.appspot.com/bug?extid=5ed016962f5137a09c7c
> > compiler: gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2
> > userspace arch: i386
> >
> > Unfortunately, I don't have any reproducer for this issue yet.
> >
> > IMPORTANT: if you fix the issue, please add the following tag to the commit:
> > Reported-by: syzbot+5ed016962f5137a09c7c@xxxxxxxxxxxxxxxxxxxxxxxxx
> >
> > ------------[ cut here ]------------
> > WARNING: CPU: 1 PID: 102 at kernel/workqueue.c:1445 __queue_work+0xd44/0x1120 kernel/workqueue.c:1444
> > Modules linked in:
> > CPU: 1 PID: 102 Comm: kswapd0 Not tainted 6.3.0-rc5-syzkaller #0
> > Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-2 04/01/2014
> > RIP: 0010:__queue_work+0xd44/0x1120 kernel/workqueue.c:1444

Gross. I just got one of these splats too, in:

__queue_work+0x3a2/0x4a0
call_timer_fn+0x24/0x120
__run_timers.part.0+0x170/0x280
run_timer_softirq+0x31/0x60
__do_softirq+0xf6/0x2fd
irq_exit_rcu+0xc5/0x110
sysvec_apic_timer_interrupt+0x8e/0xc0

I guess someone might have done:

Thread 0: xfs_inodegc_queue_all(mp);

Thread 1: <add inodegc work>
Thread 1: mod_delayed_work_on(cpu, mp->m_inodegc_wq, &gc->work, <nonzero>);

Thread 0: drain_workqueue(mp->m_inodegc_wq);

<Timer fires, splats, everyone halts>
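
If that's the sequence, the timer fires after drain_workqueue() has set
__WQ_DRAINING on the workqueue, and __queue_work() trips the "queueing
on a draining wq from outside its own workers" check, which IIRC looks
roughly like this (paraphrased; it's the line the splat points at in
kernel/workqueue.c):

	/* a draining wq only accepts work chained from its own workers */
	if (unlikely(wq->flags & __WQ_DRAINING) &&
	    WARN_ON_ONCE(!is_chained_work(wq)))
		return;

The timer callback isn't running in one of the inodegc workers, so
is_chained_work() is false and the WARN fires.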

But I can't really tell; the VM froze. Any suggestions on how to fix this?

--D

> > Code: e0 07 83 c0 03 38 d0 7c 09 84 d2 74 05 e8 74 0c 81 00 8b 5b 2c 31 ff 83 e3 20 89 de e8 c5 fb 2f 00 85 db 75 42 e8 6c ff 2f 00 <0f> 0b e9 3c f9 ff ff e8 60 ff 2f 00 0f 0b e9 ce f8 ff ff e8 54 ff
> > RSP: 0000:ffffc90000ce7638 EFLAGS: 00010093
> > RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
> > RDX: ffff888015f73a80 RSI: ffffffff8152d854 RDI: 0000000000000005
> > RBP: 0000000000000002 R08: 0000000000000005 R09: 0000000000000000
> > R10: 0000000000000000 R11: 0000000000000000 R12: ffffe8ffffb03348
> > R13: ffff888078462000 R14: ffffe8ffffb03390 R15: ffff888078462000
> > FS: 0000000000000000(0000) GS:ffff88802ca80000(0000) knlGS:0000000000000000
> > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > CR2: 0000000000cfa5bb CR3: 0000000025fde000 CR4: 0000000000150ee0
> > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > Call Trace:
> > <TASK>
> > __queue_delayed_work+0x1c8/0x270 kernel/workqueue.c:1672
> > mod_delayed_work_on+0xe1/0x220 kernel/workqueue.c:1746
> > xfs_inodegc_shrinker_scan fs/xfs/xfs_icache.c:2212 [inline]
> > xfs_inodegc_shrinker_scan+0x250/0x4f0 fs/xfs/xfs_icache.c:2191
> > do_shrink_slab+0x428/0xaa0 mm/vmscan.c:853
> > shrink_slab+0x175/0x660 mm/vmscan.c:1013
> > shrink_one+0x502/0x810 mm/vmscan.c:5343
> > shrink_many mm/vmscan.c:5394 [inline]
> > lru_gen_shrink_node mm/vmscan.c:5511 [inline]
> > shrink_node+0x2064/0x35f0 mm/vmscan.c:6459
> > kswapd_shrink_node mm/vmscan.c:7262 [inline]
> > balance_pgdat+0xa02/0x1ac0 mm/vmscan.c:7452
> > kswapd+0x677/0xd60 mm/vmscan.c:7712
> > kthread+0x2e8/0x3a0 kernel/kthread.c:376
> > ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:308
> > </TASK>
>
> Looks like a valid race.
>
> xfs_inodegc_shrinker_scan()             xfs_inodegc_stop()
> ---                                     ---
> if (!xfs_is_inodegc_enabled(mp))
>         return SHRINK_STOP;
>                                         if (!xfs_clear_inodegc_enabled(mp))
>                                                 return;
>                                         xfs_inodegc_queue_all(mp);
>                                         drain_workqueue(mp->m_inodegc_wq);
>                                                 wq->flags |= __WQ_DRAINING;
> mod_delayed_work_on()
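
One completely untested idea for closing that window: have
xfs_inodegc_stop() avoid putting the workqueue into the draining state
at all, and instead loop with flush_workqueue() until nothing new has
been queued. This is only a sketch; it assumes xfs_inodegc_queue_all()
is changed to return whether it actually queued anything, which it does
not do today:

void
xfs_inodegc_stop(
	struct xfs_mount	*mp)
{
	bool			rerun;

	if (!xfs_clear_inodegc_enabled(mp))
		return;

	/*
	 * Flush instead of drain so that a racing shrinker (or any other
	 * mod_delayed_work_on caller) never sees __WQ_DRAINING, and keep
	 * flushing until nobody has snuck new items onto the per-cpu
	 * lists behind our back.
	 */
	do {
		xfs_inodegc_queue_all(mp);
		flush_workqueue(mp->m_inodegc_wq);
		rerun = xfs_inodegc_queue_all(mp);
	} while (rerun);
}

Whether that's actually sufficient to guarantee the per-cpu queues are
empty by the time this returns, I'm not sure; it only gets rid of the
__WQ_DRAINING warning.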