Re: WARN_ON_ONCE() in process_one_work()?

From: Paul E. McKenney
Date: Thu Jun 15 2017 - 11:39:20 EST


On Wed, Jun 14, 2017 at 08:15:48AM -0700, Paul E. McKenney wrote:
> On Tue, Jun 13, 2017 at 03:31:03PM -0700, Paul E. McKenney wrote:
> > On Tue, Jun 13, 2017 at 04:58:37PM -0400, Tejun Heo wrote:
> > > Hello, Paul.
> > >
> > > On Fri, May 05, 2017 at 10:11:59AM -0700, Paul E. McKenney wrote:
> > > > Just following up... I have hit this bug a couple of times over the
> > > > past few days. Anything I can do to help?
> > >
> > > My apologies for dropping the ball on this. I've gone over the hot
> > > plug code in workqueue several times but can't really find how this
> > > would happen. Can you please apply the following patch and see what
> > > it says when the problem happens?
> >
> > I have fired it up, thank you!
> >
> > Last time I saw one failure in 21 hours of test runs, so I have kicked
> > of 42 one-hour test runs. Will see what happens tomorrow morning,
> > Pacific Time.
>
> And none of the 42 runs resulted in a workqueue splat. I will try again
> this evening, Pacific Time.
>
> Who knows, maybe your diagnostic patch is the fix. ;-)

And this time, we did get something! Here is the printk() output:

[ 2126.863410] XXX workfn=vmstat_update pool->cpu/flags=1/0x0 curcpu=2 online=0-2,7 active=0,2,7

Please see below for the full splat from dmesg.

Please let me know if you need additional email. My test ID is KSIC
2017.06.14-15:50:08/TREE07.14, just to help me find it in my large pile
of test results. ;-)

Thanx, Paul

------------------------------------------------------------------------

[ 2126.840625] ------------[ cut here ]------------
[ 2126.841373] WARNING: CPU: 2 PID: 6 at /home/paulmck/public_git/linux-rcu/kernel/workqueue.c:2048 process_one_work+0x5f/0x440
[ 2126.843042] Modules linked in:
[ 2126.843521] CPU: 2 PID: 6 Comm: mm_percpu_wq Not tainted 4.12.0-rc4+ #1
[ 2126.844493] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
[ 2126.845898] task: ffff985c5e59bc00 task.stack: ffffb8be000fc000
[ 2126.846773] RIP: 0010:process_one_work+0x5f/0x440
[ 2126.847470] RSP: 0018:ffffb8be000ffe60 EFLAGS: 00010002
[ 2126.848240] RAX: ffff985c5e595200 RBX: ffff985c5e56e840 RCX: 0000000000000000
[ 2126.849284] RDX: 0000000000000001 RSI: ffff985c5e8952c0 RDI: ffff985c5e56e840
[ 2126.850337] RBP: ffffb8be000ffe90 R08: 0000000000000002 R09: 0000000000000000
[ 2126.851378] R10: ffffb8be000ffe18 R11: 0000000000000000 R12: ffff985c5e899c00
[ 2126.852410] R13: ffff985c5e89e800 R14: 0000000000000000 R15: ffff985c5e8952c0
[ 2126.853440] FS: 0000000000000000(0000) GS:ffff985c5e900000(0000) knlGS:0000000000000000
[ 2126.854650] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2126.855500] CR2: 00000000000000b0 CR3: 000000001dcfc000 CR4: 00000000000006e0
[ 2126.856558] Call Trace:
[ 2126.856944] rescuer_thread+0x207/0x350
[ 2126.857520] kthread+0x104/0x140
[ 2126.858024] ? process_one_work+0x440/0x440
[ 2126.858644] ? kthread_create_on_node+0x40/0x40
[ 2126.859343] ret_from_fork+0x22/0x30
[ 2126.859879] Code: e8 49 8b 45 08 44 8b b0 00 01 00 00 41 83 e6 20 f6 c1 04 44 89 75 d4 75 21 41 8b 54 24 04 65 44 8b 05 de 8a f9 74 41 39 d0 74 0f <0f> ff 80 3d b8 d5 ec 00 00 0f 84 92 03 00 00 48 ba eb 83 b5 80
[ 2126.862715] ---[ end trace df0aaa02bf2d28ee ]---
[ 2126.863410] XXX workfn=vmstat_update pool->cpu/flags=1/0x0 curcpu=2 online=0-2,7 active=0,2,7