Re: [PATCH for-3.20-fixes] workqueue: fix hang involving racing cancel[_delayed]_work_sync()'s for PREEMPT_NONE

From: Tomeu Vizoso
Date: Tue Mar 03 2015 - 05:01:15 EST


On 2 March 2015 at 17:21, Tejun Heo <tj@xxxxxxxxxx> wrote:
> On Mon, Mar 02, 2015 at 01:26:15PM +0100, Jesper Nilsson wrote:
>> On Mon, Feb 09, 2015 at 05:15:27PM +0100, Tejun Heo wrote:
>> > Hello,
>>
>> Hi!
>>
>> > This patch removes the possible hang by updating __cancel_work_timer()
>> > to explicitly wait for clearing of CANCELING rather than invoking
>> > flush_work() after try_to_grab_pending() fails with -ENOENT. The
>> > explicit wait uses the matching bit waitqueue for the CANCELING bit.
>> >
>> > Link: http://lkml.kernel.org/g/20150206171156.GA8942@xxxxxxxx
>> >
>> > Signed-off-by: Tejun Heo <tj@xxxxxxxxxx>
>> > Reported-by: Rabin Vincent <rabin.vincent@xxxxxxxx>
>> > Cc: stable@xxxxxxxxxxxxxxx
>>
>> What's the status on this patch, it's not in 4.0-rc1 at least?
>> Is it queued for the 3.18 stable branch?
>
> Sorry about the delay. Applied to wq/for-4.0-fixes. Will push out in
> a week or so.
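The quoted patch waits explicitly for CANCELING to clear instead of calling flush_work() after try_to_grab_pending() fails with -ENOENT. The small userspace model below illustrates that idea. It is a sketch under stated assumptions and not the kernel implementation. A pthread mutex and condition variable stand in for the CANCELING bit and its waitqueue. The names model_work, model_cancel_work_sync() and run_racing_cancels() are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

#define MAX_CANCELERS 16

/*
 * Userspace model, NOT kernel code (assumption for illustration): the
 * mutex-protected 'canceling' flag stands in for the work item's CANCELING
 * bit, and the condition variable stands in for the waitqueue slept on.
 */
struct model_work {
	pthread_mutex_t lock;
	pthread_cond_t canceling_cleared;
	bool canceling;   /* like the CANCELING bit in work->data */
	int owners;       /* cancelers currently holding CANCELING */
	int max_owners;   /* highest value 'owners' ever reached */
	int passes;       /* completed cancel passes */
};

static void model_work_init(struct model_work *w)
{
	pthread_mutex_init(&w->lock, NULL);
	pthread_cond_init(&w->canceling_cleared, NULL);
	w->canceling = false;
	w->owners = 0;
	w->max_owners = 0;
	w->passes = 0;
}

/* Hypothetical analogue of __cancel_work_timer() with the patch applied. */
static void model_cancel_work_sync(struct model_work *w)
{
	pthread_mutex_lock(&w->lock);
	/*
	 * Someone else is already canceling (the -ENOENT case): sleep until
	 * CANCELING is cleared instead of calling a flush that may return
	 * before the other canceler has cleared the bit.
	 */
	while (w->canceling)
		pthread_cond_wait(&w->canceling_cleared, &w->lock);
	w->canceling = true;              /* take ownership of the cancel */
	if (++w->owners > w->max_owners)
		w->max_owners = w->owners;
	pthread_mutex_unlock(&w->lock);

	/* The real code would wait for a running instance of the work here. */

	pthread_mutex_lock(&w->lock);
	w->owners--;
	w->passes++;
	w->canceling = false;                          /* clear CANCELING ... */
	pthread_cond_broadcast(&w->canceling_cleared); /* ... and wake waiters */
	pthread_mutex_unlock(&w->lock);
}

static void *canceler_thread(void *arg)
{
	model_cancel_work_sync(arg);
	return NULL;
}

/* Starts 'n' racing cancelers on one item; returns passes, or -1 on error. */
static int run_racing_cancels(struct model_work *w, int n)
{
	pthread_t tids[MAX_CANCELERS];
	int started = 0;

	assert(n > 0 && n <= MAX_CANCELERS);
	for (; started < n; started++)
		if (pthread_create(&tids[started], NULL, canceler_thread, w) != 0)
			break;
	for (int i = 0; i < started; i++)
		pthread_join(tids[i], NULL);  /* join also orders the reads below */
	return started == n ? w->passes : -1;
}
```

When eight cancelers race, each one completes exactly one pass, and no more than one holds CANCELING at a time. Waiters re-check the flag in a loop, so an early wakeup just sends them back to sleep. The model covers only this waiting scheme. It does not model the wakeup path that faults in the oops below.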

Hello,

I'm hitting the oops below on almost every boot since rebasing on today's
linux-next this morning. Reverting this patch makes the issue go away.
This was tested on a Tegra124-based Acer Chromebook 13 running a Debian
derivative. I mention the distribution because some test farms booted
similar hardware successfully, but they probably have a simpler
userspace (e.g. !systemd).

[ 7.358239] Unable to handle kernel NULL pointer dereference at virtual address 00000000
[ 7.368225] pgd = c0204000
[ 7.372693] [00000000] *pgd=00000000
[ 7.378031] Internal error: Oops: 17 [#1] SMP ARM
[ 7.384486] Modules linked in: ipv6
[ 7.389738] CPU: 1 PID: 110 Comm: kworker/1:2 Not tainted 4.0.0-rc1-next-20150303ccu #568
[ 7.399687] Hardware name: NVIDIA Tegra SoC (Flattened Device Tree)
[ 7.407736] Workqueue: cgroup_destroy css_free_work_fn
[ 7.414645] task: ecfe8e40 ti: eb9e0000 task.ti: eb9e0000
[ 7.421803] PC is at wake_bit_function+0x18/0x6c
[ 7.428168] LR is at __wake_up_common+0x5c/0x90
[ 7.434433] pc : [<c028f54c>] lr : [<c028ed3c>] psr: 200f0093
[ 7.434433] sp : eb9e1df0 ip : eb9e1e08 fp : eb9e1e04
[ 7.449379] r10: 00000001 r9 : 00000003 r8 : 00000000
[ 7.456331] r7 : 00000000 r6 : ee837a28 r5 : 00000001 r4 : eeda8ee0
[ 7.464580] r3 : 00000000 r2 : 00000000 r1 : 00000003 r0 : ebb19df4
[ 7.472825] Flags: nzCv IRQs off FIQs on Mode SVC_32 ISA ARM Segment kernel
[ 7.481952] Control: 10c5387d Table: abb9406a DAC: 00000015
[ 7.489427] Process kworker/1:2 (pid: 110, stack limit = 0xeb9e0220)
[ 7.497524] Stack: (0xeb9e1df0 to 0xeb9e2000)
[ 7.503616] 1de0: ee837a1c 00000001 eb9e1e34 eb9e1e08
[ 7.513549] 1e00: c028ed3c c028f540 00000000 ee837a24 800f0013 00000000 00000001 00000003
[ 7.523490] 1e20: 00000000 00000000 eb9e1e64 eb9e1e38 c028ef88 c028ecec 00000000 c026d274
[ 7.533438] 1e40: eb9e1e74 00000011 eb932fb4 00000000 ee837a24 c028f4f0 eb9e1eac eb9e1e68
[ 7.543390] 1e60: c026fdfc c028ef4c 600f0013 eb9e1e88 eb9e1e88 eb9e1e74 eb9e1e74 00000006
[ 7.553357] 1e80: 00000000 ebaf2a80 eb932f88 eb932f00 eb932f90 c11f5638 00000000 ee7f7005
[ 7.563329] 1ea0: eb9e1ebc eb9e1eb0 c026fedc c026fd20 eb9e1ee4 eb9e1ec0 c02ce3bc c026fecc
[ 7.573310] 1ec0: eb932f50 ecf81080 c11b0338 ee7f33c0 ee7f7000 00000000 eb9e1f24 eb9e1ee8
[ 7.583296] 1ee0: c026f4d0 c02ce254 ee7f33c0 ee7f33d4 eb9e0000 00000000 ecf81080 ee7f33c0
[ 7.593289] 1f00: ecf81098 ee7f33d4 eb9e0000 00000008 ecf81080 ee7f33c0 eb9e1f5c eb9e1f28
[ 7.603295] 1f20: c026ff6c c026f380 c026ff18 c10ae100 00000000 00000000 eb9c0180 ecf81080
[ 7.613305] 1f40: c026ff18 00000000 00000000 00000000 eb9e1fac eb9e1f60 c02749d8 c026ff24
[ 7.623321] 1f60: 00000000 00000000 00000000 ecf81080 00000000 00000000 eb9e1f78 eb9e1f78
[ 7.633337] 1f80: 00000000 00000000 eb9e1f88 eb9e1f88 eb9c0180 c02748ec 00000000 00000000
[ 7.643330] 1fa0: 00000000 eb9e1fb0 c0210aa0 c02748f8 00000000 00000000 00000000 00000000
[ 7.653299] 1fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[ 7.663248] 1fe0: 00000000 00000000 00000000 00000000 00000013 00000000 00000000 00000000
[ 7.673183] [<c028f54c>] (wake_bit_function) from [<c028ed3c>] (__wake_up_common+0x5c/0x90)
[ 7.683300] [<c028ed3c>] (__wake_up_common) from [<c028ef88>] (__wake_up+0x48/0x5c)
[ 7.692744] [<c028ef88>] (__wake_up) from [<c026fdfc>] (__cancel_work_timer+0xe8/0x1ac)
[ 7.702533] [<c026fdfc>] (__cancel_work_timer) from [<c026fedc>] (cancel_work_sync+0x1c/0x20)
[ 7.712854] [<c026fedc>] (cancel_work_sync) from [<c02ce3bc>] (css_free_work_fn+0x174/0x2ec)
[ 7.723099] [<c02ce3bc>] (css_free_work_fn) from [<c026f4d0>] (process_one_work+0x15c/0x3dc)
[ 7.733339] [<c026f4d0>] (process_one_work) from [<c026ff6c>] (worker_thread+0x54/0x4e8)
[ 7.743224] [<c026ff6c>] (worker_thread) from [<c02749d8>] (kthread+0xec/0x104)
[ 7.752339] [<c02749d8>] (kthread) from [<c0210aa0>] (ret_from_fork+0x14/0x34)
[ 7.761366] Code: e24cb004 e52de004 e8bd4000 e510400c (e5935000)
[ 7.769273] ---[ end trace f25fc65c3d66034c ]---
[ 7.778675] Unable to handle kernel paging request at virtual address ffffffec
[ 7.787735] pgd = c0204000
[ 7.792274] [ffffffec] *pgd=af7fd821, *pte=00000000, *ppte=00000000
[ 7.800424] Internal error: Oops: 17 [#2] SMP ARM
[ 7.806970] Modules linked in: ipv6
[ 7.812307] CPU: 1 PID: 110 Comm: kworker/1:2 Tainted: G D 4.0.0-rc1-next-20150303ccu #568
[ 7.823549] Hardware name: NVIDIA Tegra SoC (Flattened Device Tree)
[ 7.831683] task: ecfe8e40 ti: eb9e0000 task.ti: eb9e0000
[ 7.838946] PC is at kthread_data+0x18/0x20
[ 7.844997] LR is at wq_worker_sleeping+0x1c/0xe0
[ 7.851562] pc : [<c02750a4>] lr : [<c02704ac>] psr: 00070093
[ 7.851562] sp : eb9e1b38 ip : eb9e1b48 fp : eb9e1b44
[ 7.866774] r10: 2d74a000 r9 : ecfe90d4 r8 : c10aedd4
[ 7.873862] r7 : c10a9840 r6 : c10a9840 r5 : ecfe8e40 r4 : 00000001
[ 7.882253] r3 : 00000000 r2 : 00000000 r1 : 00000001 r0 : ecfe8e40
[ 7.890642] Flags: nzcv IRQs off FIQs on Mode SVC_32 ISA ARM Segment user
[ 7.899735] Control: 10c5387d Table: ab82806a DAC: 00000015
[ 7.907359] Process kworker/1:2 (pid: 110, stack limit = 0xeb9e0220)
[ 7.915604] Stack: (0xeb9e1b38 to 0xeb9e2000)
[ 7.921836] 1b20: eb9e1b5c eb9e1b48
[ 7.931894] 1b40: c02704ac c0275098 ee7f3840 ecfe8e40 eb9e1ba4 eb9e1b60 c0ad3a64 c027049c
[ 7.941971] 1b60: eb9e1bbc eb9e1b70 c025a2cc c02a81ac 00000000 00000001 ecfe6e08 eb9e0000
[ 7.952061] 1b80: eb9e199c eb9e1bc8 ecfe9050 00000001 c028f550 ec920000 eb9e1bbc eb9e1ba8
[ 7.962152] 1ba0: c0ad3cfc c0ad3700 0420806c ecfe8e40 eb9e1bfc eb9e1bc0 c025a9b8 c0ad3cbc
[ 7.972247] 1bc0: eb9e1bec c10f0ffc eb9e1bc8 eb9e1bc8 eb9e1da8 c11ca184 c10b32e4 eb9e1da8
[ 7.982331] 1be0: 600f0193 0000000b c028f550 00000001 eb9e1c84 eb9e1c00 c0215058 c025a368
[ 7.992414] 1c00: eb9e0220 0000000b c0d7bc6c c0d7bc64 00000008 00000000 00000000 c10b32e4
[ 8.002501] 1c20: 6529b270 62633432 20343030 64323565 34303065 62386520 30303464 35652030
[ 8.012600] 1c40: 30343031 28206330 33393565 30303035 c0002029 c0ad17e4 c0e54bcc 00000000
[ 8.022715] 1c60: 00000017 eb9e1da8 00000000 00000000 00000003 eb9e1da8 eb9e1c9c eb9e1c88
[ 8.032832] 1c80: c0ad0df4 c0214c08 eb9e1da8 ecfe8e40 eb9e1cf4 eb9e1ca0 c0221408 c0ad0d8c
[ 8.042955] 1ca0: ecfe8e40 c10aedd4 ec9ab440 ecfe8e88 ee7f3840 c0285c50 ee7f3880 ec9ab490
[ 8.053090] 1cc0: eb9e1ce4 eb9e1cd0 c0284f88 c10b3864 00000017 c0221190 00000000 eb9e1da8
[ 8.063238] 1ce0: 00000003 00000001 eb9e1da4 eb9e1cf8 c02091e8 c022119c c10a9840 c0275894
[ 8.073402] 1d00: ecb5801c c0275894 eb9e1d3c eb9e1d18 c0275894 c0219c3c ecb5801c ecfe8e40
[ 8.083560] 1d20: ec9ab440 00000000 eb81ca80 c0ad3904 ee7f3840 ecfe8e40 ec9ab440 00000000
[ 8.093711] 1d40: eb81ca80 ecfe90d0 eb9e1d9c eb9e1d58 c0ad3904 c027b70c 00000000 000003ff
[ 8.103866] 1d60: 00000000 00000001 2d74a000 ee7f3840 00000000 eb9e0000 eb9e0000 eb9e1e84
[ 8.114026] 1d80: 00000002 c028f54c 200f0093 ffffffff eb9e1ddc 00000000 eb9e1e04 eb9e1da8
[ 8.124175] 1da0: c02157d8 c02091ac ebb19df4 00000003 00000000 00000000 eeda8ee0 00000001
[ 8.134322] 1dc0: ee837a28 00000000 00000000 00000003 00000001 eb9e1e04 eb9e1e08 eb9e1df0
[ 8.144464] 1de0: c028ed3c c028f54c 200f0093 ffffffff ee837a1c 00000001 eb9e1e34 eb9e1e08
[ 8.154608] 1e00: c028ed3c c028f540 00000000 ee837a24 800f0013 00000000 00000001 00000003
[ 8.164744] 1e20: 00000000 00000000 eb9e1e64 eb9e1e38 c028ef88 c028ecec 00000000 c026d274
[ 8.174879] 1e40: eb9e1e74 00000011 eb932fb4 00000000 ee837a24 c028f4f0 eb9e1eac eb9e1e68
[ 8.185012] 1e60: c026fdfc c028ef4c 600f0013 eb9e1e88 eb9e1e88 eb9e1e74 eb9e1e74 00000006
[ 8.195145] 1e80: 00000000 ebaf2a80 eb932f88 eb932f00 eb932f90 c11f5638 00000000 ee7f7005
[ 8.205283] 1ea0: eb9e1ebc eb9e1eb0 c026fedc c026fd20 eb9e1ee4 eb9e1ec0 c02ce3bc c026fecc
[ 8.215423] 1ec0: eb932f50 ecf81080 c11b0338 ee7f33c0 ee7f7000 00000000 eb9e1f24 eb9e1ee8
[ 8.225563] 1ee0: c026f4d0 c02ce254 ee7f33c0 ee7f33d4 eb9e0000 00000000 ecf81080 ee7f33c0
[ 8.235696] 1f00: ecf81098 ee7f33d4 eb9e0000 00000008 ecf81080 ee7f33c0 eb9e1f5c eb9e1f28
[ 8.245837] 1f20: c026ff6c c026f380 c026ff18 c10ae100 00000000 00000000 eb9c0180 ecf81080
[ 8.255986] 1f40: c026ff18 00000000 00000000 00000000 eb9e1fac eb9e1f60 c02749d8 c026ff24
[ 8.266143] 1f60: 00000000 00000000 00000000 ecf81080 00000000 00000000 eb9e1f78 eb9e1f78
[ 8.276310] 1f80: 00000001 00010001 eb9e1f88 eb9e1f88 eb9c0180 c02748ec 00000000 00000000
[ 8.286498] 1fa0: 00000000 eb9e1fb0 c0210aa0 c02748f8 00000000 00000000 00000000 00000000
[ 8.296672] 1fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[ 8.306828] 1fe0: 00000000 00000000 00000000 00000000 00000013 00000000 00000000 00000000
[ 8.316967] [<c02750a4>] (kthread_data) from [<c02704ac>] (wq_worker_sleeping+0x1c/0xe0)
[ 8.327023] [<c02704ac>] (wq_worker_sleeping) from [<c0ad3a64>] (__schedule+0x370/0x5bc)
[ 8.337072] [<c0ad3a64>] (__schedule) from [<c0ad3cfc>] (schedule+0x4c/0xa4)
[ 8.346085] [<c0ad3cfc>] (schedule) from [<c025a9b8>] (do_exit+0x65c/0x960)
[ 8.355013] [<c025a9b8>] (do_exit) from [<c0215058>] (die+0x45c/0x474)
[ 8.363498] [<c0215058>] (die) from [<c0ad0df4>] (__do_kernel_fault.part.10+0x74/0x84)
[ 8.373386] [<c0ad0df4>] (__do_kernel_fault.part.10) from [<c0221408>] (do_page_fault+0x278/0x388)
[ 8.384322] [<c0221408>] (do_page_fault) from [<c02091e8>] (do_DataAbort+0x48/0xa8)
[ 8.393960] [<c02091e8>] (do_DataAbort) from [<c02157d8>] (__dabt_svc+0x38/0x60)
[ 8.403332] Exception stack(0xeb9e1da8 to 0xeb9e1df0)
[ 8.410356] 1da0: ebb19df4 00000003 00000000 00000000 eeda8ee0 00000001
[ 8.420519] 1dc0: ee837a28 00000000 00000000 00000003 00000001 eb9e1e04 eb9e1e08 eb9e1df0
[ 8.430686] 1de0: c028ed3c c028f54c 200f0093 ffffffff
[ 8.437733] [<c02157d8>] (__dabt_svc) from [<c028f54c>] (wake_bit_function+0x18/0x6c)
[ 8.447569] [<c028f54c>] (wake_bit_function) from [<c028ed3c>] (__wake_up_common+0x5c/0x90)
[ 8.457917] [<c028ed3c>] (__wake_up_common) from [<c028ef88>] (__wake_up+0x48/0x5c)
[ 8.467579] [<c028ef88>] (__wake_up) from [<c026fdfc>] (__cancel_work_timer+0xe8/0x1ac)
[ 8.477591] [<c026fdfc>] (__cancel_work_timer) from [<c026fedc>] (cancel_work_sync+0x1c/0x20)
[ 8.488145] [<c026fedc>] (cancel_work_sync) from [<c02ce3bc>] (css_free_work_fn+0x174/0x2ec)
[ 8.498620] [<c02ce3bc>] (css_free_work_fn) from [<c026f4d0>] (process_one_work+0x15c/0x3dc)
[ 8.509093] [<c026f4d0>] (process_one_work) from [<c026ff6c>] (worker_thread+0x54/0x4e8)
[ 8.519227] [<c026ff6c>] (worker_thread) from [<c02749d8>] (kthread+0xec/0x104)
[ 8.528606] [<c02749d8>] (kthread) from [<c0210aa0>] (ret_from_fork+0x14/0x34)
[ 8.537899] Code: e24cb004 e52de004 e8bd4000 e5903268 (e5130014)
[ 8.546045] ---[ end trace f25fc65c3d66034d ]---
[ 8.552705] Fixing recursive fault but reboot is needed!

Regards,

Tomeu