Re: [patch 00/14] x86/irq: Plug various vector cleanup races

From: Joe Lawrence
Date: Mon Jan 18 2016 - 10:01:04 EST


On 01/16/2016 04:37 PM, Joe Lawrence wrote:
> On 01/14/2016 05:33 AM, Borislav Petkov wrote:
>> On Thu, Jan 14, 2016 at 09:24:35AM +0100, Thomas Gleixner wrote:
>>> On Mon, 4 Jan 2016, Joe Lawrence wrote:
>>>> No issues running the same PCI device removal and stress tests against
>>>> the patchset.
>>>
>>> Thanks for testing!
>>>
>>> Though there is yet another long standing bug in that area. Fix below.
>>>
>>> Thanks,
>>>
>>> tglx
>>>
>>> 8<--------------------
>>>
> [ ... snip ... ]
>>
>> s/d//
>>
>> With those micro-changes:
>>
>> Tested-by: Borislav Petkov <bp@xxxxxxx>
>>
>> :-)
>
> Tests still running ok here (with same micro-change as Borislav).

Hi Thomas,

When I logged in this morning and checked the box running the 14
patches + additional patch, I saw that it had hit a hung task timeout in
the xhci USB code about 39 hours in. Stack trace below (it looks to be
waiting on a completion that never comes).

I didn't see this when running only the *initial* 14 patches. Of
course, before these irq cleanup fixes my tests never ran this long :)
So it may or may not be related to the patchset; I'm still poking around
the generated vmcore. Let me know if there is anything you might be
interested in looking at from the wreckage.

-- Joe



INFO: task kworker/0:1:1506 blocked for more than 120 seconds.
Tainted: P OE 4.3.0sra12+ #50
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kworker/0:1 D 0000000000000000 0 1506 2 0x00000080
Workqueue: usb_hub_wq hub_event
ffff8801e46dba58 0000000000000046 ffff8810375dac00 ffff881038430000
ffff8801e46dc000 ffff88025ac20440 ffff88025ac20438 ffff881038430000
0000000000000000 ffff8801e46dba70 ffffffff81659893 7fffffffffffffff
Call Trace:
[<ffffffff81659893>] schedule+0x33/0x80
[<ffffffff8165c530>] schedule_timeout+0x200/0x2a0
[<ffffffff810e2761>] ? internal_add_timer+0x71/0xb0
[<ffffffff810e4994>] ? mod_timer+0x114/0x210
[<ffffffff8165a371>] wait_for_completion+0xf1/0x130
[<ffffffff810a70d0>] ? wake_up_q+0x70/0x70
[<ffffffff814b14a1>] xhci_discover_or_reset_device+0x1e1/0x540
[<ffffffff814723b8>] hub_port_reset+0x3c8/0x590
[<ffffffff81472aa5>] hub_port_init+0x525/0xb00
[<ffffffff81476068>] hub_port_connect+0x328/0x940
[<ffffffff81476cbc>] hub_event+0x63c/0xb00
[<ffffffff810947dc>] process_one_work+0x14c/0x3c0
[<ffffffff81095044>] worker_thread+0x114/0x470
[<ffffffff8165925f>] ? __schedule+0x2af/0x8b0
[<ffffffff81094f30>] ? rescuer_thread+0x310/0x310
[<ffffffff8109ab88>] kthread+0xd8/0xf0
[<ffffffff8109aab0>] ? kthread_park+0x60/0x60
[<ffffffff8165d75f>] ret_from_fork+0x3f/0x70
[<ffffffff8109aab0>] ? kthread_park+0x60/0x60