Crashes with 874bbfe600a6 in 3.18.25

From: Jan Kara
Date: Wed Jan 20 2016 - 16:19:21 EST


Hello,

a friend of mine started seeing crashes with the 3.18.25 kernel - once an
appropriate load is put on the machine, it crashes within minutes. He
found that reverting commit 874bbfe600a6 (this is the commit ID from
Linus' tree; in the stable tree the commit ID is 1e7af294dd03) "workqueue:
make sure delayed work run in local cpu" makes the kernel stable again. I'm
attaching a screenshot of the crash - sadly the initial part is missing, but
it seems that we crashed when processing timers on an otherwise idle CPU.
This is a production machine, so experimentation is not easy, but if we
really need more information it may be possible to reproduce the issue again
and gather it.

Does anyone have an idea what is going on? I have been looking into the code
for a while but so far I have no good explanation. It would be good to
understand the cause instead of just blindly reverting the commit from the
stable tree...

Honza
--
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR

Attachment: delayed-work-oops.png
Description: PNG image