[BUG] Deadlock in block/blk-flush.c, with resolution

From: Dragan Milenkovic
Date: Wed Feb 06 2019 - 07:53:34 EST


The bug manifests as mdX_raid1 and other related tasks being blocked.

It is triggered by LVM RAID, but is not caused by it. I have also triggered it with LVM + mdraid, but only once. It is more frequent with
LVM RAID.

It does not occur in the master branch, but it does occur in 4.20.y, 4.19.y and 4.18.y. Here is a Debian bug report:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=913119

I have tracked it to this commit:


https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=344e9ffcbd1898e1dc04085564a6e05c30ea8199

Specifically to this line:


https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/diff/block/blk-flush.c?id=344e9ffcbd1898e1dc04085564a6e05c30ea8199

The commit log message makes it appear as if this is purely a refactoring change, but the check for q->elevator was inverted.

The line has not been changed between that commit and the current master branch. Since I applied this change to my distribution's kernel (4.19), my system has been completely stable.

Let me know if you need me to do anything else, but this seems like a straightforward cherry-pick.

Dragan