[PATCH] lock_page() doesn't lock if __wait_on_bit_lock returns -EINTR

From: Chris Mason
Date: Sat Dec 12 2015 - 11:24:19 EST


We have two reports of frequent crashes in btrfs where asserts in
clear_page_dirty_for_io() were triggering on missing page locks.

The crashes were much easier to trigger when processes were catching
Ctrl-C, and after much debugging it really looked like lock_page() was
a no-op.

This recent commit looks pretty suspect to me, and I confirmed that we
were exiting __wait_on_bit_lock() with -EINTR when it was called with
TASK_UNINTERRUPTIBLE:

commit 68985633bccb6066bf1803e316fbc6c1f5b796d6
Author: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
Date:   Tue Dec 1 14:04:04 2015 +0100

    sched/wait: Fix signal handling in bit wait helpers
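
To make the failure mode concrete, here is a small userspace sketch, not
kernel code. Every model_* function, the MODEL_TASK_* constants and the
OWNER_* values are hypothetical stand-ins invented for illustration. It
assumes, as the symptoms suggest, that the lock_page() slow path ignores
the return value of the wait. The "patched" flag mimics the check
proposed in the diff at the end of this mail.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for kernel task states; the values are arbitrary. */
enum { MODEL_TASK_INTERRUPTIBLE = 1, MODEL_TASK_UNINTERRUPTIBLE = 2 };

/* Who currently holds the modeled page lock bit. */
enum model_owner { OWNER_NONE, OWNER_OTHER, OWNER_US };

struct model_page {
	enum model_owner owner;
	int holder_sleeps;	/* waits left until the other holder unlocks */
	int signals_pending;	/* times the action will report -EINTR */
};

/* Models test_and_set_bit(): returns true if the bit was already set. */
static bool model_test_and_set(struct model_page *p)
{
	if (p->owner != OWNER_NONE)
		return true;
	p->owner = OWNER_US;
	return false;
}

/* Models the bit-wait action: one sleep, then report any pending signal. */
static int model_action(struct model_page *p)
{
	if (p->holder_sleeps > 0 && --p->holder_sleeps == 0)
		p->owner = OWNER_NONE;	/* the other holder drops the lock */
	if (p->signals_pending > 0) {
		p->signals_pending--;
		return -EINTR;
	}
	return 0;
}

/* Models the wait loop; 'patched' enables the proposed -EINTR check. */
static int model_wait_on_bit_lock(struct model_page *p, int mode, bool patched)
{
	assert(mode == MODEL_TASK_INTERRUPTIBLE ||
	       mode == MODEL_TASK_UNINTERRUPTIBLE);
	do {
		int ret;

		if (p->owner == OWNER_NONE)
			continue;	/* bit is clear, go try to take it */
		ret = model_action(p);
		if (!ret)
			continue;
		if (patched && ret == -EINTR &&
		    mode == MODEL_TASK_UNINTERRUPTIBLE)
			continue;	/* uninterruptible: keep waiting */
		return ret;		/* bail out WITHOUT taking the lock */
	} while (model_test_and_set(p));
	return 0;
}

/* Models lock_page(): void, so a failed wait goes unnoticed by the caller. */
static void model_lock_page(struct model_page *p, bool patched)
{
	if (model_test_and_set(p))
		(void)model_wait_on_bit_lock(p, MODEL_TASK_UNINTERRUPTIBLE,
					     patched);
}

/* Returns true if the caller really owns the page after model_lock_page(). */
bool model_run(bool patched)
{
	/* Page held by someone else, released on the 2nd wait, 1 signal queued. */
	struct model_page page = { OWNER_OTHER, 2, 1 };

	model_lock_page(&page, patched);
	return page.owner == OWNER_US;
}
```

In this model, model_run(false) returns false: the wait bails out on the
first -EINTR and the caller carries on without the lock. model_run(true)
returns true, because the loop keeps waiting until the bit is actually
taken.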

The patch below is mostly untested, and probably not the right solution.
Dave's trinity run doesn't explode immediately anymore, and I wanted to
get this out for discussion. A quick look on the list doesn't show
anyone else has tracked this down, sorry if it's a dup.

Reported-by: Dave Jones <dsj@xxxxxx>
Reported-by: Jon Christopherson <jon@xxxxxxxx>
Signed-off-by: Chris Mason <clm@xxxxxx>

diff --git a/kernel/sched/wait.c b/kernel/sched/wait.c
index f10bd87..12f69df 100644
--- a/kernel/sched/wait.c
+++ b/kernel/sched/wait.c
@@ -434,6 +434,8 @@ __wait_on_bit_lock(wait_queue_head_t *wq, struct wait_bit_queue *q,
 		ret = action(&q->key);
 		if (!ret)
 			continue;
+		if (ret == -EINTR && mode == TASK_UNINTERRUPTIBLE)
+			continue;
 		abort_exclusive_wait(wq, &q->wait, mode, &q->key);
 		return ret;
 	} while (test_and_set_bit(q->key.bit_nr, q->key.flags));