[BUG] The kernel thread for md RAID1 could cause an md RAID1 array deadlock

From: K.Tanaka
Date: Mon Jan 14 2008 - 22:08:28 EST



This message describes the details of an md RAID1 issue found by
testing md RAID1 with the SCSI fault injection framework.

Abstract:
Both the error handler for md RAID1 and write requests to md RAID1
use the raid1d kernel thread. The nr_pending counter can cause a race
condition in raid1d, which results in a raid1d deadlock.

Details:
 error handling                                write operation
 ------------------------------------------------------------------------------
 A-1. Issue a read request.

 A-2. SCSI error detected.
                                               B-1. make_request() for raid1 starts.
 A-3. raid1_end_read_request() is called
      in the interrupt context. It detects
      the read error and wakes up the raid1d
      kernel thread.
                                               B-2. make_request() calls wait_barrier()
                                                    to increment nr_pending.
 A-4. raid1d wakes up.

 A-5. raid1d calls freeze_array() and waits
      for nr_pending to be decremented.
      That means stop IO and wait for
      everything to go quiet.
                                               B-3. make_request() wakes up the raid1d
                                                    kernel thread to send the write
                                                    request to the lower layer.

                                               B-4. raid1d wakes up (already woken up
                                                    by A-3).

      (**** process stalls here because A-5 never ends ****)

 A-6. raid1d calls fix_read_error() to
      handle the read error.
                                               B-5. raid1d calls generic_make_request()
                                                    for the write request.

                                               B-6. raid1_end_write_request() is called
                                                    in the interrupt context when the
                                                    write access is completed, and
                                                    nr_pending is decremented.

The deadlock mechanism:
If raid1d, woken up by the detected read error (A-4), enters freeze_array()
right after make_request() for a write request has incremented nr_pending (B-2),
raid1d stalls waiting for nr_pending to be decremented (A-5).
Meanwhile, the nr_pending count taken by make_request() for the write request
will never be dropped, because it is decremented only after raid1d issues
generic_make_request() (B-5, B-6), and raid1d is now blocked.
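
The ordering dependence can be illustrated with a small user-space toy model.
This is only a sketch under stated assumptions, not kernel code: struct
toy_conf and all toy_* functions are hypothetical names invented here, the
failed read itself is not counted, write completion is collapsed into the
submit step, and the "flush_first" ordering exists only for contrast and is
not a proposed fix. Instead of actually hanging, the model reports when
freeze_array() would wait forever.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy state; these are NOT the kernel's data structures. */
struct toy_conf {
    int  nr_pending;    /* writes that passed wait_barrier(), not yet completed */
    int  queued_writes; /* writes handed to raid1d but not yet submitted */
    bool read_error;    /* a failed read is waiting for raid1d's error handler */
};

enum toy_result { TOY_OK, TOY_DEADLOCK };

/* A-1..A-3: a read fails and raid1d is asked to handle the error. */
static void toy_read_fails(struct toy_conf *c)
{
    c->read_error = true;
}

/* B-1..B-3: a write is counted in nr_pending and queued for raid1d. */
static void toy_make_write_request(struct toy_conf *c)
{
    c->nr_pending++;
    c->queued_writes++;
}

/* B-5, B-6: raid1d submits queued writes; each completion drops nr_pending. */
static void toy_flush_writes(struct toy_conf *c)
{
    while (c->queued_writes > 0) {
        c->queued_writes--;
        assert(c->nr_pending > 0);
        c->nr_pending--;
    }
}

/*
 * A-5: every queued write holds a count in nr_pending that only raid1d
 * can release. If raid1d itself is the waiter, the wait never ends.
 */
static bool toy_freeze_would_block_forever(const struct toy_conf *c)
{
    return c->queued_writes > 0;
}

/* One pass of the toy raid1d thread. */
static enum toy_result toy_raid1d(struct toy_conf *c, bool flush_first)
{
    if (flush_first)
        toy_flush_writes(c);      /* hypothetical ordering, for contrast */

    if (c->read_error) {
        if (toy_freeze_would_block_forever(c))
            return TOY_DEADLOCK;  /* stalls at A-5 */
        c->read_error = false;    /* A-6: fix_read_error() would run here */
    }

    toy_flush_writes(c);
    return TOY_OK;
}
```

Starting from a zeroed struct toy_conf, calling toy_read_fails() and then
toy_make_write_request() followed by toy_raid1d(&c, false) returns
TOY_DEADLOCK with nr_pending still at 1, which mirrors the A-5 stall. With
flush_first set to true, the same sequence returns TOY_OK in this toy model.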

This problem can easily be reproduced with the new fault injection framework,
using the "no response from the SCSI device" simulation.
It could also occur without fault injection whenever the raid1 error handler
contends with a write operation, though with low probability.

I will report the other problems after I clean up and post the code for
the SCSI fault injection framework.

--
------------------------------------------------------------------------
Kenichi TANAKA | Open Source Software Platform Development Division
| Computers Software Operations Unit, NEC Corporation
| k-tanaka@xxxxxxxxxxxxx

