Re: [Regression] md/raid1: write-intent logging/bitmap issue since fd3b6975e9c1 - v5.16-rc1

From: Linus Torvalds
Date: Mon Jan 03 2022 - 14:53:41 EST


[ Jens wasn't cc'd for some reason but was the signer-off-on the patch
you bisected to. Added him to the cc. I'll bounce the original
separately, as I also don't see this on lore.kernel.org - it might not
have gotten there yet ]

On Mon, Jan 3, 2022 at 11:30 AM Norbert Warmuth <nwarmuth@xxxxxxxxxxx> wrote:
>
> Please verify and either revert or fixup fd3b6975e9c1 if my analysis is
> correct.

Can you check if moving the WriteMostly bit to the "do behind I/O?"
section fixes things for you?

IOW, something like the attached patch..

[ Warning: This is very much a "Monkey see, monkey do" patch. I'm not
really familiar with the raid1 code ]
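
To make the intent of moving the test concrete, here is a rough sketch in
plain C that models only the two conditions touched by the hunk. The struct
and function names (write_model, write_result, before_patch, after_patch)
are made up for illustration and are not kernel code. The rest of the
first_clone block is cut off in the hunk, so it is represented by a single
flag rather than modeled.

```c
#include <stdbool.h>

/* Hypothetical, simplified inputs; these are not the kernel's structures. */
struct write_model {
	bool first_clone;   /* no clone has been handled yet for this request */
	bool write_mostly;  /* the current rdev has the WriteMostly flag set */
	bool have_bitmap;   /* the array has a bitmap */
	bool behind_ok;     /* stands in for the behind_writes/behind_wait checks */
};

struct write_result {
	bool entered_first_clone_block; /* the first_clone block was executed */
	bool behind_io;                 /* behind I/O would be attempted */
};

/* Condition as on the removed '-' line: WriteMostly gates the whole block. */
static struct write_result before_patch(struct write_model m)
{
	struct write_result r = { false, false };

	if (m.first_clone && m.write_mostly) {
		r.entered_first_clone_block = true;
		if (m.have_bitmap && m.behind_ok)
			r.behind_io = true;
	}
	return r;
}

/* Condition as on the added '+' lines: WriteMostly gates only behind I/O. */
static struct write_result after_patch(struct write_model m)
{
	struct write_result r = { false, false };

	if (m.first_clone) {
		r.entered_first_clone_block = true;
		if (m.have_bitmap && m.write_mostly && m.behind_ok)
			r.behind_io = true;
	}
	return r;
}
```

In this model, a request whose first device is not WriteMostly skips the
whole first_clone block before the change and enters it afterwards, while
behind I/O is still only attempted for WriteMostly devices in both versions.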

But yeah, if you see corruption and there isn't an absolutely trivial
fix for this, we should revert.

Linus
drivers/md/raid1.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 7dc8026cf6ee..85505424f7a4 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -1496,12 +1496,13 @@ static void raid1_write_request(struct mddev *mddev, struct bio *bio,
 		if (!r1_bio->bios[i])
 			continue;
 
-		if (first_clone && test_bit(WriteMostly, &rdev->flags)) {
+		if (first_clone) {
 			/* do behind I/O ?
 			 * Not if there are too many, or cannot
 			 * allocate memory, or a reader on WriteMostly
 			 * is waiting for behind writes to flush */
 			if (bitmap &&
+			    test_bit(WriteMostly, &rdev->flags) &&
 			    (atomic_read(&bitmap->behind_writes)
 			     < mddev->bitmap_info.max_write_behind) &&
 			    !waitqueue_active(&bitmap->behind_wait)) {