Re: [PATCH -next 1/5] md/raid5: don't allow replacement while reshape is not done

From: Yu Kuai
Date: Sun May 21 2023 - 23:46:29 EST


Hi,

On 2023/05/20 7:33, Song Liu wrote:
On Thu, May 11, 2023 at 6:59 PM Yu Kuai <yukuai1@xxxxxxxxxxxxxxx> wrote:

From: Yu Kuai <yukuai3@xxxxxxxxxx>

Set rdev replacement has but not only two conditions:

1) MD_RECOVERY_RUNNING is not set;
2) rdev nr_pending is 0;

The above is confusing. I updated it and applied the set to md-next.

By the way, I'm willing to add regression tests for these problems. I have
already sent two other tests, and there has been no response yet. Should the
tests wait until the fix patches are applied before they can make progress?

Thanks,
Kuai
Please let me know if it looks good.

Thanks,
Song


If reshape is interrupted (for example, by echoing frozen to sync_action),
an rdev replacement can then be set. This is safe at runtime because reshape
always takes priority over resync in md_check_recovery(). However, if the
system reboots, the kernel will complain that it cannot handle concurrent
replacement and reshape, and the array can no longer be assembled.

Fix this problem by not allowing replacement until reshape is done.

Reported-by: Peter Neuwirth <reddunur@xxxxxxxxx>
Link: https://lore.kernel.org/linux-raid/e2f96772-bfbc-f43b-6da1-f520e5164536@xxxxxxxxx/
Signed-off-by: Yu Kuai <yukuai3@xxxxxxxxxx>
---
drivers/md/raid5.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index a58507a4345d..bd3b535c0739 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -8378,6 +8378,7 @@ static int raid5_add_disk(struct mddev *mddev, struct md_rdev *rdev)
 			p = conf->disks + disk;
 			tmp = rdev_mdlock_deref(mddev, p->rdev);
 			if (test_bit(WantReplacement, &tmp->flags) &&
+			    mddev->reshape_position == MaxSector &&
 			    p->replacement == NULL) {
 				clear_bit(In_sync, &rdev->flags);
 				set_bit(Replacement, &rdev->flags);
--
2.39.2
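
For readers following the thread, the condition added by the hunk above can
be sketched as a small user-space model. This is only an illustration, not
kernel code: the model_* structures, function names and MODEL_MAX_SECTOR are
hypothetical stand-ins for mddev, the disk slot and MaxSector, and the sketch
assumes that reshape_position == MaxSector means no reshape is pending.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical user-space stand-ins; these are NOT the real md structures. */
typedef uint64_t sector_t;

/* Assumed sentinel for "no reshape pending", modelled on MaxSector. */
#define MODEL_MAX_SECTOR (~(sector_t)0)

struct model_array {
	sector_t reshape_position;	/* MODEL_MAX_SECTOR: no reshape pending */
};

struct model_slot {
	bool want_replacement;		/* models WantReplacement on the rdev in the slot */
	bool has_replacement;		/* models p->replacement != NULL */
};

/*
 * Models the patched condition: a new device may become a replacement only
 * if the existing device wants one, no reshape is pending, and the slot has
 * no replacement yet.
 */
bool model_may_add_replacement(const struct model_array *array,
			       const struct model_slot *slot)
{
	return slot->want_replacement &&
	       array->reshape_position == MODEL_MAX_SECTOR &&
	       !slot->has_replacement;
}

/* Small self-check showing the two cases discussed in the commit message. */
void model_self_check(void)
{
	struct model_slot slot = { .want_replacement = true, .has_replacement = false };
	struct model_array idle = { .reshape_position = MODEL_MAX_SECTOR };
	struct model_array interrupted = { .reshape_position = 4096 };

	/* No reshape pending: replacement is allowed. */
	assert(model_may_add_replacement(&idle, &slot));
	/* Reshape interrupted part way through: replacement is refused. */
	assert(!model_may_add_replacement(&interrupted, &slot));
}
```

In this model, an array whose reshape was frozen part way through keeps a
reshape_position other than the sentinel, so the replacement is refused until
the reshape completes.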

.