Re: [PATCH -next v2 7/7] md: delay remove_and_add_spares() for read only array to md_start_sync()

From: Yu Kuai
Date: Sat Aug 19 2023 - 22:47:08 EST


Hi,

在 2023/08/16 15:18, Xiao Ni 写道:
On Tue, Aug 15, 2023 at 11:13 AM Yu Kuai <yukuai1@xxxxxxxxxxxxxxx> wrote:

From: Yu Kuai <yukuai3@xxxxxxxxxx>

Before this patch, for read-only array:

md_check_recovery() check that 'MD_RECOVERY_NEEDED' is set, then it will
call remove_and_add_spares() directly to try to remove and add rdevs
from array.

After this patch:

1) md_check_recovery() check that 'MD_RECOVERY_NEEDED' is set, and the
worker 'sync_work' is not pending, and there are rdevs can be added
or removed, then it will queue new work md_start_sync();
2) md_start_sync() will call remove_and_add_spares() and exist;

Hi Kuai

If md_check_recovery returns and md_starts_sync doesn't start, during
the window the raid changes to read-write, the sync thread can be run
but MD_RECOVERY_RUNNING isn't set. Is there such a problem?

That's a good question, looks like this is possible. Read-only array
doesn't test or set MD_RECOVERY_RUNNING at all in md_check_recovery().
I'll use MD_RECOVERY_RUNNING to prevent such concurrent scenario in
the new version.

Thanks,
Kuai


Regards
Xiao


This change make sure that array reconfiguration is independent from
daemon, and it'll be much easier to synchronize it with io, consier
that io may rely on daemon thread to be done.

Signed-off-by: Yu Kuai <yukuai3@xxxxxxxxxx>
---
drivers/md/md.c | 37 +++++++++++++++++++++++++++----------
1 file changed, 27 insertions(+), 10 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index d26d2c35f9af..74d529479fcf 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -9191,6 +9191,16 @@ static bool rdev_addable(struct md_rdev *rdev)
return true;
}

+static bool md_spares_need_change(struct mddev *mddev)
+{
+ struct md_rdev *rdev;
+
+ rdev_for_each(rdev, mddev)
+ if (rdev_removeable(rdev) || rdev_addable(rdev))
+ return true;
+ return false;
+}
+
static int remove_and_add_spares(struct mddev *mddev,
struct md_rdev *this)
{
@@ -9309,6 +9319,12 @@ static void md_start_sync(struct work_struct *ws)

mddev_lock_nointr(mddev);

+ if (!md_is_rdwr(mddev)) {
+ remove_and_add_spares(mddev, NULL);
+ mddev_unlock(mddev);
+ return;
+ }
+
if (!md_choose_sync_direction(mddev, &spares))
goto not_running;

@@ -9403,7 +9419,8 @@ void md_check_recovery(struct mddev *mddev)
}

if (!md_is_rdwr(mddev) &&
- !test_bit(MD_RECOVERY_NEEDED, &mddev->recovery))
+ (!test_bit(MD_RECOVERY_NEEDED, &mddev->recovery) ||
+ work_pending(&mddev->sync_work)))
return;
if ( ! (
(mddev->sb_flags & ~ (1<<MD_SB_CHANGE_PENDING)) ||
@@ -9431,15 +9448,8 @@ void md_check_recovery(struct mddev *mddev)
*/
rdev_for_each(rdev, mddev)
clear_bit(Blocked, &rdev->flags);
- /* On a read-only array we can:
- * - remove failed devices
- * - add already-in_sync devices if the array itself
- * is in-sync.
- * As we only add devices that are already in-sync,
- * we can activate the spares immediately.
- */
- remove_and_add_spares(mddev, NULL);
- /* There is no thread, but we need to call
+ /*
+ * There is no thread, but we need to call
* ->spare_active and clear saved_raid_disk
*/
set_bit(MD_RECOVERY_INTR, &mddev->recovery);
@@ -9447,6 +9457,13 @@ void md_check_recovery(struct mddev *mddev)
clear_bit(MD_RECOVERY_RECOVER, &mddev->recovery);
clear_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
clear_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags);
+
+ /*
+ * Let md_start_sync() to remove and add rdevs to the
+ * array.
+ */
+ if (md_spares_need_change(mddev))
+ queue_work(md_misc_wq, &mddev->sync_work);
goto unlock;
}

--
2.39.2


.