Re: [dm-devel] [PATCH -next v2 3/6] md: add a mutex to synchronize idle and frozen in action_store()

From: Yu Kuai
Date: Tue Jun 13 2023 - 21:15:16 EST


Hi,

On 2023/06/13 22:43, Xiao Ni wrote:

On 2023/5/29 9:20 PM, Yu Kuai wrote:
From: Yu Kuai <yukuai3@xxxxxxxxxx>

Currently, for idle and frozen, action_store() holds 'reconfig_mutex'
and calls md_reap_sync_thread() to stop the sync thread. However, this
can cause a deadlock (explained in the next patch). To fix the problem,
the following patch will release 'reconfig_mutex' and wait on
'resync_wait', as md_set_readonly() and do_md_stop() do.

Note that action_store() sets/clears 'MD_RECOVERY_FROZEN'
unconditionally, which might cause unexpected problems. For example,
'frozen' has just set 'MD_RECOVERY_FROZEN' and is still in progress when
'idle' clears 'MD_RECOVERY_FROZEN' and a new sync thread is started,
which might starve the in-progress 'frozen'. Add a mutex to synchronize
idle and frozen in action_store().

Signed-off-by: Yu Kuai <yukuai3@xxxxxxxxxx>
---
  drivers/md/md.c | 5 +++++
  drivers/md/md.h | 3 +++
  2 files changed, 8 insertions(+)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 23e8e7eae062..63a993b52cd7 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -644,6 +644,7 @@ void mddev_init(struct mddev *mddev)
      mutex_init(&mddev->open_mutex);
      mutex_init(&mddev->reconfig_mutex);
      mutex_init(&mddev->delete_mutex);
+    mutex_init(&mddev->sync_mutex);
      mutex_init(&mddev->bitmap_info.mutex);
      INIT_LIST_HEAD(&mddev->disks);
      INIT_LIST_HEAD(&mddev->all_mddevs);
@@ -4785,14 +4786,18 @@ static void stop_sync_thread(struct mddev *mddev)
  static void idle_sync_thread(struct mddev *mddev)
  {
+    mutex_lock(&mddev->sync_mutex);
      clear_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
      stop_sync_thread(mddev);
+    mutex_unlock(&mddev->sync_mutex);
  }
  static void frozen_sync_thread(struct mddev *mddev)
  {
+    mutex_init(&mddev->delete_mutex);


Is this a typo? Should it be mutex_lock(&mddev->sync_mutex)?


Yes, thanks for spotting this. It looks like I introduced it while
rebasing.

Thanks,
Kuai
Regards

Xiao

      set_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
      stop_sync_thread(mddev);
+    mutex_unlock(&mddev->sync_mutex);
  }
  static ssize_t
diff --git a/drivers/md/md.h b/drivers/md/md.h
index bfd2306bc750..2fa903de5bd0 100644
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -537,6 +537,9 @@ struct mddev {
      /* Protect the deleting list */
      struct mutex            delete_mutex;
+    /* Used to synchronize idle and frozen for action_store() */
+    struct mutex            sync_mutex;
+
      bool    has_superblocks:1;
      bool    fail_last_dev:1;
      bool    serialize_policy:1;
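
For illustration, the intended locking (with the mutex_lock() fix discussed
above applied to frozen_sync_thread()) can be sketched in userspace. This is
only a rough analogue and not the md code: pthread mutexes stand in for the
kernel mutex API, and 'struct fake_mddev', 'fake_stop_sync_thread()', the
worker functions and 'run_demo()' are hypothetical names made up for this
example. It assumes the point of 'sync_mutex' is simply that idle and frozen
never run their set/clear-and-stop sequence concurrently.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

#define ITERATIONS 10000

/* Hypothetical userspace stand-in for the relevant parts of struct mddev. */
struct fake_mddev {
	pthread_mutex_t sync_mutex;	/* serializes idle and frozen */
	bool frozen;			/* stands in for MD_RECOVERY_FROZEN */
	int in_section;			/* callers currently inside the section */
	int max_in_section;		/* highest overlap ever observed */
};

/* Stand-in for stop_sync_thread(): only records how many callers overlap. */
static void fake_stop_sync_thread(struct fake_mddev *mddev)
{
	mddev->in_section++;
	if (mddev->in_section > mddev->max_in_section)
		mddev->max_in_section = mddev->in_section;
	mddev->in_section--;
}

static void idle_sync_thread(struct fake_mddev *mddev)
{
	pthread_mutex_lock(&mddev->sync_mutex);
	mddev->frozen = false;
	fake_stop_sync_thread(mddev);
	pthread_mutex_unlock(&mddev->sync_mutex);
}

static void frozen_sync_thread(struct fake_mddev *mddev)
{
	/* Take the lock here; the posted hunk re-initialized a mutex instead. */
	pthread_mutex_lock(&mddev->sync_mutex);
	mddev->frozen = true;
	fake_stop_sync_thread(mddev);
	pthread_mutex_unlock(&mddev->sync_mutex);
}

static void *idle_worker(void *arg)
{
	for (int i = 0; i < ITERATIONS; i++)
		idle_sync_thread(arg);
	return NULL;
}

static void *frozen_worker(void *arg)
{
	for (int i = 0; i < ITERATIONS; i++)
		frozen_sync_thread(arg);
	return NULL;
}

/* Race idle against frozen and return the maximum observed overlap. */
static int run_demo(void)
{
	struct fake_mddev mddev = { .frozen = false };
	pthread_t idle, frozen;

	pthread_mutex_init(&mddev.sync_mutex, NULL);
	pthread_create(&idle, NULL, idle_worker, &mddev);
	pthread_create(&frozen, NULL, frozen_worker, &mddev);
	pthread_join(idle, NULL);
	pthread_join(frozen, NULL);
	pthread_mutex_destroy(&mddev.sync_mutex);

	assert(mddev.in_section == 0);
	return mddev.max_in_section;
}
```

Calling run_demo() from a small main() and building with -pthread should
report an overlap of 1, since both paths take the same mutex around the
flag update and the stop call.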
