Re: [dm-devel] [PATCH -next v2 4/6] md: refactor idle/frozen_sync_thread() to fix deadlock

From: Yu Kuai
Date: Wed Jun 14 2023 - 21:28:59 EST


Hi,

在 2023/06/14 17:08, Xiao Ni 写道:
On Wed, Jun 14, 2023 at 4:29 PM Yu Kuai <yukuai1@xxxxxxxxxxxxxxx> wrote:

Hi,

在 2023/06/14 15:57, Xiao Ni 写道:
On Wed, Jun 14, 2023 at 3:38 PM Yu Kuai <yukuai1@xxxxxxxxxxxxxxx> wrote:

Hi,

在 2023/06/14 15:12, Xiao Ni 写道:
On Wed, Jun 14, 2023 at 10:04 AM Yu Kuai <yukuai1@xxxxxxxxxxxxxxx> wrote:

Hi,

在 2023/06/14 9:48, Yu Kuai 写道:



In the patch, sync_seq is added in md_reap_sync_thread. In
idle_sync_thread, if sync_seq isn't equal

mddev->sync_seq, it should mean there is someone that stops the sync
thread already, right? Why do

you say 'new started sync thread' here?

If someone stops the sync thread, and new sync thread is not started,
then this sync_seq won't make a difference, above wait_event() will not
wait because !test_bit(MD_RECOVERY_RUNNING, &mddev->recovery) will pass.
So 'sync_seq' is only used when the old sync thread stops and new sync
thread starts, add 'sync_seq' will bypass this case.

Hi

If a new sync thread starts, why can sync_seq be different? sync_seq
is only added in md_reap_sync_thread. And when a new sync request
starts, it can't stop the sync request again?

Af first, the sync_seq is 0

admin1
echo idle > sync_action
idle_sync_thread(sync_seq is 1)

Wait, I'm confused here, how can sync_seq to be 1 here? I suppose you
mean that there is a sync_thread just finished?

Hi Kuai

Yes. Because idle_sync_thread needs to wait until md_reap_sync_thread
finishes. And md_reap_sync_thread adds sync_seq. Do I understand your
patch right?

Yes, noted that idle_sync_thread() will only wait if MD_RECOVERY_RUNNING
is set.



Then the problem is that idle_sync_thread() read sync_seq after the old
sync_thread is done, and new sync_thread start before wait_event() is
called, should we wait for this new sync_thread?

My answer here is that we should, but I'm also ok to not wait this new
sync_thread, I don't think this behaviour matters. The key point here
is that once wait_event() is called from idle_sync_thread(), this
wait_event() should not wait for new sync_thread...

I think we should wait. If we don't wait for it, there is a problem.
One person echos idle to sync_action and it doesn't work sometimes.
It's a strange thing.


Ok. I'll add new comment to emphasize that idle_sync_thread() won't wait
for new sync_thread that is started after wait_event().

I suggest removing this function. Without this change, it's more
simple and it can work well without problem. The people that echo idle
to sync_action needs to wait until the sync action finishes. The code
semantic is clear and simple.


echo resync > sync_action (new sync)

If this is behind "echo idle > sync_action", idle_sync_thread should not
see that MD_RECOVERY_RUNNING is set and wait_event() won't wait at all.

`echo resync > sync_action` can't change the sync_seq. So 'echo idle >
sync_action' still waits until MD_RECOVERY_RUNNING is cleared?

This is not accurate, if `echo resync > sync_action` triggers a new
sync_thread, then sync_seq is updated when this sync_thread is done,
during this period, MD_RECOVERY_RUNNING is still set, so `echo idle
>sync_action` will wait for sync_thread to be done.

I can understand your comment, but sorry, I still can't get how
sync_seq works. Could you give a specific case that explains how it
works?

Ok, the problem is that echo ilde is supposed to interrupt sync_thread
and stop sync_thread quickly. Now that we don't hold mutex here, we
can't prevent new sync_thread to start. For exapmle:

1) a sync_thread A is runing, MD_RECOVERY_RUNNING is set;

2) echo idle, A will be interrupted, mutex is not hold and
idle_sync_thread() is waiting for MD_RECOVERY_RUNNING to be cleared.

3) A is interrupted, it'll clear MD_RECOVERY_RUNNING and try to wakeup
idle_sync_thread(), however, before idle_sync_thread() is woken, A can
be done and a new sync_thread B can be started, and MD_RECOVERY_RUNNING
will be set again.

4) idle_sync_thread() finially wake up, however, MD_RECOVERY_RUNNING is
set and it will still waiting. And this time B won't be interrupted.

Thanks,
Kuai