Re: [PATCH -next 2/3] md/raid10: convert resync_lock to use seqlock

From: Guoqing Jiang
Date: Thu Sep 01 2022 - 21:01:52 EST




On 9/2/22 8:56 AM, Logan Gunthorpe wrote:

On 2022-09-01 18:49, Guoqing Jiang wrote:

On 9/2/22 2:41 AM, Logan Gunthorpe wrote:
Hi,

On 2022-08-29 07:15, Yu Kuai wrote:
From: Yu Kuai <yukuai3@xxxxxxxxxx>

Currently, wait_barrier() will hold 'resync_lock' to read 'conf->barrier',
and io can't be dispatched until 'barrier' is dropped.

Since holding the 'barrier' is not common, convert 'resync_lock' to use
seqlock so that holding lock can be avoided in fast path.
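
For readers unfamiliar with the pattern, the fast path described above can be sketched roughly as follows. This is a userspace illustration built on C11 atomics, not the code from the patch: 'struct demo_conf' and the demo_* helpers are hypothetical names, and a single writer is assumed (the kernel's seqlock_t additionally serializes writers with a spinlock).

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the fields guarded by 'resync_lock'. */
struct demo_conf {
	atomic_uint seq;        /* even: no writer active, odd: writer in progress */
	atomic_int  barrier;    /* non-zero while the barrier is raised */
	atomic_int  nr_pending; /* IO admitted through the barrier */
};

/*
 * Lockless fast path: admit IO only if no writer was active and
 * 'barrier' was clear for the whole attempt. On failure the caller
 * would fall back to a slow path that takes the lock and waits.
 */
static bool demo_try_fast_path(struct demo_conf *c)
{
	unsigned int seq = atomic_load(&c->seq);

	if (seq & 1)                      /* writer in progress */
		return false;
	if (atomic_load(&c->barrier))     /* barrier is held */
		return false;

	atomic_fetch_add(&c->nr_pending, 1);
	if (atomic_load(&c->seq) == seq)  /* no writer raced with us */
		return true;

	/* A writer slipped in: undo the increment and report failure. */
	atomic_fetch_sub(&c->nr_pending, 1);
	return false;
}

/* Writer side (single writer assumed): bump 'seq' around the update. */
static void demo_raise_barrier(struct demo_conf *c)
{
	atomic_fetch_add(&c->seq, 1);     /* seq becomes odd */
	atomic_fetch_add(&c->barrier, 1);
	atomic_fetch_add(&c->seq, 1);     /* seq becomes even again */
}

static void demo_lower_barrier(struct demo_conf *c)
{
	atomic_fetch_add(&c->seq, 1);
	atomic_fetch_sub(&c->barrier, 1);
	atomic_fetch_add(&c->seq, 1);
}
```

The point of the design is that the common case (barrier not held, no writer active) costs a few atomic loads and one increment, and never touches the lock.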

Signed-off-by: Yu Kuai <yukuai3@xxxxxxxxxx>

I've found some lockdep issues starting with this patch in md-next while
running mdadm tests (specifically 00raid10 when run about 10 times in a
row).

I've seen a couple of different lockdep errors. The first seems to be
reproducible on this patch, then it possibly changes to the second on
subsequent patches. Not sure exactly.

That's why I said "try mdadm test suites too to avoid regression." ...
You may have to run it multiple times; a single run tends not to catch
all errors. I had to loop the noted test 10 times to be sure I hit this
every time when I did the simple bisect.

And ensure that all the debug options are on when you run it (take a
look at the Kernel Hacking section in menuconfig). You won't hit this
bug without at least CONFIG_PROVE_LOCKING=y.

Yes, we definitely need to enable that option when testing locking changes.

Thanks,
Guoqing