Re: Bisected: Kernel 4.14+ has 3 times higher write IO latency than Kernel 4.4 with raid1

From: NeilBrown
Date: Mon Aug 05 2019 - 19:46:45 EST


On Mon, Aug 05 2019, Jinpu Wang wrote:

> Hi Neil,
>
> For the md higher write IO latency problem, I bisected it to these commits:
>
> 4ad23a97 MD: use per-cpu counter for writes_pending
> 210f7cd percpu-refcount: support synchronous switch to atomic mode.
>
> Do you maybe have an idea? How can we fix it?

Hmmm.... not sure.

My guess is that the set_in_sync() call from md_check_recovery()
is taking a long time, and is being called too often.

Could you try two experiments, please?

1/ set /sys/block/md0/md/safe_mode_delay
to 20 (seconds) or more. It defaults to about 0.2 seconds.

2/ comment out the call to set_in_sync() in md_check_recovery().

Then run the test separately after each of these changes.

If the second one makes a difference, I'd like to know how often it gets
called, and why. The test
	if ( ! (
		(mddev->sb_flags & ~ (1<<MD_SB_CHANGE_PENDING)) ||
		test_bit(MD_RECOVERY_NEEDED, &mddev->recovery) ||
		test_bit(MD_RECOVERY_DONE, &mddev->recovery) ||
		(mddev->external == 0 && mddev->safemode == 1) ||
		(mddev->safemode == 2
		 && !mddev->in_sync && mddev->recovery_cp == MaxSector)
		))
		return;

should normally return when doing lots of IO - I'd like to know
which condition causes it to not return.

Thanks,
NeilBrown
