Re: [PATCH] sched, fair: Allow a per-cpu kthread waking a task to stack on the same CPU

From: Mel Gorman
Date: Tue Jan 28 2020 - 04:10:19 EST


On Tue, Jan 28, 2020 at 01:19:36AM +0000, Mel Gorman wrote:
> > <SNIP>
> > After all this, I have two questions that would help me understand
> > if this is what you are seeing:
> >
> > 1. to confirm: does removing just the WQ_UNBOUND from the CIL push
> > workqueue (as added in 8ab39f11d974) make the regression go away?
> >
>
> I'll have to check in the morning. Around the v5.4 development timeframe,
> I'm definite that reverting the patch helped but that was not an option
> given that it's fixing a correctness issue.
>

This is a comparison of the baseline kernel (tip at the time I started),
the proposed fix and a revert. The revert did not apply cleanly but I do
not believe that affects the results.

dbench4 Loadfile Execution Time
                     5.5.0-rc7           5.5.0-rc7           5.5.0-rc7
             tipsched-20200124   kworkerstack-v1r2  revert-XFS-wq-v1r2
Amean    1     58.69 (  0.00%)     30.21 * 48.53%*     47.48 * 19.10%*
Amean    2     60.90 (  0.00%)     35.29 * 42.05%*     51.13 * 16.04%*
Amean    4     66.77 (  0.00%)     46.55 * 30.28%*     59.54 * 10.82%*
Amean    8     81.41 (  0.00%)     68.46 * 15.91%*     77.25 *  5.11%*
Amean   16    113.29 (  0.00%)    107.79 *  4.85%*    112.33 *  0.85%*
Amean   32    199.10 (  0.00%)    198.22 *  0.44%*    200.31 * -0.61%*
Amean   64    478.99 (  0.00%)    477.06 *  0.40%*    482.17 * -0.66%*
Amean  128   1345.26 (  0.00%)   1372.64 * -2.04%*   1368.94 * -1.76%*
Stddev   1      2.64 (  0.00%)      4.17 (-58.08%)      5.01 (-89.89%)
Stddev   2      4.35 (  0.00%)      5.38 (-23.73%)      4.48 ( -2.90%)
Stddev   4      6.77 (  0.00%)      6.56 (  3.00%)      7.40 ( -9.40%)
Stddev   8     11.61 (  0.00%)     10.91 (  6.04%)     11.62 ( -0.05%)
Stddev  16     18.63 (  0.00%)     19.19 ( -3.01%)     19.12 ( -2.66%)
Stddev  32     38.71 (  0.00%)     38.30 (  1.06%)     38.82 ( -0.28%)
Stddev  64    100.28 (  0.00%)     91.24 (  9.02%)     95.68 (  4.59%)
Stddev 128    186.87 (  0.00%)    160.34 ( 14.20%)    170.85 (  8.57%)

According to this, commit 8ab39f11d974 ("xfs: prevent CIL push holdoff
in log recovery") did introduce some unintended behaviour. The fix
actually performs better than a revert with the obvious benefit that it
does not reintroduce the functional breakage (log starvation) that the
commit originally fixed.
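
For anyone reading the table: the percentages appear to be the change
relative to the first (baseline) column, with a positive value meaning
a shorter execution time. The helper below is a hypothetical sketch and
not part of the benchmark tooling. It recomputes the Amean figures for
one client from the rounded values in the table. The Stddev percentages
recomputed this way differ slightly from the table, presumably because
the report was generated from unrounded data.

```python
def gain_pct(baseline, value):
    """Percent improvement over baseline when lower is better.

    Assumed formula; it reproduces the Amean percentages in the table.
    """
    return (baseline - value) / baseline * 100.0


# Amean execution times for 1 client, copied from the table.
baseline, fix, revert = 58.69, 30.21, 47.48

print(f"kworkerstack-v1r2  {gain_pct(baseline, fix):6.2f}%")
print(f"revert-XFS-wq-v1r2 {gain_pct(baseline, revert):6.2f}%")
```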

I still think that XFS is not the problem here; it's just the
messenger. The functional fix, delegating work to kworkers running on the
same CPU, and blk-mq delivering IO completions to the same CPU as the IO
issuer are all sane decisions IMO. I do not think that adjusting any of
them to wake up the task on a new CPU is sensible due to the loss of data
cache locality and potential snags with power management when waking a
CPU from an idle state.

Peter, Ingo and Vincent -- I know the timing is bad due to the merge
window but do you have any thoughts on allowing select_idle_sibling to
stack a wakee task on the same CPU as a waker in this specific case?
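
As a rough illustration of the question, here is a toy userspace model
of the placement choice. It is not kernel code and not the patch itself.
Every name in it (Task, pick_wake_cpu, nr_running, idle_cpus) is
hypothetical, and the exact condition is an assumption: the waker is a
per-cpu kthread, the wakee last ran on the waker's CPU, and nothing else
is runnable there.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    prev_cpu: int                  # CPU the task last ran on
    per_cpu_kthread: bool = False  # e.g. a bound kworker


def pick_wake_cpu(waker, waker_cpu, wakee, nr_running, idle_cpus):
    """Toy stand-in for the wakeup placement decision (hypothetical)."""
    # Assumed special case: a per-cpu kthread that is the only runnable
    # task on this CPU wakes a task that last ran here. The kthread is
    # expected to sleep shortly, so stacking keeps the wakee cache hot.
    if (waker.per_cpu_kthread
            and wakee.prev_cpu == waker_cpu
            and nr_running[waker_cpu] <= 1):
        return waker_cpu
    # Otherwise prefer the previous CPU if it is idle ...
    if wakee.prev_cpu in idle_cpus:
        return wakee.prev_cpu
    # ... else any idle CPU, falling back to the previous CPU.
    return min(idle_cpus) if idle_cpus else wakee.prev_cpu


kworker = Task("kworker/0:1", prev_cpu=0, per_cpu_kthread=True)
dbench = Task("dbench", prev_cpu=0)

# CPU 0 runs only the kworker; CPUs 1-3 are idle.
print(pick_wake_cpu(kworker, 0, dbench, {0: 1}, {1, 2, 3}))
```

In this model the wakee stays on CPU 0 when the kworker wakes it, while
an ordinary waker in the same situation would push it to an idle CPU,
which is the migration pattern behind the regression discussed here.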

--
Mel Gorman
SUSE Labs