Re: [PATCH] sched/fair: Always update_curr() before placing at enqueue

From: K Prateek Nayak
Date: Mon Oct 16 2023 - 01:39:38 EST


Hello Daniel,

I see a good and consistent improvement in Stream (with shorter loops)
with this change. Everything else is more or less the same.

I'll leave the detailed results below.

On 10/6/2023 10:18 PM, Daniel Jordan wrote:
> Placing wants current's vruntime and the cfs_rq's min_vruntime up to
> date so that avg_runtime() is too, and similarly it wants the entity to
> be re-weighted and lag adjusted so vslice and vlag are fresh, so always
> do update_curr() and update_cfs_group() beforehand.
>
> There doesn't seem to be a reason to treat the 'curr' case specially
> after e8f331bcc270 since vruntime doesn't get normalized anymore.
>
> Fixes: e8f331bcc270 ("sched/smp: Use lag to simplify cross-runqueue placement")
> Signed-off-by: Daniel Jordan <daniel.m.jordan@xxxxxxxxxx>

o Machine details

- 3rd Generation EPYC System
- 2 sockets each with 64C/128T
- NPS1 (Each socket is a NUMA node)
- C2 Disabled (POLL and C1(MWAIT) remained enabled)

o Kernel Details

- tip: tip:sched/core at commit 3657680f38cd ("sched/psi: Delete the
  'update_total' function parameter from update_triggers()") +
  cherry-pick commit 8dafa9d0eb1a ("sched/eevdf: Fix min_deadline heap
  integrity") from tip:sched/urgent + cherry-pick commit b01db23d5923
  ("sched/eevdf: Fix pick_eevdf()") from tip:sched/urgent

- update_curr_opt: tip + this patch

o Benchmark Results

==================================================================
Test : hackbench
Units : Normalized time in seconds
Interpretation: Lower is better
Statistic : AMean
==================================================================
Case: tip[pct imp](CV) update_curr_opt[pct imp](CV)
1-groups 1.00 [ -0.00]( 2.69) 1.01 [ -0.71]( 2.88)
2-groups 1.00 [ -0.00]( 1.69) 1.01 [ -0.62]( 1.40)
4-groups 1.00 [ -0.00]( 1.25) 1.01 [ -1.17]( 1.03)
8-groups 1.00 [ -0.00]( 1.36) 1.00 [ -0.43]( 0.83)
16-groups 1.00 [ -0.00]( 1.44) 1.00 [ -0.13]( 2.32)


==================================================================
Test : tbench
Units : Normalized throughput
Interpretation: Higher is better
Statistic : AMean
==================================================================
Clients: tip[pct imp](CV) update_curr_opt[pct imp](CV)
1 1.00 [ 0.00]( 0.33) 1.01 [ 0.52]( 1.51)
2 1.00 [ 0.00]( 0.22) 1.00 [ -0.01]( 0.37)
4 1.00 [ 0.00]( 0.25) 0.99 [ -0.71]( 0.60)
8 1.00 [ 0.00]( 0.71) 1.00 [ -0.26]( 0.36)
16 1.00 [ 0.00]( 0.79) 0.99 [ -1.21]( 0.77)
32 1.00 [ 0.00]( 0.94) 0.99 [ -0.82]( 1.46)
64 1.00 [ 0.00]( 1.76) 0.99 [ -0.92]( 1.25)
128 1.00 [ 0.00]( 0.68) 0.98 [ -2.22]( 1.19)
256 1.00 [ 0.00]( 1.23) 0.99 [ -1.43]( 0.79)
512 1.00 [ 0.00]( 0.28) 0.99 [ -0.93]( 0.14)
1024 1.00 [ 0.00]( 0.20) 0.99 [ -1.44]( 0.41)


==================================================================
Test : stream-10
Units : Normalized Bandwidth, MB/s
Interpretation: Higher is better
Statistic : HMean
==================================================================
Test: tip[pct imp](CV) update_curr_opt[pct imp](CV)
Copy 1.00 [ 0.00](11.88) 1.22 [ 21.92]( 7.37)
Scale 1.00 [ 0.00]( 7.01) 1.04 [ 4.02]( 4.89)
Add 1.00 [ 0.00]( 6.56) 1.11 [ 11.03]( 4.77)
Triad 1.00 [ 0.00]( 8.81) 1.14 [ 14.12]( 3.89)


==================================================================
Test : stream-100
Units : Normalized Bandwidth, MB/s
Interpretation: Higher is better
Statistic : HMean
==================================================================
Test: tip[pct imp](CV) update_curr_opt[pct imp](CV)
Copy 1.00 [ 0.00]( 1.07) 1.01 [ 0.77]( 1.59)
Scale 1.00 [ 0.00]( 4.81) 0.97 [ -2.99]( 7.18)
Add 1.00 [ 0.00]( 4.56) 0.98 [ -2.39]( 6.86)
Triad 1.00 [ 0.00]( 1.78) 1.00 [ -0.35]( 4.22)


==================================================================
Test : netperf
Units : Normalized throughput
Interpretation: Higher is better
Statistic : AMean
==================================================================
Clients: tip[pct imp](CV) update_curr_opt[pct imp](CV)
1-clients 1.00 [ 0.00]( 0.62) 0.99 [ -1.03]( 0.24)
2-clients 1.00 [ 0.00]( 0.36) 1.00 [ -0.32]( 0.66)
4-clients 1.00 [ 0.00]( 0.31) 1.00 [ -0.17]( 0.44)
8-clients 1.00 [ 0.00]( 0.39) 1.00 [ 0.24]( 0.67)
16-clients 1.00 [ 0.00]( 0.58) 1.00 [ 0.50]( 0.46)
32-clients 1.00 [ 0.00]( 0.71) 1.01 [ 0.54]( 0.66)
64-clients 1.00 [ 0.00]( 2.13) 1.00 [ 0.35]( 1.80)
128-clients 1.00 [ 0.00]( 0.94) 0.99 [ -0.71]( 0.97)
256-clients 1.00 [ 0.00]( 6.09) 1.01 [ 1.28]( 3.41)
512-clients 1.00 [ 0.00](55.28) 1.01 [ 1.32](49.78)


==================================================================
Test : schbench
Units : Normalized 99th percentile latency in us
Interpretation: Lower is better
Statistic : Median
==================================================================
#workers: tip[pct imp](CV) update_curr_opt[pct imp](CV)
1 1.00 [ -0.00]( 2.91) 0.97 [ 2.56]( 1.53)
2 1.00 [ -0.00](21.78) 0.89 [ 10.53](10.23)
4 1.00 [ -0.00]( 4.88) 1.07 [ -7.32]( 6.82)
8 1.00 [ -0.00]( 2.49) 1.00 [ -0.00]( 9.53)
16 1.00 [ -0.00]( 3.70) 1.02 [ -1.75]( 0.99)
32 1.00 [ -0.00](12.65) 0.83 [ 16.51]( 4.41)
64 1.00 [ -0.00]( 3.98) 0.97 [ 2.59]( 8.27)
128 1.00 [ -0.00]( 1.49) 0.96 [ 3.60]( 8.01)
256 1.00 [ -0.00](40.79) 0.80 [ 20.39](36.89)
512 1.00 [ -0.00]( 1.12) 0.98 [ 2.20]( 0.75)


==================================================================
Test : ycsb-cassandra
Units : Normalized throughput
Interpretation: Higher is better
Statistic : Mean
==================================================================
metric tip update_curr_opt (%diff)
throughput 1.00 1.00 (%diff: -0.45%)


==================================================================
Test : ycsb-mongodb
Units : Normalized throughput
Interpretation: Higher is better
Statistic : Mean
==================================================================
metric tip update_curr_opt (%diff)
throughput 1.00 1.00 (%diff: -0.13%)


==================================================================
Test : DeathStarBench
Units : Normalized throughput
Interpretation: Higher is better
Statistic : Mean
==================================================================
Pinning scaling tip update_curr_opt (%diff)
1CCD 1 1.00 1.01 (%diff: 0.57%)
2CCD 2 1.00 1.00 (%diff: -0.27%)
4CCD 4 1.00 1.00 (%diff: 0.06%)
8CCD 8 1.00 1.00 (%diff: 0.45%)

--
Feel free to include

Tested-by: K Prateek Nayak <kprateek.nayak@xxxxxxx>

I'll keep a lookout for future versions.

> ---
>
> Not sure what the XXX above place_entity() is for, maybe it can go away?
>
> Based on tip/sched/core.
>
> kernel/sched/fair.c | 14 ++------------
> 1 file changed, 2 insertions(+), 12 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 04fbcbda97d5f..db2ca9bf9cc49 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -5047,15 +5047,6 @@ static inline bool cfs_bandwidth_used(void);
> static void
> enqueue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
> {
> - bool curr = cfs_rq->curr == se;
> -
> - /*
> - * If we're the current task, we must renormalise before calling
> - * update_curr().
> - */
> - if (curr)
> - place_entity(cfs_rq, se, flags);
> -
> update_curr(cfs_rq);
>
> /*
> @@ -5080,8 +5071,7 @@ enqueue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
> * XXX now that the entity has been re-weighted, and it's lag adjusted,
> * we can place the entity.
> */
> - if (!curr)
> - place_entity(cfs_rq, se, flags);
> + place_entity(cfs_rq, se, flags);
>
> account_entity_enqueue(cfs_rq, se);
>
> @@ -5091,7 +5081,7 @@ enqueue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
>
> check_schedstat_required();
> update_stats_enqueue_fair(cfs_rq, se, flags);
> - if (!curr)
> + if (cfs_rq->curr != se)
> __enqueue_entity(cfs_rq, se);
> se->on_rq = 1;
>

--
Thanks and Regards,
Prateek