Re: [PATCH 1/1] sched/fair: Fix inaccurate tally of ttwu_move_affine

From: Libo Chen
Date: Thu Aug 25 2022 - 05:13:44 EST




On 8/25/22 00:30, Gautham R. Shenoy wrote:
On Wed, Aug 10, 2022 at 03:33:13PM -0700, Libo Chen wrote:
There are scenarios where non-affine wakeups are incorrectly counted as
affine wakeups by schedstats.

When wake_affine_idle() returns prev_cpu which doesn't equal to
nr_cpumask_bits, it will slip through the check: target == nr_cpumask_bits
in wake_affine() and be counted as if target == this_cpu in schedstats.

Replace target == nr_cpumask_bits with target != this_cpu to make sure
affine wakeups are accurately tallied.

Fixes: 806486c377e33 (sched/fair: Do not migrate if the prev_cpu is idle)
Suggested-by: Daniel Jordan <daniel.m.jordan@xxxxxxxxxx>
Signed-off-by: Libo Chen <libo.chen@xxxxxxxxxx>
---
kernel/sched/fair.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index da388657d5ac..b179da4f8105 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6114,7 +6114,7 @@ static int wake_affine(struct sched_domain *sd, struct task_struct *p,
target = wake_affine_weight(sd, p, this_cpu, prev_cpu, sync);
schedstat_inc(p->stats.nr_wakeups_affine_attempts);
- if (target == nr_cpumask_bits)
+ if (target != this_cpu)
return prev_cpu;

This seems to be the right thing to do. However,..

if this_cpu and prev_cpu were in the same LLC and we pick prev_cpu,
technically is it still not an affine wakeup?

I think schedstats like ttwu_move_affine/ttwu_wake_remote is defined within a sched domain, so if the waking CPU and the previous CPU are in the same MC domain, then picking the previous CPU is a remote wakeup
within that MC. If the two candidate CPUs are from two different NUMA nodes, then picking the waking CPU is an affine wakeup within that NUMA domain. Correct me if I am wrong, this definition is consistent across
all levels of sched domains.

But I do understand that when two candidate CPUs are within an LLC,
    a) all the fast-path wakeups should be affine wakeups if your definition of an affine wakeup is a wakeup to the same LLC of the waker.
    b) select_idle_sibling() may pick a CPU in that LLC other than the two candidate CPUs which makes the affine/remote stats here useless even if we are consistent with ttwu_move_affine/ttwu_wake_remote
       definition.

I personally think it's just too much trouble to add additional code in the kernel to, let's say, treat all wakeups within an LLC as ttwu_move_affine. It's a lot easier to do that when you process schedstats data,
whether you want to treat all wakeups in LLC domains as affine wakeups or ignore ttwu_move_affine/ttwu_wake_remote stats from LLC domains.



schedstat_inc(sd->ttwu_move_affine);
--
2.31.1

--
Thanks and Regards
gautham.