Re: [patch] CFS scheduler, -v12

From: Ingo Molnar
Date: Mon May 21 2007 - 04:57:32 EST



* William Lee Irwin III <wli@xxxxxxxxxxxxxx> wrote:

> cfs should probably consider aggregate lag as opposed to aggregate
> weighted load. Mainline's convergence to proper CPU bandwidth
> distributions on SMP (e.g. N+1 tasks of equal nice on N cpus) is
> incredibly slow and probably also fragile in the presence of arrivals
> and departures partly because of this. [...]

hm, have you actually tested CFS before coming to this conclusion?

CFS is fair even on SMP. Consider for example the worst-case
3-tasks-on-2-CPUs workload on a 2-CPU box:

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
2658 mingo 20 0 1580 248 200 R 67 0.0 0:56.30 loop
2656 mingo 20 0 1580 252 200 R 66 0.0 0:55.55 loop
2657 mingo 20 0 1576 248 200 R 66 0.0 0:55.24 loop

66% of CPU time for each task. The 'TIME+' column shows a 2% spread
between the slowest and the fastest loop after just 1 minute of runtime
(and the spread gets narrower with time). Mainline does a 50% / 50% /
100% split:

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
3121 mingo 25 0 1584 252 204 R 100 0.0 0:13.11 loop
3120 mingo 25 0 1584 256 204 R 50 0.0 0:06.68 loop
3119 mingo 25 0 1584 252 204 R 50 0.0 0:06.64 loop

and i fixed that in CFS.

or consider a sleepy workload like massive_intr, 3-tasks-on-2-CPUs:

europe:~> head -1 /proc/interrupts
CPU0 CPU1

europe:~> ./massive_intr 3 10
002623 00000722
002621 00000720
002622 00000721

Or a 5-tasks-on-2-CPS workload:

europe:~> ./massive_intr 5 50
002649 00002519
002653 00002492
002651 00002478
002652 00002510
002650 00002478

that's around 1% of spread.

load-balancing is a performance vs. fairness tradeoff so we wont be able
to make it precisely fair because that's hideously expensive on SMP
(barring someone showing a working patch of course) - but in CFS i got
quite close to having it very fair in practice.

> [...] Tong Li's DWRR repairs the deficit in mainline by synchronizing
> epochs or otherwise bounding epoch dispersion. This doesn't directly
> translate to cfs. In cfs cpu should probably try to figure out if its
> aggregate lag (e.g. via minimax) is above or below average, and push
> to or pull from the other half accordingly.

i'd first like to see a demonstration of a problem to solve, before
thinking about more complex solutions ;-)

Ingo
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/