Re: Schedulers benchmark - Was: [ANNOUNCE][RFC] PlugSched-5.2.4 for2.6.12 and 2.6.13-rc6

From: Peter Williams
Date: Wed Aug 17 2005 - 18:49:59 EST


Con Kolivas wrote:
On Thu, 18 Aug 2005 09:15 am, Peter Williams wrote:

Con Kolivas wrote:

On Wed, 17 Aug 2005 18:10, Peter Williams wrote:

Michal Piotrowski wrote:

Hi,
here are schedulers benchmark (part2):
[bits deleted]

Here's a summary of your output generated using the attached Python
script.

| Build Statistics | Overall Statistics

-----------------------------------------------------------------------
Scheduler| Real CPU SYS TPT | CPU TPT delay CXSW

| (secs) (secs) (%) (%) | (secs) (%) (secs)

-----------------------------------------------------------------------
ingosched| 3128.5 5056.3 8.18 161.6 | 5379.5 171.9 159367.4 1556452
staircase| 3131.2 5032.6 8.09 160.7 | 5352.9 170.9 135193.0 1670366
spa_no_frills| 3103.8 5049.5 7.98 162.7 | 5266.7 169.7 172384.8 520937
zaphod(d,d)| 3561.7 4823.8 9.25 135.4 | 5132.0 144.1 148361.5 1771617
zaphod(d,0)| 3551.2 4809.9 9.19 135.4 | 5114.7 144.0 144022.0 1784814
zaphod(0,d)| 3126.8 5063.2 8.11 161.9 | 5278.1 168.8 173438.4 573587
zaphod(0,0)| 3105.5 5052.9 7.98 162.7 | 5254.8 169.2 165774.4 577534
nicksched| 3294.7 5095.1 9.10 154.6 | 5425.4 164.6 104298.2 2205665

where the (x,y) after zaphod means (max_ia_bonus, max_tpt_bonus) and "d"
means default. I had to kill a few significant digits to squeeze it
into 71 columns. Overall statistics are extracted from the schedstats
data. In the "Build Statistics" "CPU" is the sum of the user and sys
times and "SYS" is the percentage of that which was sys time (as I feel
that is a better thing to compare than raw sys times).

I was intrigued by the fact that zaphod(d,d) and zaphod(d,0) take longer
in real time but use less cpu. I was assuming that this meant that some
other job was getting some cpu but the schedstats data doesn't support
that. Also it wouldn't make sense anyway as you'd expect jobs doing the
same amount of work to use roughly the same amount of cpu. My latest
theory is that your machine has hyper threads and this artifact is
caused by the mechanism in the scheduler for handling tasks with
differing priority in sibling hyper thread channels. Does your system
have hyper threads?

That would only do something if there was a difference in 'nice' levels.

Not in zaphod and spa_no_frills. They user dynamic priority. I may
rethink this as the argument for using dynamic priority mainly applies
to the entitlement based mode of zaphod.


Static - like mainline and staircase - would be a good idea as the whole point of hyperthread-aware 'nice' levels is they obey 'nice', not dynamic priority.

Yes, but when zaphod's in "entitlement based" mode dynamic priority is a better measure of "nice". But I think I'll modify spa_no_frills and zaphod in "priority based" mode to use static priority (as you suggest).



What
you're seeing is the fact that balancing is intimately tied in with
timeslice size and you have increased idle time.

I partially agree in that reducing the time slice size would reduce the
size of the effect but it's not the cause of the effect. The hyper
threading code is the cause.

BTW I'm wondering why the TPT column (i.e. cpu time / real elapsed
time) is so low for all schedulers. It's been a long time since I ran
your "contest" benchmarks for any of these schedulers but I seem to
recall that they all did a lot better than this when I extracted the
equivalent data from the output. Generally, I think that they all used
greater than 95% of the available cpu time which would be the equivalent
of TPT values of 190% or more in this case.


He did a make allyesconfig which is a bit different and probably far too i/o bound. By the way a single kernel compile is hardly a reproducible benchmark. Ideally he should be using my 'kernbench' benchmark (hint hint).

Is that what I meant when I said "contest"? I wasn't sure that I used the right name.



Another interesting thing to be noted in these numbers is that the cost
of the extra context switches caused by the "improved interactive
performance" measures doesn't seem to be very significant.


I doubt this workload is running into that, the i/o bound nature of a kernel compile means it is not really a pure cpu usage scalability measure, it's just easy to do. Check the lkml archives for some comments by WLIrwin on the limitations of using kernbench as a scalability measure.

I should have calculated the average cpu consumption per context switch. If that's reasonably large then the number of context switches won't matter much. It's probably more a interesting number than the count of context switches so I'll alter the script accordingly.

I would have liked to have done more analysis of the load balancing but some of the schedstats data was corrupt. It should be roughly the same for all schedulers if my attempts to make scheduling and load balancing orthogonal have been successful. I was thinking of doing something equivalent to a Chi-squared test to get a single number that describes how good load balancing has been.

Peter
--
Peter Williams pwil3058@xxxxxxxxxxxxxx

"Learning, n. The kind of ignorance distinguishing the studious."
-- Ambrose Bierce
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/