Re: Interesting scheduling times - NOT

Peter T. Breuer (ptb@it.uc3m.es)
Fri, 25 Sep 1998 12:10:04 +0200 (MET DST)


"A month of sundays ago Oliver Xymoron wrote:"
>
> > measuring things - but it's wrong in practice. If you take a suite of
> > tests, lmbench for example, and do a bunch of runs and scatter plot them
> > and stare at them you'll see patterns emerging. Now if the pattern was
> > that most run times clustered around the min, then my feeling is that
> > the min is the right number. Wherever they cluster up is the number I
> > wanted because that was the number mostly likely to be seen.
>
> Again, I agree that generally the average is the number that's
> interesting. But earlier you seemed to imply that the minimum is not
> generally a meaningful number, because they were way out on the tails of

To agree for the third time - you are all saying compatible things, but
not the same thing.

Yes, clearly the minimum of a set of measurements converges with
certainty to the lowest value the test can produce (the probability of
all N runs staying above some threshold M is q^N for some q < 1).
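
A rough sketch of the q^N argument, assuming (purely for illustration, not
taken from any measured data) that a run time is a hard floor plus
exponentially distributed noise. The names FLOOR, NOISE_SCALE, sample_run,
prob_all_above and observed_min are invented for this example.

```python
import math
import random

FLOOR = 10.0        # hypothetical best-case run time (arbitrary units)
NOISE_SCALE = 2.0   # hypothetical mean of the extra delay added by noise


def sample_run(rng):
    """One simulated timing: the floor plus exponentially distributed noise."""
    return FLOOR + rng.expovariate(1.0 / NOISE_SCALE)


def prob_all_above(margin, n):
    """Probability that n independent runs all stay above FLOOR + margin.

    For one run q = P(noise > margin) = exp(-margin / NOISE_SCALE),
    so for n runs in a row it is q**n.
    """
    q = math.exp(-margin / NOISE_SCALE)
    return q ** n


def observed_min(n, seed=0):
    """Minimum of n simulated runs (same seed, so larger n extends smaller n)."""
    rng = random.Random(seed)
    return min(sample_run(rng) for _ in range(n))


if __name__ == "__main__":
    for n in (1, 10, 100, 1000):
        print(n, round(observed_min(n), 4), prob_all_above(0.1, n))
```

Under these assumptions the printed minimum can only approach FLOOR as n
grows, while the chance of never getting within 0.1 of it shrinks
geometrically.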

Yes, clearly the minimum value from a normal distribution (two infinite
tails) is meaningless and wildly variable. There is no lower bound. Bye
bye observations.
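
By contrast, a small sketch of the unbounded case, assuming a standard
normal distribution as a stand-in for "two infinite tails". The function
name min_of_normal is made up for the example.

```python
import random


def min_of_normal(n, seed=0):
    """Minimum of n draws from a standard normal distribution."""
    rng = random.Random(seed)
    return min(rng.gauss(0.0, 1.0) for _ in range(n))


if __name__ == "__main__":
    # The same seed is reused, so each larger sample extends the smaller one
    # and the reported minimum can only stay put or fall further.
    for n in (10, 1000, 100000):
        print(n, round(min_of_normal(n), 3))
```

There is no floor for the minimum to settle on here: taking more samples
just lets it wander further down the tail.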

Yes, Larry says that he reports whatever number will be observed most
frequently in practice, which is, by definition, the mode - i.e. the
major peak of the distribution.

Yes, if the distribution is bimodal, that number will hop from one peak
to another and give you variable results. However, the distribution
will clearly show that there are two (or more) populations.
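
The hopping can be sketched as follows, assuming an equal mix of two
normal populations centred at 10 and 20 (numbers picked for illustration
only). The helper names sample_bimodal, histogram_mode and modes_over_runs
are hypothetical.

```python
import random
from collections import Counter


def sample_bimodal(rng, n, low=10.0, high=20.0, sd=1.0):
    """Draw n values from an equal mix of two normal populations."""
    return [rng.gauss(low if rng.random() < 0.5 else high, sd) for _ in range(n)]


def histogram_mode(values, bin_width=1.0):
    """Centre of the most heavily populated histogram bin."""
    counts = Counter(int(v // bin_width) for v in values)
    best_bin, _ = counts.most_common(1)[0]
    return (best_bin + 0.5) * bin_width


def modes_over_runs(runs=40, n=200):
    """Histogram mode of each of several independent simulated runs."""
    return [
        histogram_mode(sample_bimodal(random.Random(seed), n))
        for seed in range(runs)
    ]


if __name__ == "__main__":
    print(modes_over_runs())
```

Each run reports a value near one peak or the other, never in between, so
the reported number varies from run to run even though the full histogram
shows both populations plainly.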

> mean that the minimum is meaningless. It may generally be a poor number to
> use for benchmarks, but it may be able to tell you about room for
> improvement, underlying physical constraints, etc. See the ping example.

Yes. It tells you the real minimum, with certainty in the limit. Just
like the median tells you the median in the limit, with certainty. The
median equals the mean in a normal curve (or any symmetric one), but
not in skewed distributions.

BTW - the sample median looks to me to be much more variable than the
sample mean even in a normal distribution. That would be a source of
additional statistical variation. Please do some simulations to check
the variance of the median against the expected SD of the distribution!
A log normal distribution or a gamma distribution (p(x) proportional to
x^n exp(-nx)) would be an appropriate pattern to use. The simplest way
to generate a result X with the correct distribution p(x) is probably
rejection sampling: generate independent pairs (x,y), uniform over a
box that encloses the curve, but only accept an X=x if y < p(x).
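
A minimal sketch of such a simulation, under stated assumptions: n = 3,
proposals for x cut off at 8 (the tail beyond that is negligible for this
n), and p(x) rescaled so its peak at x = 1 equals 1, which lets y be
uniform on [0, 1]. The names N, X_MAX, shape, rejection_sample and
spread_of_estimators are invented for the example.

```python
import math
import random
import statistics

N = 3         # exponent in p(x) ~ x^N exp(-N x); chosen for illustration
X_MAX = 8.0   # proposals are cut off here; the tail beyond is tiny for N = 3


def shape(x):
    """p(x) rescaled so that its peak, at x = 1, equals 1."""
    return (x * math.exp(1.0 - x)) ** N


def rejection_sample(rng):
    """Draw one x by rejection: uniform (x, y), accept when y < shape(x)."""
    while True:
        x = rng.uniform(0.0, X_MAX)
        y = rng.random()
        if y < shape(x):
            return x


def spread_of_estimators(draw, rng, sample_size=101, trials=1000):
    """Variance across trials of the sample mean and of the sample median."""
    means, medians = [], []
    for _ in range(trials):
        sample = [draw(rng) for _ in range(sample_size)]
        means.append(statistics.fmean(sample))
        medians.append(statistics.median(sample))
    return statistics.variance(means), statistics.variance(medians)


if __name__ == "__main__":
    rng = random.Random(1998)
    print("normal:", spread_of_estimators(lambda r: r.gauss(0.0, 1.0), rng))
    print("skewed:", spread_of_estimators(rejection_sample, rng))
```

Each printed pair is (variance of the mean, variance of the median). For
normal data the textbook large-sample result is that the sample median's
variance is about pi/2 times that of the sample mean, so the first pair
should differ by roughly that factor; the skewed case is left for the
simulation to report.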

My 2s worth of thinking. I still think that the little I have seen of
Richard's results shows a hidden variable effect that can be extracted
statistically with no trouble. Benchmarks need not measure only one
parameter at a time to be useful. Statistics is all about extracting
the underlying information from such mixtures. One just has to apply
it.

Peter
