Re: CFS review

From: Willy Tarreau
Date: Fri Aug 03 2007 - 00:56:32 EST


On Thu, Aug 02, 2007 at 09:31:19PM -0700, Arjan van de Ven wrote:
> On Fri, 2007-08-03 at 06:18 +0200, Willy Tarreau wrote:
> > On Thu, Aug 02, 2007 at 08:57:47PM -0700, Arjan van de Ven wrote:
> > > On Thu, 2007-08-02 at 22:04 -0500, Matt Mackall wrote:
> > > > On Wed, Aug 01, 2007 at 01:22:29PM +0200, Ingo Molnar wrote:
> > > > >
> > > > > * Roman Zippel <zippel@xxxxxxxxxxxxxx> wrote:
> > > > >
> > > > > > [...] e.g. in this example there are three tasks that run only for
> > > > > > about 1ms every 3ms, but they get far more time than should have
> > > > > > gotten fairly:
> > > > > >
> > > > > > 4544 roman 20 0 1796 520 432 S 32.1 0.4 0:21.08 lt
> > > > > > 4545 roman 20 0 1796 344 256 R 32.1 0.3 0:21.07 lt
> > > > > > 4546 roman 20 0 1796 344 256 R 31.7 0.3 0:21.07 lt
> > > > > > 4547 roman 20 0 1532 272 216 R 3.3 0.2 0:01.94 l
> > > > >
> > > > > Mike and me have managed to reproduce similarly looking 'top' output,
> > > > > but it takes some effort: we had to deliberately run a non-TSC
> > > > > sched_clock(), CONFIG_HZ=100, !CONFIG_NO_HZ and !CONFIG_HIGH_RES_TIMERS.
> > > >
> > > > ..which is pretty much the state of play for lots of non-x86 hardware.
> > >
> > > question is if it's significantly worse than before. With a 100 or
> > > 1000Hz timer, you can't expect perfect fairness just due to the
> > > extremely rough measurement of time spent...
> >
> > Well, at least we're able to *measure* that task 'l' used 3.3% and
> > that tasks 'lt' used 32%.
>
> It's not measured if you use jiffies level stuff. It's at best sampled!

But if we rely on the same sampling method, at least we will report
something consistent with what happens. And sampling is often the
correct method to get finer resolution on a macroscopic scale.

I mean, we're telling users that we include the "completely fair scheduler"
in 2.6.23, a scheduler which will ensure that all tasks get a fair share of
CPU time. A user starts top and sees 33%+32%+32+3% for 4 tasks while he
would have expected to see 25%+25%+25%+25%. You can try to explain users
that it's the fairest distribution, but they will have a hard time believing
it, especially when they measure the time spent on CPU with the "time"
command. OK this is all sampling, but we should try to avoid relying on
different sources of data for computation and reporting. Time and Top
should report something close to 4*25% for comparable tasks. And if not,
because of some sampling problem, maybe the scheduler cannot be that fair
in some situations, but either it should make use of the sampling time
and top use, or top and time should rely on the view of the scheduler.

I'll try to quickly hack up a program which makes use of rdtsc from
userspace to precisely measure user-space time, and disable TSC use
from the kernel to see how the values diverge.

Regards,
Willy

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/