Re: 'global' rq->clock

From: Peter Zijlstra
Date: Sun May 04 2008 - 08:13:28 EST


On Fri, 2008-05-02 at 03:09 -0700, Arjan van de Ven wrote:
> On Fri, 02 May 2008 14:48:27 -0700 (PDT)
> David Miller <davem@xxxxxxxxxxxxx> wrote:
>
> > From: Peter Zijlstra <a.p.zijlstra@xxxxxxxxx>
> > Date: Fri, 02 May 2008 21:56:26 +0200
> >
> > > Ok, the the below would need something that relates
> > > tick_timestamp's to one another.. probably sucks without it..
> > >
> > > OTOH, Andi said he was working on a fastish global sched_clock()
> > > thingy, Andi got a link to that code?
> >
> > While I'm fine with this kind of stuff being added to constantly cope
> > with x86's joke of a TSC register implementation, it's starting to
> > become an enormous burdon for platforms where the TICK source actually
> > works properly.
>
> it's a sad affair indeed. On some systems it counts cycles, on other
> systems it counts time. On most systems it stops while idle, on others
> it keeps running. On most systems its not very synchronized between
> packages, and on some systems it's not even synchronized between cores.
>
> I'm not convinced TSC is the right thing for the scheduler in the first
> place; on current x86 systems TSC counts "time" not "work done". Now of
> course "time" is an approximation for "work done", but not a very good
> one given the presence of what effectively is variable cpu speeds
> (software CPU frequency control is only part of that; there's also the
> finegrained hardware level frequency control as done by what marketing
> people call "Intel Dynamic Acceleration technology"). [*]
>
> I and others have talked to Peter about this already, and I'm sure we'll
> talk more about it in the future as well.. at some point this part of
> CFS needs to fundamentally be cleaned up. Since this gets into a debate
> about what fairness means ;(

> [*] The converse is also true; cycles aren't a good representation of
> time either; this makes cycle based profilers a bit iffy if you're
> interested in where the system spends time rather than where it spends
> cycles.

My current view on this is that per cpu scheduling should use a time
based clock, whereas smp load balancing can take the cycle (ie work)
counter to balance different work/time loads and estimate core power.

As for using the TSC, I'm afraid we just don't have any choice. Although
I guess we could dynamically detect some counter on new cpus and use
that when available. But the TSC is the only thing available on a lot of
existing machines.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/