Re: [PATCH] sched: Support current clocksource handling in fallback sched_clock().

From: Paul Mundt
Date: Tue May 26 2009 - 10:44:23 EST


On Tue, May 26, 2009 at 04:31:58PM +0200, Linus Walleij wrote:
> 2009/5/26 Paul Mundt <lethal@xxxxxxxxxxxx>:
>
> >   */
> >  unsigned long long __attribute__((weak)) sched_clock(void)
> >  {
> > +        /*
> > +         * Use the current clocksource when it becomes available later in
> > +         * the boot process, and ensure that it has a high enough rating
> > +         * to make it suitable for general use.
> > +         */
> > +        if (clock && clock->rating >= 100)
> > +                return cyc2ns(clock, clocksource_read(clock));
> > +
> > +        /* Otherwise just fall back on jiffies */
> >          return (unsigned long long)(jiffies - INITIAL_JIFFIES)
> >                                          * (NSEC_PER_SEC / HZ);
> >  }
>
> This seems like it would make the patch I sent the other day
> unnecessary (subject u300 sched_clock() implementation).
>
> It would also trim off this solution found in all OMAP platforms in
> arch/arm/plat-omap/common.c
>
> BUT Peter Zijlstra replied to my question about why this wasn't
> generic with:
>
Hum, you trimmed out my changelog which explains all the rationale for
precisely why this needs to be generic. Hopefully people that care will
go back and read that.

> [peterz]:
> > But that is the reason this isn't generic, none of the 'stable'
> > clocksources on x86 are fast enough to use as sched_clock.
>
> Does that mean clock->rating for these clocksources is
> for certain < 100?
>
The '100' threshold is a bit arbitrary; it is what defines base-level
usability. If we want to set a mandate that sched_clock() sources need to
start at 300 or 400 or whatever, that is fine with me, too. In the case
of x86 there are several < 100 ratings, but I don't know if those cover
all of the cases Peter is concerned about.

Regardless, the above cyc2ns() logic does make most of the
architecture-specific sched_clock() implementations redundant, so they
can of course be killed off incrementally. This might not be the case for
x86, but in those cases I expect a different sched_clock() to be
implemented anyway.
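
To make the selection logic in the patch concrete, here is a minimal
userspace sketch, not kernel code. The mock_* names, the MOCK_HZ value and
the sample 1 MHz counter are assumptions made up for illustration; only the
rating check and the mult/shift scaling follow what cyc2ns() does in the
patch above.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed values for illustration only. */
#define MOCK_HZ           100ULL
#define MOCK_NSEC_PER_SEC 1000000000ULL

/* Hypothetical userspace stand-in for the kernel's struct clocksource. */
struct mock_clocksource {
	int rating;             /* higher means a better clocksource */
	uint32_t mult;          /* cycles -> ns multiplier */
	uint32_t shift;         /* cycles -> ns shift */
	uint64_t (*read)(void); /* returns the current cycle count */
};

/* Sample counter: pretends 1000 cycles have elapsed. */
static uint64_t mock_read_counter(void)
{
	return 1000;
}

/* Same scaling as cyc2ns(): ns = (cycles * mult) >> shift. */
static uint64_t mock_cyc2ns(const struct mock_clocksource *cs, uint64_t cycles)
{
	return (cycles * cs->mult) >> cs->shift;
}

/*
 * Use the clocksource if one is registered and rated at least 100,
 * otherwise fall back on the jiffies-based value.
 */
static uint64_t mock_sched_clock(const struct mock_clocksource *cs,
				 uint64_t jiffies_elapsed)
{
	if (cs && cs->rating >= 100)
		return mock_cyc2ns(cs, cs->read());

	return jiffies_elapsed * (MOCK_NSEC_PER_SEC / MOCK_HZ);
}
```

With mult = 1024000 and shift = 10 (1000 ns per cycle, i.e. a 1 MHz
counter), a clocksource rated 200 is read directly, while one rated 50 is
ignored and the jiffies value is returned instead.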

> Else you might want an additional criterion, like
> cyc2ns(1) (much less than) jiffies_to_usecs(1)*1000
> (however you do that the best way)
> so you don't pick something
> that isn't substantially faster than the jiffy counter at least?
>
This rather defeats the purpose of sched_clock() being fast. If we want
to add a flag to the clocksource that means this, instead of consulting
the rating, that is fine with me too. I know which clocksources I prefer
to use for a sched_clock() and they are all better than jiffies. The
semantics of how we tell sched_clock() that are not so important. Rating
seemed like a good choice from the documentation in struct clocksource at
least.
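
For comparison, the two ways of telling sched_clock() which clocksources are
acceptable could look roughly like this. This is a userspace sketch under
stated assumptions: struct mock_cs and MOCK_CS_FLAG_SCHED_CLOCK are
hypothetical names invented here, and no such flag is proposed in the patch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag: "cheap enough to read from sched_clock()". */
#define MOCK_CS_FLAG_SCHED_CLOCK 0x1u

/* Hypothetical, trimmed-down stand-in for struct clocksource. */
struct mock_cs {
	int rating;
	unsigned int flags;
};

/* Rating-based check, as in the patch (min_rating would be 100 there). */
static bool mock_usable_by_rating(const struct mock_cs *cs, int min_rating)
{
	return cs != NULL && cs->rating >= min_rating;
}

/* Flag-based check: the clocksource opts in explicitly. */
static bool mock_usable_by_flag(const struct mock_cs *cs)
{
	return cs != NULL && (cs->flags & MOCK_CS_FLAG_SCHED_CLOCK) != 0;
}
```

Either check is a single comparison, so neither adds meaningful cost to the
sched_clock() path; the flag just makes the opt-in explicit for clocksources
that are well rated but slow to read.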