Re: CFS review

From: Linus Torvalds
Date: Wed Aug 01 2007 - 22:18:36 EST

Next message: Satyam Sharma: "Re: Examine user space locks"
Previous message: Nick Piggin: "lmbench ctxsw regression with CFS"
In reply to: Andi Kleen: "Re: CFS review"
Next in thread: Willy Tarreau: "Re: CFS review"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On Wed, 1 Aug 2007, Roman Zippel wrote:
>
> I'm not so sure about that. sched_clock() has to be fast, so many archs
> may want to continue to use jiffies. As soon as one does that one can also
> save a lot of computational overhead by using 32bit instead of 64bit.
> The question is then how easy that is possible.

I have to say, it would be interesting to try to use 32-bit arithmetic.

I also think it's likely a mistake to do a nanosecond resolution. That's
part of what forces us to 64 bits, and it's just not even an *interesting*
resolution.

It would be better, I suspect, to make the scheduler clock totally
distinct from the other clock sources (many architectures have per-cpu
cycle counters), and *not* try to even necessarily force it to be a
"time-based" one.

So I think it would be entirely appropriate to

- do something that *approximates* microseconds.

Using microseconds instead of nanoseconds would likely allow us to do
32-bit arithmetic in more areas, without any real overflow.

And quite frankly, even on fast CPU's, the scheduler is almost
certainly not going to be able to take any advantage of the nanosecond
resolution. Just about anything takes a microsecond - including IO. I
don't think nanoseconds are worth the ten extra bits they need, if we
could do microseconds in 32 bits.

And the "approximates" thing would be about the fact that we don't
actually care about "absolute" microseconds as much as something that
is in the "roughly a microsecond" area. So if we say "it doesn't have
to be microseconds, but it should be within a factor of two of a ms",
we could avoid all the expensive divisions (even if they turn into
multiplications with reciprocals), and just let people *shift* the CPU
counter instead.

In fact, we could just say that we don't even care about CPU counters
that shift frequency - so what? It gets a bit further off the "ideal
microsecond", but the scheduler just cares about _relative_ times
between tasks (and that the total latency is within some reasonable
value), it doesn't really care about absolute time.

Hmm?

It would still be true that something that is purely based on timer ticks
will always be liable to have rounding errors that will inevitably mean
that you don't get good fairness - tuning threads to run less than a timer
tick at a time would effectively "hide" them from the scheduler
accounting. However, in the end, I think that's pretty much unavoidable.
We should make sure that things mostly *work* for that situation, but I
think it's ok to say that the *quality* of the fairness will obviously
suffer (and possibly a lot in the extreme cases).

Linus
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Next message: Satyam Sharma: "Re: Examine user space locks"
Previous message: Nick Piggin: "lmbench ctxsw regression with CFS"
In reply to: Andi Kleen: "Re: CFS review"
Next in thread: Willy Tarreau: "Re: CFS review"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]