Re: [PATCH] default to n for GROUP_SCHED and FAIR_GROUP_SCHED

From: Ingo Molnar
Date: Mon May 05 2008 - 17:06:21 EST



* Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:

> Another example of that kind of behaviour, for example, is just you
> fighting turning off 'default y' from FAIR_GROUP_SCHED, considering
> that it is known to cause latency problems and the reason isn't
> understood.

a side-note to this topic: after looking at a bunch of traces and after
a lot of testing, the latency problems are complex, but reasonably
well-understood.

Nevertheless we'll mark it default-disabled because it's been taking too
long to create and propagate the fixes. I've queued up a patch for that.
We might even mark it BROKEN for a single release so that the option
disappears from people's config? Or we could change the name to achieve
a similar effect.

The main design-level latency source was due to the hierarchic view of
group scheduling - we had a hierarchy of runqueues. CFS met the latency
targets, but only per level (per runqueue) of the hierarchy. So with
every new level, we got more maximum latency.

So for example on a system with fair user scheduling, it takes just a
couple of different UIDs to be probabilistically active at once to
generate a bad latency: say if the root, nobody, distcc and mingo UIDs
are active at once, the mingo task could see a 4x latency hit over the
target - 160 msecs instead of 40 msecs.
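
To make the arithmetic above concrete, here is a rough back-of-the-envelope
sketch. It is not scheduler code: the function name and the "one full
latency target per active group" model are assumptions made purely for
illustration, chosen to match the 4 UIDs / 160 msecs example.

```python
# Simplified model only, not kernel code: assume every simultaneously
# active group (UID) adds one full latency target to the worst case.
def worst_case_latency_ms(target_ms, active_groups):
    """Worst-case wait for one task under the assumed model.

    target_ms     -- latency target of a single runqueue, in msecs
    active_groups -- number of groups (UIDs) runnable at the same time
    """
    if active_groups < 1:
        raise ValueError("need at least one active group")
    return target_ms * active_groups


# The four UIDs named in the example above, all active at once.
active_uids = ["root", "nobody", "distcc", "mingo"]
print(worst_case_latency_ms(40, len(active_uids)))  # 160
```

With a single active UID the same model gives back the plain 40 msecs
target, which is the case where the per-runqueue guarantee is enough.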

This is now believed to be fixed in sched-devel.git, via the "single
runqueue" and deadline-scheduling patches from Peter that flatten the
hierarchy of the group scheduler.

Another latency source was the skew of sched_clock() running too slow -
if the clock runs at 10% of its intended speed, the scheduler will turn
an intended 40 msec latency target into a 400 msec latency target!
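
The scaling above can be sketched the same way. Again this is only an
illustration under a stated assumption: the scheduler measures its latency
target in clock time, so a clock running at a fraction of real speed
stretches the target by the inverse of that fraction. The function name
and the percent parameter are hypothetical, not anything from the kernel.

```python
# Simplified model only, not kernel code: the latency target is counted
# in scheduler-clock time, so a slow clock stretches it in real time.
def effective_latency_ms(target_ms, clock_speed_percent):
    """Real-time latency for a target measured on a skewed clock.

    target_ms           -- intended latency target, in msecs
    clock_speed_percent -- clock rate as a percentage of its intended
                           speed (100 means no skew)
    """
    if clock_speed_percent <= 0:
        raise ValueError("clock speed must be positive")
    return target_ms * 100 / clock_speed_percent


print(effective_latency_ms(40, 10))   # 400.0
print(effective_latency_ms(40, 100))  # 40.0
```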

This bug too is now believed to be fixed via Peter's new sched_clock
code in sched-devel.git.

... and users now have a very objective stick they can use on us:
latencytop. It told us in black and white when we sucked. (I am waiting for
the days when it will auto-create a scheduler trace for the worst
latency hit in the system, making it easy for users to submit traces.)

Ingo