Re: 2.6.21-rc1: known regressions (v2) (part 2)

From: Con Kolivas
Date: Thu Mar 01 2007 - 06:15:54 EST


On Thursday 01 March 2007 19:46, Ingo Molnar wrote:
> * Mike Galbraith <efault@xxxxxx> wrote:
> > I see no real difference between the two assertions. Nice is just a
> > mechanism to set priority, so I applied your assertion to a different
> > range of priorities than nice covers, and returned it to show that the
> > code contradicts itself. It can't be bad for a nice 1 task to run
> > with a nice 0 task, but OK for a minimum RT task to run with a maximum
> > RT task. Iff HT without corrective measures breaks nice, then it
> > breaks realtime priorities as well.
>
> i'm starting to lean towards your view that we should not artificially
> keep tasks from running, when there's a free CPU available. We should
> still keep the 'other half' of SMT scheduling: the immediate pushing of
> tasks to a related core, but this bit of 'do not run tasks on this CPU'
> dependent-sleeper logic is i think a bit fragile. Plus these days SMT
> siblings do not tend to influence each other in such a negative way as
> older P4 ones where a HT sibling would slow down the other sibling
> significantly.

Well, it is meant to be tuned to the CPU type via per_cpu_gain, so it should
be easy to set to the appropriate scaling. It was never meant to be a
one-value-fits-all setting as the processors changed.
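
As a rough illustration of that scaling, here is a minimal sketch. It is not
the kernel's code. It assumes per_cpu_gain is a percentage (0..100) describing
how much extra throughput a sibling thread is expected to add, and the helper
names scaled_slice() and should_hold_back() are hypothetical, made up for this
example.

```c
#include <assert.h>

/*
 * Hypothetical helper: shrink a timeslice by the assumed sibling gain.
 * With per_cpu_gain = 25, a slice of 100 ticks counts as 75.
 */
static unsigned int scaled_slice(unsigned int time_slice,
                                 unsigned int per_cpu_gain)
{
    assert(per_cpu_gain <= 100);   /* treated as a percentage */
    return time_slice * (100 - per_cpu_gain) / 100;
}

/*
 * Hypothetical decision: hold the lower-priority candidate back only if
 * the sibling's scaled slice is still larger than the candidate's slice.
 * Returns 1 to hold back, 0 to let it run.
 */
static int should_hold_back(unsigned int sibling_slice,
                            unsigned int candidate_slice,
                            unsigned int per_cpu_gain)
{
    return scaled_slice(sibling_slice, per_cpu_gain) > candidate_slice;
}
```

In this sketch, a larger per_cpu_gain (siblings that interfere less with each
other) makes the hold-back condition harder to meet, which is the sense in
which the value is tuned per CPU type.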

> plus with an increasing number of siblings (which seems like an
> inevitable thing on the hardware side), the dependent-sleeper logic
> becomes less and less scalable. We'd have to cross-check every other
> 'related' CPU's current priority to decide what to run.

Yes, even I've commented before that the current system becomes unworkable
once there are multiple shared power threads. This I do see as a real problem
with it - in the future.
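
To make the scaling concern concrete, here is a minimal sketch of the kind of
cross-check described above. It is an illustration only, not the kernel's
implementation. The function name any_sibling_outranks() and the flat array of
sibling priorities are assumptions for this example, and it assumes a lower
number means a higher priority.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical cross-check: scan the priority of the task currently
 * running on every related sibling. Returns 1 if any sibling is running
 * something of higher priority than the candidate (lower number = higher
 * priority in this sketch), otherwise 0.
 */
static int any_sibling_outranks(const int *sibling_prio, size_t n,
                                int candidate_prio)
{
    assert(sibling_prio != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        if (sibling_prio[i] < candidate_prio)
            return 1;
    }
    return 0;
}
```

The loop runs once per sibling on every scheduling decision, so the cost of
the check grows linearly with the number of threads sharing a core.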

> if then there should be a mechanism /in the hardware/ to set the
> priority of a CPU - and then the hardware could decide how to prioritize
> between siblings. Doing this in software is really hard.

And that's the depressing part because of course I was interested in that as
the original approach to the problem (and it was a big problem). When I spoke
to Intel and AMD (of course to date no SMT AMD chip exists) at kernel summit
they said it was too hard to implement hardware priorities well. Which is
real odd since IBM have already done it with Power...

Still I think it has been working fine in software till now, but now it has to
deal with the added confusion of dynticks, so I already know what will happen
to it.

Hrm it's been a good time for my code all round... I think I'll just swap
prefetch myself up the staircase to some pluggable scheduler that would
hyperthread me to sleep as an idle priority task.

--
-ck