In message <3FD9679A.1020404@xxxxxxxxxxxxxxx> you write:
Thanks for having a look Rusty. I'll try to convince you :)
As you know, the domain classes are not just for HT, but can do multiple levels
of NUMA, and it can be built by architecture specific code which is good
for Opteron, for example. It doesn't need CONFIG_SCHED_SMT either, of course,
or CONFIG_NUMA even: degenerate domains can just be collapsed (code isn't
there to do that now).
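For illustration only, collapsing degenerate domains could look roughly like the sketch below. It is not code from the w26 patch: struct dom_level, its span field and collapse_degenerate() are hypothetical names, and the sketch assumes a level is degenerate when it covers exactly as many CPUs as the level beneath it.

```c
#include <stddef.h>

/* Hypothetical: one balancing level, innermost first. */
struct dom_level {
	const char *name;	/* e.g. "smt", "smp", "numa" */
	unsigned int span;	/* number of CPUs this level covers */
};

/*
 * Drop every level that spans no more CPUs than the level below it
 * (the level below the first one is taken to be a single CPU).
 * Compacts the array in place and returns the new level count.
 */
size_t collapse_degenerate(struct dom_level *lv, size_t n)
{
	size_t out = 0;
	unsigned int prev_span = 1;

	for (size_t i = 0; i < n; i++) {
		if (lv[i].span == prev_span)
			continue;	/* adds nothing over the level below */
		prev_span = lv[i].span;
		lv[out++] = lv[i];
	}
	return out;
}
```

On a 4-way SMP box without HT or NUMA, the "smt" level (span 1) and the "numa" level (same span as "smp") would both disappear, leaving a single level.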
Yes, but this isn't what we really want. I'm actually accusing you of
lacking ambition 8)
Shared runqueues, I find, aren't so flexible. I think they perfectly describe
the P4 HT architecture, but what happens if (when) siblings get separate
L1 caches? What about SMT, CMP, SMP and NUMA levels in the POWER5?
It describes every HyperThread implementation I am aware of today, so
it suits us fine for the moment. Runqueues may still be worth sharing
even if L1 isn't, for example.
The large SGI (and I imagine IBM's POWER5) systems need things like
progressive balancing backoff and would probably benefit from a more
hierarchical balancing scheme so all the balancing operations don't kill
the system.
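As a rough sketch of what "progressive balancing backoff" might mean in practice (an assumption, not code from either patch): each level keeps a rebalance interval that doubles whenever a balancing pass finds nothing to move, and drops back to the minimum when it does move tasks. struct balance_state and balance_backoff() are hypothetical names.

```c
/* Hypothetical per-level balancing state; intervals are in ticks. */
struct balance_state {
	unsigned int interval;		/* current wait between passes */
	unsigned int min_interval;	/* used right after a useful pass */
	unsigned int max_interval;	/* upper bound for the backoff */
};

/*
 * Update the interval after one balancing pass and return it.
 * moved_tasks != 0 means the pass found an imbalance and fixed it.
 */
unsigned int balance_backoff(struct balance_state *st, int moved_tasks)
{
	if (moved_tasks) {
		/* Load is shifting: check again soon. */
		st->interval = st->min_interval;
	} else {
		/* Nothing to do: back off, but never past the maximum. */
		unsigned int next = st->interval * 2;

		st->interval = next > st->max_interval ? st->max_interval
						       : next;
	}
	return st->interval;
}
```

With a hierarchical scheme, the outer (more expensive) levels would simply start with larger minimum and maximum intervals than the inner ones.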
But this is my point. Scheduling is one part of the problem. I want
to be able to have the arch-specific code feed in a description of
memory and cpu distances, bandwidths and whatever, and have the
scheduler, slab allocator, per-cpu data allocation, page cache, page
migrator and anything else which cares adjust itself based on that.
Power 4 today has pairs of CPUs on a die, four dies on a board, and
four boards in a machine. I want one infrastructure to describe it,
not have to program every infrastructure from arch-specific code.
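To make the idea concrete, here is a minimal sketch of one shared description that any subsystem could query. It assumes CPUs are numbered so that neighbours share the innermost level, and it reduces "distance" to the number of levels two CPUs must climb before they meet; bandwidths and memory are left out. struct topo_level, topo_nr_cpus() and topo_distance() are hypothetical names, not an existing kernel interface.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: one level of the machine hierarchy, innermost first. */
struct topo_level {
	const char *name;	/* what the level groups into */
	unsigned int fanout;	/* children grouped at this level */
};

/* The Power 4 layout described above. */
static const struct topo_level power4_levels[] = {
	{ "die",     2 },	/* two CPUs per die */
	{ "board",   4 },	/* four dies per board */
	{ "machine", 4 },	/* four boards per machine */
};
#define POWER4_NR_LEVELS \
	(sizeof(power4_levels) / sizeof(power4_levels[0]))

/* Total CPUs described: the product of all fanouts. */
unsigned int topo_nr_cpus(const struct topo_level *lv, size_t n)
{
	unsigned int total = 1;

	for (size_t i = 0; i < n; i++)
		total *= lv[i].fanout;
	return total;
}

/*
 * Number of levels to climb before CPUs a and b share a group:
 * 0 = same CPU, 1 = same die, 2 = same board, 3 = same machine.
 */
unsigned int topo_distance(const struct topo_level *lv, size_t n,
			   unsigned int a, unsigned int b)
{
	unsigned int d = 0;

	assert(a < topo_nr_cpus(lv, n) && b < topo_nr_cpus(lv, n));
	for (size_t i = 0; i < n && a != b; i++) {
		a /= lv[i].fanout;	/* index of a's group at level i */
		b /= lv[i].fanout;
		d++;
	}
	return d;
}
```

The scheduler, slab allocator and the rest would all ask the same topo_distance() question instead of each being programmed separately from arch-specific code.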
w26 does ALL this, while sched.o is 3K smaller than Ingo's shared runqueue
patch on NUMA and SMP, and 1K smaller on UP (although sched.c is 90 lines
longer). kernbench system time is down nearly 10% on the NUMAQ, so it isn't
hurting performance either.
Agreed, but Ingo's shared runqueue patch is a poor implementation of a
good idea: I've always disliked it. I'm halfway through updating my
patch, and I really think you'll like it better. It's not
incompatible with NUMA changes, in fact it's fairly non-invasive.