Re: [RFC PATCH] scheduler: Dynamic sched_domains

From: Nick Piggin
Date: Wed Oct 06 2004 - 21:17:09 EST


Matthew Dobson wrote:
> This code is in no way complete. But since I brought it up in the
> "cpusets - big numa cpu and memory placement" thread, I figure the code
> needs to be posted.
>
> The basic idea is as follows:
>
> 1) Rip out sched_groups and move them into the sched_domains.
> 2) Add some reference counting, and eventually locking, to
> sched_domains.
> 3) Rewrite & simplify the way sched_domains are built and linked into a
> cohesive tree.

OK. I'm not sure that I like the direction, but... (I haven't looked
too closely at it).

> This should allow us to support hotplug more easily, simply removing the
> domain belonging to the going-away CPU, rather than throwing away the
> whole domain tree and rebuilding from scratch.

Although what we have in -mm now should support CPU hotplug just fine.
The hotplug guys really seem not to care how disruptive a hotplug
operation is.

> This should also allow
> us to support multiple, independent (ie: no shared root) domain trees
> which will facilitate isolated CPU groups and exclusive domains. I also

Hmm, what was my word for them... yeah, disjoint. We can do that now,
see isolcpus= for a subset of the functionality you want (doing larger
exclusive sets would probably just require we run the setup code once
for each exclusive set we want to build).
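
A rough userspace sketch of "run the setup code once for each exclusive set", purely as an illustration. Nothing here is the real setup code: NCPUS, sets_are_disjoint() and build_disjoint() are hypothetical, and the per-set "setup" is reduced to recording which set each CPU ended up in. CPUs that appear in no set stay at -1, roughly what isolcpus= gives you today.

```c
#include <assert.h>
#include <stddef.h>

#define NCPUS 8	/* hypothetical machine size for this sketch */

/* Return 1 if no CPU appears in more than one exclusive set. */
static int sets_are_disjoint(const unsigned long *sets, size_t n)
{
	unsigned long seen = 0;

	for (size_t i = 0; i < n; i++) {
		if (sets[i] & seen)
			return 0;
		seen |= sets[i];
	}
	return 1;
}

/*
 * Stand-in for the per-set setup pass: for each exclusive set, record the
 * set index as the "root" of every CPU in it. CPUs in no set keep -1.
 * Returns 0 on success, -1 if the sets overlap.
 */
static int build_disjoint(const unsigned long *sets, size_t n,
			  int root_of[NCPUS])
{
	assert(sets != NULL || n == 0);
	if (!sets_are_disjoint(sets, n))
		return -1;

	for (int cpu = 0; cpu < NCPUS; cpu++)
		root_of[cpu] = -1;

	for (size_t i = 0; i < n; i++)		/* one pass per exclusive set */
		for (int cpu = 0; cpu < NCPUS; cpu++)
			if (sets[i] & (1UL << cpu))
				root_of[cpu] = (int)i;
	return 0;
}
```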

> hope this will allow us to leverage the existing topology infrastructure
> to build domains that closely resemble the physical structure of the
> machine automagically, thus making supporting interesting NUMA machines
> and SMT machines easier.
>
> This patch is just a snapshot in the middle of development, so there are
> certainly some uglies & bugs that will get fixed. That said, any
> comments about the general design are strongly encouraged. Heck, any
> feedback at all is welcome! :)
>
> Patch against 2.6.9-rc3-mm2.

This is what I did in my first implementation of sched domains (which
nobody ever saw), i.e. no sched_groups, just sched_domains as the
balancing object... I'm not sure this works too well.

For example, your bottom-level domain is basically going to be a redundant,
single-CPU domain on most topologies, isn't it?

Also, how will you do overlapping domains that SGI want to do (see
arch/ia64/kernel/domain.c in -mm kernels)?

node2 wants to balance between node0, node1, itself, node3, node4.
node4 wants to balance between node2, node3, itself, node5, node6.
etc.

I think your lists will get tangled, no?
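
To make the concern concrete, here is a small userspace sketch. It is not the ia64 code: NR_NODES, node_window() and fits_in_one_tree() are hypothetical, and the "two nodes either side" window is only chosen to match the node2/node4 example above. It assumes that in a single tree whose siblings share no members, any two spans are either disjoint or one contains the other. The node2 and node4 windows are neither.

```c
#include <assert.h>

#define NR_NODES 8	/* hypothetical node count for this sketch */

/* Bitmask of the nodes that 'node' balances across: itself and two either side. */
static unsigned long node_window(int node)
{
	unsigned long mask = 0;

	assert(node >= 0 && node < NR_NODES);
	for (int n = node - 2; n <= node + 2; n++)
		if (n >= 0 && n < NR_NODES)
			mask |= 1UL << n;
	return mask;
}

/*
 * Two spans can both sit in one tree (with non-overlapping siblings) only
 * if they are disjoint or one is a subset of the other.
 */
static int fits_in_one_tree(unsigned long a, unsigned long b)
{
	unsigned long both = a & b;

	return both == 0 || both == a || both == b;
}
```

Here node_window(2) covers nodes 0-4 and node_window(4) covers nodes 2-6; they share nodes 2-4 but neither contains the other, so fits_in_one_tree() returns 0 for that pair.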