Re: [ckrm-tech] [RFC] Resource Management - Infrastructure choices

From: David Rientjes
Date: Tue Oct 31 2006 - 23:40:42 EST


On Mon, 30 Oct 2006, Paul Menage wrote:

> More or less. More concretely:
>
> - there is a single hierarchy of process containers
> - each process is a member of exactly one process container
>
> - for each resource controller, there's a hierarchy of resource "nodes"
> - each process container is associated with exactly one resource node
> of each type
>
> - by default, the process container hierarchy and the resource node
> hierarchies are isomorphic, but that can be controlled by userspace.
>

This approach appears to be the most complete and extensible
implementation of containers for all practical uses. Not only can you use
these process containers in conjunction with your choice of memory
controllers, network controllers, disk I/O controllers, etc, but you can
also plug in whatever modular controller best meets your needs.
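
To make the relationships concrete, here is a rough sketch of the data
structures such a proposal seems to imply; every name, and the fixed-size
array, is my own invention for illustration and is not taken from any
posted patchset:

  #define MAX_CONTROLLER_TYPES 8  /* arbitrary, purely for illustration */

  /* one node in a single controller's resource hierarchy */
  struct resource_node {
          struct resource_node *parent;
          /* per-controller state (limits, guarantees, usage) hangs here */
  };

  /* one node in the single process container hierarchy */
  struct process_container {
          struct process_container *parent;
          /* exactly one resource node of each controller type */
          struct resource_node *nodes[MAX_CONTROLLER_TYPES];
  };

  /* and each task_struct would carry exactly one of these: */
  /*        struct process_container *container;            */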

So here are our three process containers, A, B, and C, with our tasks m-t:

-----A-----     -----B-----     -----C-----
|    |    |     |    |    |     |         |
m    n    o     p    q    r     s         t

Here are our memory controller groups, D and E, with our containers
placed within them:

-----D-----     -----E-----
|         |          |
A         B          C

[ My memory controller E is for my real-time processes, so I set its
  attributes appropriately so that its guarantees are never broken. ]

And our network controller groups F, G, and H:

-----F-----     -----G-----
                |         |
           -----H-----    C
           |         |
           A         B

[ I'm going to leave my network controller F open for my customer's
WWW browsing, but nobody is using it right now. ]

I choose not to control disk I/O, so there is no change from current
behavior for any of the processes listed above.

There are two things I notice about this approach (my use of the word
"container" refers to the process containers A, B, and C; my use of the
word "controller" refers to memory, disk I/O, network, etc controllers):

- While the process containers are only single-level, the controllers are
  _inherently_ hierarchical just like a filesystem. So it appears that
  the manipulation of these controllers would most effectively be done
  from userspace with a filesystem approach. While that may not be best
  served by forcing CONFIG_CONFIGFS_FS to be enabled, I see no objection
  to giving it its own filesystem, separate from configfs, within the
  kernel. The filesystem manipulation tools that everybody is familiar
  with make the implementation of controllers simple and, more
  importantly, easier to _use_.

- The process containers will need to be set up as desired following
  boot. So if the current approach of cpusets is used, where the
  functionality is enabled on mount, all processes will originally belong
  to the default container that encompasses the entire system. Since
  each process must belong to exactly one process container as per Paul
  Menage's proposal, a new container will need to be created and
  processes _moved_ to it for later use by controllers. So it appears
  that the manipulation of containers would most effectively be done
  from userspace by a syscall approach (a rough sketch of such an
  interface follows this list).
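
To illustrate that second point, a syscall-style interface for containers
might look roughly like the following; these prototypes are entirely
hypothetical and exist in no patchset that I am aware of:

  /* create a container as a child of 'parent' */
  int container_create(const char *parent, const char *name);

  /* move an existing task into a container */
  int container_attach(const char *name, pid_t pid);

  /*
   * Userspace would then do something like
   *        container_create("/", "A");
   *        container_attach("/A", pid_of_task_m);
   * while the controller hierarchies (D, E, F, G, H above) would be
   * built and tuned with the usual mkdir/echo on a filesystem.
   */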

In this scenario, it is not necessary for network controller groups F and
G above to be limited (or guaranteed) to 100% of our network load. It is
quite possible that we do not assign every container to a network
controller; those containers then receive the remainder of the bandwidth
that is not already attributed to F and G. The same is true of any
controller. Our controllers should only seek to limit or guarantee a
certain amount of resources, not force each system process to be a member
of one group or another in order to receive resources.

Two questions also arise:

- Why do I need to create (i.e. mount the filesystem) the container in
  the first place? Since the use of these containers rests entirely on
  the shoulders of the optional controllers, there should be no
  interference with current behavior if I choose not to use any
  controller. So why not take the approach that NUMA did, whereby on a
  UMA machine all of memory belongs to node 0? In our case, all
  processes would inherently belong to a system-wide container, much as
  every process appears in procfs. In fact, procfs is how this could be
  implemented apart from configfs, following the criticism from UBC.

- How is forking handled with the various controllers? Do child
  processes automatically inherit all the controller groups of their
  parent? If not (or if it is dependent on a user-configured attribute
  of the controller), what happens when I want processes forked from
  container A in the illustration above to belong to a new network
  controller group? Certainly that new controller group cannot be
  created as a sibling of F and G; and determining the network limit
  for a third child of H would be non-trivial, because the network
  resources allocated to A and B would then be scaled back, perhaps in
  an undesired manner. (One plausible default is sketched after this
  list.)
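
For what it's worth, here is one plausible answer to both questions,
sketched in the spirit of the NUMA node 0 analogy above; again, every
identifier is invented purely for illustration:

  /*
   * A statically allocated root container that every task points to by
   * default, so a system that never mounts or configures anything
   * behaves exactly as it does today.
   */
  static struct process_container root_container;

  /*
   * On fork, the child simply inherits its parent's container (and
   * therefore all of the parent's controller groups).  Moving it into
   * a new group would be a separate, explicit operation afterwards
   * rather than something decided at fork time.
   */
  static inline void container_fork(struct task_struct *child,
                                    const struct task_struct *parent)
  {
          child->container = parent->container;
          /* a reference count on the container would be taken here */
  }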

So the container abstraction looks appropriate for a syscall interface,
whereas the controller abstraction looks appropriate for a filesystem
interface. If Paul Menage's proposal above is adopted, it seems like
the design and implementation of containers is the first milestone; how
far does the current patchset get us toward what is described above?
Does it still support a hierarchy just like cpusets?

And following that, it seems like the next milestone would be to design
the different characteristics that the various modular controllers could
support, such as notify_on_release, limits/guarantees, behavior on fork,
and privileges.
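
One can imagine that design converging on something like the descriptor
below; every field name here is made up only to illustrate the kinds of
characteristics being discussed:

  struct container_controller {
          const char *name;                 /* "memory", "network", ... */

          int  (*create)(struct resource_node *node);  /* new group made   */
          void (*destroy)(struct resource_node *node); /* group released   */
          void (*fork)(struct task_struct *child);     /* behavior on fork */

          int  (*set_limit)(struct resource_node *node, unsigned long limit);
          int  (*set_guarantee)(struct resource_node *node, unsigned long min);

          unsigned int notify_on_release:1; /* cpuset-style notification */
  };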

David