Re: [RFC] CFQ group scheduling structure organization

From: Corrado Zoccolo
Date: Thu Dec 17 2009 - 06:41:39 EST


Hi,
On Wed, Dec 16, 2009 at 11:52 PM, Vivek Goyal <vgoyal@xxxxxxxxxx> wrote:
> Hi All,
>
> With some basic group scheduling support in CFQ, there are a few questions
> regarding what the group structure should look like in CFQ.
>
> Currently, grouping looks as follows. A and B are two cgroups created by
> the user.
>
> [snip]
>
> Proposal 4:
> ==========
> Treat tasks and groups at the same level. Currently groups are at the top
> level and tasks are at the second level. View the whole hierarchy as follows.
>
>
>                        service-tree
>                        /   |   \   \
>                       T1  T2   G1  G2
>
> Here T1 and T2 are two tasks in root group and G1 and G2 are two cgroups
> created under root.
>
> In this kind of scheme, any RT task in the root group will still be
> system-wide RT even if we create groups G1 and G2.
>
> So what are the issues?
>
> - I talked to a few folks and everybody found this scheme not so intuitive.
>   Their argument was that once I create a cgroup, say A, under root, then
>   bandwidth should be divided between "root" and "A" in proportion to
>   their weights.
>
>   It is not very intuitive that a group competes with all the tasks
>   running in the root group. And the disk share of a newly created group
>   will change as more tasks fork in the root group. So it is highly dynamic,
>   not static, and hence un-intuitive.
>
>   To emulate the behavior of the previous proposals, root would have to
>   create a new group and move all root tasks there. But the admin would
>   still have to keep RT tasks in the root group so that they remain
>   system-wide.
>
>                        service-tree
>                        /   |    \   \
>                       T1  root  G1  G2
>                             |
>                             T2
>
>   Now the admin has specifically created a group "root" alongside G1 and G2
>   and moved T2 under it. T1 is still left in the top-level group as it might
>   be an RT task and we want it to remain an RT task system-wide.
>
>   So to some people this scheme is un-intuitive and requires more work in
>   user space to achieve the desired behavior. I am kind of 50:50 between
>   the two kinds of arrangements.
>
This is the one I prefer: it is the most natural one if you see groups
as scheduling entities like any other task.
I think it becomes intuitive with an analogy to a qemu (e.g. kvm)
virtual machine model. If you think of a group as a virtual machine, it
is clear that for the real system the whole virtual machine is a
single scheduling entity, and that it has to compete with other
virtual machines (as other single entities) and with every process in the
real system (those are inherently more important, since without the
real system the VMs simply cannot exist).
Having a designated root group, instead, resembles the Xen VM model,
where you have a separate domain for each VM and one for the real system.

I think the implementation of this approach can make the code simpler
and more modular (CFQ could be abstracted to deal with scheduling
entities, and each scheduling entity could be defined in a separate file).
Within each group, you would then have the choice of how to schedule its
queues. This means that you could possibly have different I/O
schedulers within each group, and even have sub-groups within groups.
>
> I am looking for some feedback on what makes most sense.
I think that regardless of our preference, we should coordinate with
how the CPU scheduler works, since I think users will be more
surprised to see cgroups behaving differently w.r.t. CPU and disk than
to see RT task behaviour change when cgroups are introduced.

Thanks,
Corrado

>
> For the time being, I am a little inclined towards proposal 2 and I have
> implemented a proof-of-concept version on top of the for-2.6.33 branch in
> the block tree. These patches are compile- and boot-tested only; I have yet
> to do real testing.
>
> Thanks
> Vivek
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/