Re: [ckrm-tech] Re: [PATCH 1/3] CPUMETER: add cpumeter framework tothe CPUSETS

From: KUROSAWA Takahiro
Date: Wed Sep 28 2005 - 01:22:44 EST


On Tue, 27 Sep 2005 08:49:05 -0700
Paul Jackson <pj@xxxxxxx> wrote:

> Instead of doing the impossible and trying to mark all the children
> of C as meter_cpu all at same instant in time, I should just mark the
> parent cpuset C, one time.

I agree with your idea.
Marking the parent cpuset C instead of marking each child seems to make
things quite easier.

> Then if C is so marked, its -children- now
> have the meter_cpu_* files, which can default at that instant in time to
> providing (1/N) of the cpu to each of the N children of C.

Maybe the default value of meter_cpu_guar (guarantee) could be 0
in this design, since guarantee 0 means no guarantee. Resources
can even be allocated without guarantee.

> I would say that C can only be marked meter_cpu if:
> * C is already marked cpu_exclusive.
> * All of its children have the same 'cpus' setting as C.
> * Any new child of C created after C is marked meter_cpu will
> automatically start with the same 'cpus' setting as C, and
> with the meter_cpu_* files.
> * It is prohibited to change the 'cpus' of any cpuset whose
> parent is marked meter_cpu.
> * Changing the 'cpus' of a cpuset such as C that is itself
> marked meter_cpu will instantly change the 'cpus' of each
> of its children.
> * It is prohibited to turn on the cpu_exclusive flag of a cpuset
> whose parent is marked meter_cpu, or to turn off the cpu_exclusive
> flag on a cpuset that is itself marked meter_cpu.

These seem good for me.

> The metered children of C may have their own children in turn, which
> may have cpus any subset of the cpus in C, but which cannot be marked
> cpu_exclusive or meter_cpu.
>
> Borrowing your fine art work, and modifying it slightly, this looks
> like:
>
> +-----------------------------------+
> | |
> CPUSET 0 CPUSET 1 (aka 'C')
> sched domain A sched domain B
> cpus: 0, 1 cpus: 2, 3
> cpu_exclusive=1 cpu_exclusive=1
> meter_cpu=0 meter_cpu=1
> |
> +----------------+----------------+
> | | |
> CPUSET 1a CPUSET 1b CPUSET 1c
> cpus: 2, 3 cpus: 2, 3 cpus: 2, 3
> cpu_exclusive=0 cpu_exclusive=0 cpu_exclusive=0
> meter_cpu=0 meter_cpu=0 meter_cpu=0
> meter_cpu_* meter_cpu_* meter_cpu_*
> |
> +------------+------------+
> | |
> CPUSET 2a CPUSET 2b
> cpus: 2 cpus: 3
> meter_cpu=0 meter_cpu=0
> cpu_exclusive=0 cpu_exclusive=0
>
> Note here that marking C (CPUSET 1) as meter_cpu exposes the meter_cpu_*
> files in the children of C.

If we split the cpus {2, 3} into {2} and {3} by creating CPUSET 2a
and CPUSET 2b, the guarantee assigned to CPUSET 1a might not be
satisfied. For example, the maximum cpu resource usage of tasks
in CPUSET 2a should essentially be 50% because tasks in CPUSET 2a
can only use the half number of cpus. If we set meter_cpu_guar=60%
on CPUSET 1a, the problem might occur. Maybe (or maybe not), we can
leave this problem as it is.

> > Is it prohibited for any decendant of C's children to set meter_cpu=1 ?
>
> Yes, I presume so, and made up my new rules above assuming that. It is
> definitely worth an effort in my opinion to allow creating nested
> ordinary (not metered) cpusets below the children of C, but I am
> guessing it would be too hard to try to allow nesting of metered
> cpusets below metered cpusets. If you have a mind to try that however,
> I am more than willing to listen to your proposal.

Probably allowing nested metered cpusets is too hard both to implement
and to use. So far, I woundn't like to implement it.

> Therefore I propose some new cpuset flags:
> * 'sched_domain' to mark sched domains (now done by the cpu_exclusive
> flag),
> * 'kernel_memory' to mark the constraints on GFP_KERNEL allocations,
> * 'oom_killer' to mark the constraints on oom killing,
> * your 'meter_cpu' flag to mark a set of metered cpus, and
> * your 'meter_mem' flag to mark a set of metered mems.

This seems very interesting. Tasks enclosed by a certain cpuset may
want to change the behavior of oom_killer.

--
KUROSAWA, Takahiro
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/