Re: [patch -mmotm] mm: introduce oom_adj_child

From: Paul Menage
Date: Mon Jul 27 2009 - 20:30:22 EST


On Mon, Jul 27, 2009 at 5:10 PM, David Rientjes<rientjes@xxxxxxxxxx> wrote:
> On Mon, 27 Jul 2009, Paul Menage wrote:
>
>> On Sun, Jul 26, 2009 at 2:50 PM, David Rientjes<rientjes@xxxxxxxxxx> wrote:
>> > +If oom_adj_child is set to equal oom_adj, then it will mirror oom_adj whenever
>> > +it changes.  This avoids having to set both values when simply tuning oom_adj
>> > +and that value should be inherited by all children.
>>
>> Maybe have a distinct value for oom_adj_child (the default) that means
>> "default to mm->oom_adj" ?
>>
>
> That's implicitly what mm->oom_adj == mm->oom_adj_child means.  If they
> are equal at the time oom_adj is changed, oom_adj_child also changes, but
> if oom_adj_child differs then it remains static.

So a process that sets its oom_adj value from A to B and back to A
again might unintentionally synchronize oom_adj and oom_adj_child for
the future, if oom_adj_child was originally set to B?
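
To make the A -> B -> A case concrete, here is a minimal user-space
sketch of the implicit mirroring rule as I understand it from the
description above. It is not kernel code: struct mm_model and
model_set_oom_adj() are hypothetical names made up for illustration.

```c
/* Hypothetical user-space model of the mirroring rule; not kernel code. */
struct mm_model {
	int oom_adj;
	int oom_adj_child;
};

/*
 * Assumed rule: oom_adj_child follows oom_adj only if the two values
 * are equal at the time oom_adj is written; otherwise it stays static.
 */
static void model_set_oom_adj(struct mm_model *mm, int val)
{
	if (mm->oom_adj_child == mm->oom_adj)
		mm->oom_adj_child = val;
	mm->oom_adj = val;
}

/*
 * Start with oom_adj = a and oom_adj_child = b, then move oom_adj
 * from a to b and back to a. Returns the resulting oom_adj_child.
 */
static int model_roundtrip(int a, int b)
{
	struct mm_model mm = { .oom_adj = a, .oom_adj_child = b };

	model_set_oom_adj(&mm, b);	/* values now equal by coincidence */
	model_set_oom_adj(&mm, a);	/* so oom_adj_child is dragged along */
	return mm.oom_adj_child;
}
```

With a = 0 and b = 5, the round trip leaves oom_adj_child at 0 even
though nobody ever wrote to it, which is the accidental
synchronization described above.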

Besides, if oom_adj_child is per-task as suggested below, changing
oom_adj_child when oom_adj changes involves scanning the entire task
list to find mm users.

>
>> Shouldn't oom_adj_child be per-task? Otherwise you're theoretically
>> allowing races between different threads that try to fork children
>> with different oom_adj values at the same time. Not a particularly
>> likely problem, but it seems bad to bake the chance of races into the
>> API.
>>
>
> Good point, the newly initialized mm can get its oom_adj value from
> current rather than current->mm.
>
>> Also, I'm not sure that the requirement that oom_adj_child be >=
>> oom_adj is a good restriction. Sure, if a task gives its child a lower
>> oom_adj than itself it's potentially playing with fire, but it may
>> well be that the new child is expected to daemonize itself in the very
>> near future and hence no longer be the child of the current process. I
>> don't think that restricting the values that the sysadmin or root
>> processes can apply on the grounds that they might not do what they
>> want is the right approach.
>>
>
> Ok, we can allow oom_adj_child to be less than oom_adj for
> CAP_SYS_RESOURCE.

Sounds fine to me, since you already need CAP_SYS_RESOURCE to set
oom_adj anyway. But actually, shouldn't you just be requiring
CAP_SYS_RESOURCE to set oom_adj_child at all?

Otherwise an unprivileged process that starts with oom_adj=0 could set
its oom_adj_child value to something slightly less immune than its
oom_adj, say 1; then even if the sysadmin sets its oom_adj value to
something very non-immune, it would still be able to create children
with oom_adj 1.
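
A rough user-space sketch of the two policies, to show the difference.
Everything here is a hypothetical model for illustration (struct
task_model, the set_child_*() helpers and the bool standing in for a
CAP_SYS_RESOURCE check are all made up), and it assumes the mirroring
rule described earlier in the thread.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of one task; not kernel code. */
struct task_model {
	int oom_adj;
	int oom_adj_child;
	bool cap_sys_resource;	/* stand-in for a real capability check */
};

/* Relaxed policy: only lowering oom_adj_child below oom_adj needs the cap. */
static int set_child_relaxed(struct task_model *t, int val)
{
	if (val < t->oom_adj && !t->cap_sys_resource)
		return -EPERM;
	t->oom_adj_child = val;
	return 0;
}

/* Strict policy: any write to oom_adj_child needs the cap. */
static int set_child_strict(struct task_model *t, int val)
{
	if (!t->cap_sys_resource)
		return -EPERM;
	t->oom_adj_child = val;
	return 0;
}

/* Sysadmin write to oom_adj; oom_adj_child mirrors only if it was equal. */
static void admin_set_oom_adj(struct task_model *t, int val)
{
	if (t->oom_adj_child == t->oom_adj)
		t->oom_adj_child = val;
	t->oom_adj = val;
}

/* A freshly forked child starts with the parent's oom_adj_child. */
static int forked_child_oom_adj(const struct task_model *t)
{
	return t->oom_adj_child;
}
```

In this model, an unprivileged task at oom_adj 0 that calls
set_child_relaxed(&t, 1) keeps forking children at 1 after the
sysadmin raises its oom_adj to 15. With set_child_strict() the write
fails with -EPERM, so oom_adj_child keeps mirroring and the children
get 15 as well.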

Paul