Re: [PATCH 4/6] mm, oom: skip over vforked tasks

From: Michal Hocko
Date: Mon May 30 2016 - 06:54:04 EST


On Mon 30-05-16 13:40:17, Vladimir Davydov wrote:
> On Mon, May 30, 2016 at 11:52:12AM +0200, Michal Hocko wrote:
> > On Mon 30-05-16 09:13:57, Michal Hocko wrote:
> > > On Fri 27-05-16 19:48:30, Vladimir Davydov wrote:
> > > > On Thu, May 26, 2016 at 02:40:13PM +0200, Michal Hocko wrote:
> > > [...]
> > > > > @@ -839,6 +841,13 @@ void oom_kill_process(struct oom_control *oc, struct task_struct *p,
> > > > > for_each_process(p) {
> > > > > if (!process_shares_mm(p, mm))
> > > > > continue;
> > > > > + /*
> > > > > + * vforked tasks are ignored because they will drop the mm soon
> > > > > + * hopefully and even if not they will not mind being oom
> > > > > + * reaped because they cannot touch any memory.
> > > >
> > > > They shouldn't modify memory, but they still can touch it AFAIK.
> > >
> > > You are right. This means that the vforked child might see zero pages.
> > > Let me think whether this is acceptable or not.
> >
> > OK, I was thinking about it some more and I think you have a good point
> > here. I can see two options here:
> > - keep vforked task alive and skip the oom reaper. If the victim exits
> > normally and the oom wouldn't get resolved the vforked task will be
> > selected in the next round because the victim would clean up
> > vfork_done state in wait_for_vfork_done. We are still risking that
> > the victim gets stuck though
> > - kill vforked task and so it would be reapable.
>
> IMHO it all depends on what we're trying to achieve. If we want per task
> oom, which could make some sense since a task can consume a lot of mem
> via e.g. pipe buffers, we would go with option #1. However, it's rather
> difficult to find out how much of kmem a task consumes w/o using kmemcg,
> so IMHO per-mm approach makes more sense in general. In this case I
> think we should kill both vforked task and its parent if their mm was
> selected provided their oom_score_adj allows that.

Yes agreed. Going with per-mm is a safier behavior because the vast
majority of the consumed memory should be per mm not per task_struct.

--
Michal Hocko
SUSE Labs