Re: [PATCH 6/5] oom, oom_reaper: disable oom_reaper for oom_kill_allocating_task

From: Michal Hocko
Date: Mon Feb 22 2016 - 04:41:15 EST


On Sat 20-02-16 11:32:07, Tetsuo Handa wrote:
> Michal Hocko wrote:
> > On Wed 17-02-16 10:48:55, Michal Hocko wrote:
> > > Hi Andrew,
> > > although this can be folded into patch 5
> > > (mm-oom_reaper-implement-oom-victims-queuing.patch) I think it would be
> > > better to have it separate and revert after we sort out the proper
> > > oom_kill_allocating_task behavior or handle exclusion at oom_reaper
> > > level.
> >
> > An alternative would be something like the following. It is definitely
> > less hackish but it steals one bit in mm->flags. We do not seem to be
> > in shortage there now but who knows. Does this sound better? Later
> > changes might even consider the flag for the victim selection and ignore
> > those which already have the flag set. But I didn't think about it more
> > to form a patch yet.
>
> This sounds better than "can_oom_reap = !sysctl_oom_kill_allocating_task;".
>
> > @@ -740,6 +740,10 @@ void oom_kill_process(struct oom_control *oc, struct task_struct *p,
> > /* Get a reference to safely compare mm after task_unlock(victim) */
> > mm = victim->mm;
> > atomic_inc(&mm->mm_count);
> > +
> > + /* Make sure we do not try to oom reap the mm multiple times */
> > + can_oom_reap = !test_and_set_bit(MMF_OOM_KILLED, &mm->flags);
> > +
> > /*
> > * We should send SIGKILL before setting TIF_MEMDIE in order to prevent
> > * the OOM victim from depleting the memory reserves from the user
>
> But as of this line we don't know whether this mm is reapable.

That is not really important at this point. The mm is eligible only if
it wasn't a part of an OOM kill before. Later checks are, of course,
allowed to veto the default and disable the oom reaper.
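
To illustrate the "first setter wins, later checks may veto" logic, here
is a minimal userspace sketch, not kernel code. It assumes C11 atomics as
a stand-in for the kernel's test_and_set_bit(); the bit number, the
struct mm_sketch type, and the vetoed_later parameter are all
hypothetical names used only for this illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define MMF_OOM_KILLED 0	/* hypothetical bit number, illustration only */

/* Hypothetical stand-in for the flags word of struct mm_struct. */
struct mm_sketch {
	atomic_ulong flags;
};

/*
 * Userspace stand-in for test_and_set_bit(): atomically set bit nr
 * and return whether it was already set before the call.
 */
static bool test_and_set_bit_sketch(int nr, atomic_ulong *addr)
{
	unsigned long mask;

	assert(nr >= 0 && nr < (int)(sizeof(unsigned long) * 8));
	mask = 1UL << nr;
	return (atomic_fetch_or(addr, mask) & mask) != 0;
}

/*
 * Default to "reapable" only for the first OOM kill of this mm.
 * vetoed_later models any later check that disables the reaper.
 */
static bool decide_can_oom_reap(struct mm_sketch *mm, bool vetoed_later)
{
	/* Make sure we do not try to oom reap the mm multiple times */
	bool can_oom_reap = !test_and_set_bit_sketch(MMF_OOM_KILLED, &mm->flags);

	if (vetoed_later)
		can_oom_reap = false;

	return can_oom_reap;
}
```

In this sketch the flag is set even when a later check vetoes reaping, so
a second kill of the same mm never becomes eligible again.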

> Shouldn't this be done like
>
> static void wake_oom_reaper(struct task_struct *tsk)
> {
> /* Make sure we do not try to oom reap the mm multiple times */
> if (!oom_reaper_th || !test_and_set_bit(MMF_OOM_KILLED, &mm->flags))
> return;

We do not have the mm here. We have a task and would need the task_lock.
I find it much easier to evaluate mm while we still have it and we know
the task holding this mm will receive SIGKILL and TIF_MEMDIE.

> get_task_struct(tsk);
>
> spin_lock(&oom_reaper_lock);
> list_add(&tsk->oom_reaper_list, &oom_reaper_list);
> spin_unlock(&oom_reaper_lock);
> wake_up(&oom_reaper_wait);
> }
>
> ?
>
> Moreover, why don't you do like
>
> struct mm_struct {
> (...snipped...)
> struct list_head oom_reaper_list;
> (...snipped...)
> }

Because we would need to search all tasks sharing the same mm in order
to call exit_oom_victim.

> than
>
> struct task_struct {
> (...snipped...)
> struct list_head oom_reaper_list;
> (...snipped...)
> }
>
> so that we can update all ->oom_score_adj using this mm_struct for handling
> crazy combo ( http://lkml.kernel.org/r/20160204163113.GF14425@xxxxxxxxxxxxxx ) ?

I find it much easier to simply skip over tasks with MMF_OOM_KILLED
already when selecting a victim. We won't need oom_score_adj games at
all. This needs a deeper evaluation though. I didn't get to it yet,
but the point of having an MMF flag which is not oom_reaper specific
was to have it reusable in other contexts as well.
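
A rough userspace sketch of what skipping already-killed mms during
victim selection could look like. This is not a patch and not the
kernel's selection code: the task and mm types, the points field, and
select_victim_sketch() are hypothetical, and the real logic would still
need the deeper evaluation mentioned above.

```c
#include <stdbool.h>
#include <stddef.h>

#define MMF_OOM_KILLED_MASK (1UL << 0)	/* hypothetical bit, illustration only */

/* Hypothetical stand-ins for mm_struct and task_struct. */
struct mm_sketch {
	unsigned long flags;
};

struct task_sketch {
	const char *comm;
	long points;			/* assumed badness score */
	struct mm_sketch *mm;		/* may be shared between tasks */
};

/*
 * Pick the task with the highest score, ignoring tasks without an mm
 * and tasks whose mm was already part of an OOM kill.
 */
static struct task_sketch *select_victim_sketch(struct task_sketch *tasks,
						size_t n)
{
	struct task_sketch *chosen = NULL;

	for (size_t i = 0; i < n; i++) {
		struct task_sketch *t = &tasks[i];

		if (!t->mm)
			continue;
		/* Skip every task sharing an mm that was already killed */
		if (t->mm->flags & MMF_OOM_KILLED_MASK)
			continue;
		if (!chosen || t->points > chosen->points)
			chosen = t;
	}

	return chosen;
}
```

Because the flag lives in the mm, marking one task's mm takes all tasks
sharing that mm out of the candidate set at once, with no per-task
oom_score_adj updates in this sketch.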

Thanks!
--
Michal Hocko
SUSE Labs