Re: [PATCH 2/3] oom, oom_reaper: Try to reap tasks which skip regular OOM killer path

From: Michal Hocko
Date: Fri Apr 08 2016 - 07:50:45 EST


On Fri 08-04-16 20:19:28, Tetsuo Handa wrote:
> Tetsuo Handa wrote:
> > Michal Hocko wrote:
> > > @@ -694,6 +746,7 @@ void oom_kill_process(struct oom_control *oc, struct task_struct *p,
> > > task_lock(p);
> > > if (p->mm && task_will_free_mem(p)) {
> > > mark_oom_victim(p);
> > > + try_oom_reaper(p);
> > > task_unlock(p);
> > > put_task_struct(p);
> > > return;
> > > @@ -873,6 +926,7 @@ bool out_of_memory(struct oom_control *oc)
> > > if (current->mm &&
> > > (fatal_signal_pending(current) || task_will_free_mem(current))) {
> > > mark_oom_victim(current);
> > > + try_oom_reaper(current);
> > > return true;
> > > }
> > >
>
> oom_reaper() will need to do "tsk->oom_reaper_list = NULL;" due to
>
> if (tsk == oom_reaper_list || tsk->oom_reaper_list)
> return;
>
> test in wake_oom_reaper() if "[PATCH 3/3] mm, oom_reaper: clear
> TIF_MEMDIE for all tasks queued for oom_reaper" will select the same
> thread again.

True, I will update my patch.
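
To make the problem concrete, here is a minimal userspace sketch of the
queueing logic. It is not the kernel code: struct task is a hypothetical
stand-in for task_struct, locking and the reaper wakeup are left out,
and reaper_dequeue() with its clear_link flag is invented purely for
illustration. Only the guard in wake_oom_reaper() is taken from the
test quoted above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct task_struct; only the link matters. */
struct task {
	struct task *oom_reaper_list;	/* next queued task, NULL if none */
};

/* Head of the singly linked reaper queue (newest entry first). */
static struct task *oom_reaper_list;

/* Queue tsk; returns false when the guard drops the request. */
static bool wake_oom_reaper(struct task *tsk)
{
	if (tsk == oom_reaper_list || tsk->oom_reaper_list)
		return false;

	tsk->oom_reaper_list = oom_reaper_list;
	oom_reaper_list = tsk;
	return true;
}

/* Pop the head; clear_link models the proposed "= NULL" reset. */
static struct task *reaper_dequeue(bool clear_link)
{
	struct task *tsk = oom_reaper_list;

	if (!tsk)
		return NULL;
	oom_reaper_list = tsk->oom_reaper_list;
	if (clear_link)
		tsk->oom_reaper_list = NULL;
	return tsk;
}

/* Walks through both cases; every assert is expected to hold. */
static void requeue_demo(void)
{
	struct task a = { NULL }, b = { NULL };

	assert(wake_oom_reaper(&a));
	assert(wake_oom_reaper(&b));		/* queue: b -> a */

	/* Without the reset, b keeps a stale link to a ... */
	assert(reaper_dequeue(false) == &b);
	/* ... so selecting b again is silently ignored. */
	assert(!wake_oom_reaper(&b));

	/* With the reset, a can be queued again after being reaped. */
	b.oom_reaper_list = NULL;
	assert(wake_oom_reaper(&b));		/* queue: b -> a */
	assert(reaper_dequeue(true) == &b);
	assert(wake_oom_reaper(&b));		/* accepted this time */

	/* Drain so the queue is empty again. */
	while (reaper_dequeue(true))
		;
}
```

In this model the stale pointer only bites when the dequeued task was
not the last entry of the queue, because a lone entry already has a
NULL link.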

> Though I think we should not allow the OOM killer to select the same
> thread again.
>
> >
> > Why don't you call try_oom_reaper() from the shortcuts in
> > mem_cgroup_out_of_memory() as well?
>
> I looked at next-20160408 but I again came to think that we should remove
> these shortcuts (something like a patch shown bottom).

Feel free to send the patch with a full description. But I would
really encourage you to check the history to learn why those shortcuts
were added, and to describe why those concerns are no longer valid or
important. Throwing out a large patch based on an extreme load which is
basically DoSing the machine is not the ideal approach.

I do respect your different opinion. It is quite possible that you are
right here and that you can convince all the reviewers that your changes
are safe. I would be more than happy to drop my smaller-steps approach
then. But I will be honest with you: you haven't convinced me yet, and
I have seen so many subtle issues in this area of the code that the risk
is really non-trivial for any larger change. This is the primary reason I
am taking small steps, each focusing on a single improvement which can be
argued about and is known to help a particular case without introducing
a risk of different problems. I am not the maintainer, so it is not up to
me to select the right approach.
--
Michal Hocko
SUSE Labs