Re: + oom-pm-oom-killed-task-cannot-escape-pm-suspend.patch added to -mm tree

From: Michal Hocko
Date: Mon Oct 20 2014 - 14:47:12 EST


On Fri 17-10-14 19:19:04, Oleg Nesterov wrote:
> Michal, I am not really arguing with this patch, but since you are going
> (iiuc) to resend it anyway let me ask a couple of questions.
>
> > This, however, still keeps
> > a window open when a killed task didn't manage to die by the time
> > freeze_processes finishes.
>
> Sure,
>
> > Fix this race by checking all tasks after OOM killer has been disabled.
>
> But this doesn't close the race entirely? please see below.
>
> > int freeze_processes(void)
> > {
> > int error;
> > + int oom_kills_saved;
> >
> > error = __usermodehelper_disable(UMH_FREEZING);
> > if (error)
> > @@ -132,12 +133,40 @@ int freeze_processes(void)
> > pm_wakeup_clear();
> > printk("Freezing user space processes ... ");
> > pm_freezing = true;
> > + oom_kills_saved = oom_kills_count();
> > error = try_to_freeze_tasks(true);
> > if (!error) {
> > - printk("done.");
> > __usermodehelper_set_disable_depth(UMH_DISABLED);
> > oom_killer_disable();
> > +
> > + /*
> > + * There was a OOM kill while we were freezing tasks
> > + * and the killed task might be still on the way out
> > + * so we have to double check for race.
> > + */
> > + if (oom_kills_count() != oom_kills_saved) {
>
> OK, I agree, this makes the things better, but perhaps we should document
> (at least in the changelog) that this is still racy. oom_killer_disable()
> obviously can stop the already called out_of_memory(), it can kill a frozen

I guess you meant "can't stop the already called..."

> task right after this check or even after the loop before.

You are right. The race window is still there. I've considered all tasks
being frozen already as sufficient but kernel threads and workqueue
items may allocate memory while we are freezing tasks and trigger OOM
killer as well. This will be inherently racy unless we use a locking
between freezer and OOM killer - which sounds too heavy to me. I can
reduce the race window by noting an OOM much earlier (when the allocator
enters OOM last round before OOM killer fires). The question is whether
this is sufficient because it is a half solution too.

> > + struct task_struct *g, *p;
> > +
> > + read_lock(&tasklist_lock);
> > + do_each_thread(g, p) {
> > + if (p == current || freezer_should_skip(p) ||
> > + frozen(p))
> > + continue;
> > + error = -EBUSY;
> > + break;
> > + } while_each_thread(g, p);
>
> Please use for_each_process_thread(), do/while_each_thread is deprecated.

Sure, I was mimicking try_to_freeze_tasks which still uses the old
interface. Will send a patch which will use the new macro.

> > +/*
> > + * Number of OOM killer invocations (including memcg OOM killer).
> > + * Primarily used by PM freezer to check for potential races with
> > + * OOM killed frozen task.
> > + */
> > +static atomic_t oom_kills = ATOMIC_INIT(0);
> > +
> > +int oom_kills_count(void)
> > +{
> > + return atomic_read(&oom_kills);
> > +}
> > +
> > #define K(x) ((x) << (PAGE_SHIFT-10))
> > /*
> > * Must be called while holding a reference to p, which will be released upon
> > @@ -504,11 +516,13 @@ void oom_kill_process(struct task_struct
> > pr_err("Kill process %d (%s) sharing same memory\n",
> > task_pid_nr(p), p->comm);
> > task_unlock(p);
> > + atomic_inc(&oom_kills);
>
> Do we really need this? Can't freeze_processes() (ab)use oom_notify_list?

I would really prefer not using oom_notify_list. It is just an ugly
interface.

> Yes, we can have more false positives this way, but probably this doesn't
> matter? This is unlikely case anyway.

Yeah false positives are not a big deal.

I cannot say I would be happy about the following because it doesn't
close the race window completely but it is very well possible that
closing it completely would require much bigger changes and maybe this
is sufficient for now?
---