Re: [PATCH] mm, oom: Fix race when selecting process to kill

From: David Rientjes
Date: Tue Nov 05 2013 - 20:19:09 EST


On Tue, 5 Nov 2013, Sameer Nanda wrote:

> The selection of the process to be killed happens in two spots -- first
> in select_bad_process and then a further refinement by looking for
> child processes in oom_kill_process. Since this is a two step process,
> it is possible that the process selected by select_bad_process may get a
> SIGKILL just before oom_kill_process executes. If this were to happen,
> __unhash_process deletes this process from the thread_group list. This
> then results in oom_kill_process getting stuck in an infinite loop when
> traversing the thread_group list of the selected process.
>
> Fix this race by holding the tasklist_lock across the calls to both
> select_bad_process and oom_kill_process.
>
> Change-Id: I8f96b106b3257b5c103d6497bac7f04f4dff4e60
> Signed-off-by: Sameer Nanda <snanda@xxxxxxxxxxxx>

Nack. We had to avoid holding tasklist_lock for this duration because it
stalls forks and exits on other cpus, which try to take the write side with
irqs disabled; that is what causes the watchdog problems.

What kernel version are you patching? In the latest Linus tree, we hold a
reference to the task_struct of the chosen process before calling
oom_kill_process(), so the hypothesis would seem to be incorrect.