Re: [PATCH for 3.2.34] memcg: do not trigger OOM if PF_NO_MEMCG_OOM is set

From: Michal Hocko
Date: Thu Jun 06 2013 - 12:05:08 EST


Hi,

I am really sorry it took so long but I was constantly preempted by
other stuff. I hope I have good news for you, though. Johannes has
found a nice way to overcome the deadlock issues from memcg OOM, which
might help you. Would you be willing to test with his patch
(http://permalink.gmane.org/gmane.linux.kernel.mm/101437)? Unlike my
patch, which handles just the i_mutex case, his patch handles all
possible locks.

I can backport the patch for your kernel (are you still using the 3.2
kernel, or have you moved to a newer one?).

On Fri 22-02-13 09:23:32, azurIt wrote:
> >Unfortunately I am not able to reproduce this behavior even if I try
> >to hammer OOM like mad so I am afraid I cannot help you much without
> >further debugging patches.
> >I do realize that experimenting in your environment is a problem but I
> >do not have many options left. Please do not use strace and rather collect
> >/proc/pid/stack instead. It would also be helpful to get the group/tasks
> >file to have a full list of tasks in the group.
>
>
>
> Hi Michal,
>
>
> Sorry that I didn't respond for a while. Today I installed a kernel with your two patches and I'm running it now. I'm still having problems with OOM, which is not able to handle low memory and is not killing processes. Here is some info:
>
> - data from cgroup 1258 while it was under OOM and no processes were killed (so the OOM didn't stop and the cgroup was frozen)
> http://watchdog.sk/lkml/memcg-bug-6.tar.gz
>
> I noticed the problem at about 8:39 and waited until 8:57 (nothing happened). Then I killed process 19864, which seemed to help: the other processes probably ended and the cgroup started to work. But the problem occurred again about 20 seconds later, so I killed all processes at 8:58. The problem has been occurring all the time since then. All processes (in that cgroup) are always in state 'D' when it occurs.
>
>
> - kernel log from boot until now
> http://watchdog.sk/lkml/kern3.gz
>
>
> Btw, something probably also happened at about 3:09, but I wasn't able to gather any data because my 'load check script' killed all apache processes (load was more than 100).
>
>
>
> azur
> --
> To unsubscribe from this list: send the line "unsubscribe cgroups" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at http://vger.kernel.org/majordomo-info.html
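
The data collection requested in the quoted message (the group's tasks
file plus /proc/pid/stack for every task in it) could be scripted roughly
as below. This is only a sketch: the function names and the proc_root
parameter are made up for illustration, the location of the tasks file
depends on where the memory controller is mounted, and reading
/proc/pid/stack typically needs root and a kernel built with stack
tracing support.

```python
import os

def read_tasks(tasks_path):
    """Return the PIDs listed in a cgroup tasks file (one PID per line)."""
    with open(tasks_path) as f:
        return [int(line) for line in f if line.strip()]

def read_proc_file(pid, name, proc_root="/proc"):
    """Read <proc_root>/<pid>/<name>; return None if the task has exited
    or the file is not readable."""
    try:
        with open(os.path.join(proc_root, str(pid), name)) as f:
            return f.read()
    except OSError:
        return None

def task_state(status_text):
    """Extract the one-letter state (e.g. 'D') from /proc/<pid>/status text."""
    for line in status_text.splitlines():
        if line.startswith("State:"):
            return line.split()[1]
    return None

def collect(tasks_path, proc_root="/proc"):
    """Map every PID in the group to its state and its kernel stack."""
    report = {}
    for pid in read_tasks(tasks_path):
        status = read_proc_file(pid, "status", proc_root)
        report[pid] = {
            "state": task_state(status) if status else None,
            "stack": read_proc_file(pid, "stack", proc_root),
        }
    return report

def format_report(report):
    """Render the collected data as plain text suitable for attaching."""
    lines = []
    for pid in sorted(report):
        info = report[pid]
        lines.append("== pid %d state %s" % (pid, info["state"]))
        lines.append(info["stack"] or "(exited or unreadable)")
    return "\n".join(lines)
```

Calling print(format_report(collect(path))) with path pointing at the
group's tasks file, repeated every few seconds while the group is stuck,
would show whether the tasks in state 'D' stay blocked at the same place.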

--
Michal Hocko
SUSE Labs