Re: [RFC PATCH] mm/oom_kill: allow oom kill allocating task for non-global case

From: Michal Hocko
Date: Tue Jun 08 2021 - 09:58:21 EST


On Tue 08-06-21 11:00:22, Aaron Tomlin wrote:
> On Tue 2021-06-08 08:22 +0200, Michal Hocko wrote:
> > OK. A full report (including the backtrace) would tell us more about the
> > source of the charge. I thought that most #PF charging paths use the
> > same gfp mask as the allocation (which would include other flags on top
> > of GFP_KERNEL) but it seems we just use GFP_KERNEL at many places.
>
> The following is what I can provide for now:
>
Let me add what we have from the previous email:

> [ 8221.433608] memory: usage 21280kB, limit 204800kB, failcnt 49116
>   :
> [ 8227.239769] [ pid ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
> [ 8227.242495] [1611298]     0 1611298    35869      635 167936        0         -1000 conmon
> [ 8227.242518] [1702509]     0 1702509    35869      701 176128        0         -1000 conmon
> [ 8227.242522] [1703345] 1001050000 1703294   183440        0 2125824        0           999 node
> [ 8227.242706] Out of memory and no killable processes...

I do not see how this message could ever be printed on 4.18 for a memcg oom:
	/* Found nothing?!?! Either we hang forever, or we panic. */
	if (!oc->chosen && !is_sysrq_oom(oc) && !is_memcg_oom(oc)) {
		dump_header(oc, NULL);
		panic("Out of memory and no killable processes...\n");
	}
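
And when that check does not fire, the function simply falls through: a memcg
OOM that finds no victim returns false to the charge path without printing
anything. Roughly, abbreviated from the tail of the same function:

	if (oc->chosen && oc->chosen != (void *)-1UL) {
		oom_kill_process(oc, !is_memcg_oom(oc) ? "Out of memory" :
				 "Memory cgroup out of memory");
		schedule_timeout_killable(1);
	}
	/* memcg oom with no victim ends up here: no dump_header(), no panic() */
	return !!oc->chosen;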

So how did it get triggered here? Is it possible that a global oom killer is
somehow going on alongside the memcg OOM? Because the stack below clearly
points to a memcg OOM, and a new one AFAICS.

That being said, the full chain of oom events would definitely be useful to
get a better idea.

> [ 8227.242731] node invoked oom-killer: gfp_mask=0x6000c0(GFP_KERNEL), nodemask=(null), order=0, oom_score_adj=999
> [ 8227.242732] node cpuset=XXXX mems_allowed=0-1
> [ 8227.242736] CPU: 12 PID: 1703347 Comm: node Kdump: loaded Not tainted 4.18.0-193.51.1.el8_2.x86_64 #1
> [ 8227.242737] Hardware name: XXXX
> [ 8227.242738] Call Trace:
> [ 8227.242746] dump_stack+0x5c/0x80
> [ 8227.242751] dump_header+0x6e/0x27a
> [ 8227.242753] out_of_memory.cold.31+0x39/0x8d
> [ 8227.242756] mem_cgroup_out_of_memory+0x49/0x80
> [ 8227.242758] try_charge+0x58c/0x780
> [ 8227.242761] ? __alloc_pages_nodemask+0xef/0x280
> [ 8227.242763] mem_cgroup_try_charge+0x8b/0x1a0
> [ 8227.242764] mem_cgroup_try_charge_delay+0x1c/0x40
> [ 8227.242767] do_anonymous_page+0xb5/0x360
> [ 8227.242770] ? __switch_to_asm+0x35/0x70
> [ 8227.242772] __handle_mm_fault+0x662/0x6a0
> [ 8227.242774] handle_mm_fault+0xda/0x200
> [ 8227.242778] __do_page_fault+0x22d/0x4e0
> [ 8227.242780] do_page_fault+0x32/0x110
> [ 8227.242782] ? page_fault+0x8/0x30
> [ 8227.242783] page_fault+0x1e/0x30
>
> --
> Aaron Tomlin
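
FWIW the trace matches the plain GFP_KERNEL charging mentioned above: the
anonymous #PF path hardcodes GFP_KERNEL for the charge. Roughly, abbreviated
from do_anonymous_page() on a 4.18 based kernel:

	/* the charge for a newly faulted anonymous page uses plain GFP_KERNEL */
	if (mem_cgroup_try_charge_delay(page, vma->vm_mm, GFP_KERNEL,
					&memcg, false))
		goto oom_free_page;

which is consistent with the gfp_mask=0x6000c0 (GFP_KERNEL) in the report above.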

--
Michal Hocko
SUSE Labs