Re: [PATCH V3 3/3] mm: page_alloc: drain pcp lists before oom kill

From: Zach O'Keefe
Date: Fri Jan 26 2024 - 17:52:16 EST


Hey Michal,

> Do you have any example OOM reports? [..]

Sure, here is one from a 1TiB machine with 128 physical cores, running a
5.10-based kernel (sorry, it reads pretty awkwardly when wrapped):

---8<---
mytask invoked oom-killer: gfp_mask=0x100cca(GFP_HIGHUSER_MOVABLE),
order=0, oom_score_adj=0
<...>
oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=sdc,mems_allowed=0-1,global_oom,task_memcg=/sdc,task=mytask,pid=835214,uid=0
Out of memory: Killed process 835214 (mytask) total-vm:787716604kB,
anon-rss:787536152kB, file-rss:64kB, shmem-rss:0kB, UID:0
pgtables:1541224kB oom_score_adj:0, hugetlb-usage:0kB
Mem-Info:
active_anon:320 inactive_anon:198083493 isolated_anon:0
active_file:128283 inactive_file:290086 isolated_file:0
unevictable:3525 dirty:15 writeback:0
slab_reclaimable:35505 slab_unreclaimable:272917
mapped:46414 shmem:822 pagetables:64085088
sec_pagetables:0 bounce:0
kernel_misc_reclaimable:0
free:325793 free_pcp:263277 free_cma:0
Node 0 active_anon:1112kB inactive_anon:268172556kB
active_file:270992kB inactive_file:254612kB unevictable:12404kB
isolated(anon):0kB isolated(file):0kB mapped:147240kB dirty:52kB
writeback:0kB shmem:304kB shmem_thp:0kB shmem_pmdmapped:0kB
anon_thp:1310720kB writeback_tmp:0kB kernel_stack:32000kB
pagetables:255483108kB sec_pagetables:0kB all_unreclaimable? yes
Node 1 active_anon:168kB inactive_anon:524161416kB
active_file:242140kB inactive_file:905732kB unevictable:1696kB
isolated(anon):0kB isolated(file):0kB mapped:38416kB dirty:8kB
writeback:0kB shmem:2984kB shmem_thp:0kB shmem_pmdmapped:0kB
anon_thp:267732992kB writeback_tmp:0kB kernel_stack:8520kB
pagetables:857244kB sec_pagetables:0kB all_unreclaimable? yes
Node 0 Crash free:72kB min:108kB low:220kB high:332kB
reserved_highatomic:0KB active_anon:0kB inactive_anon:111940kB
active_file:280kB inactive_file:316kB unevictable:0kB writepending:4kB
present:114284kB managed:114196kB mlocked:0kB bounce:0kB
free_pcp:1528kB local_pcp:24kB free_cma:0kB
lowmem_reserve[]: 0 0 0 0
Node 0 DMA32 free:66592kB min:2580kB low:5220kB high:7860kB
reserved_highatomic:0KB active_anon:8kB inactive_anon:19456kB
active_file:4kB inactive_file:224kB unevictable:0kB writepending:0kB
present:2643512kB managed:2643512kB mlocked:0kB bounce:0kB
free_pcp:8040kB local_pcp:244kB free_cma:0kB
lowmem_reserve[]: 0 0 16029 16029
Node 0 Normal free:513048kB min:513192kB low:1038700kB high:1564208kB
reserved_highatomic:0KB active_anon:1104kB inactive_anon:268040520kB
active_file:270708kB inactive_file:254072kB unevictable:12404kB
writepending:48kB present:533969920kB managed:525510968kB
mlocked:12344kB bounce:0kB free_pcp:790040kB local_pcp:7060kB
free_cma:0kB
lowmem_reserve[]: 0 0 0 0
Node 1 Normal free:723460kB min:755656kB low:1284080kB high:1812504kB
reserved_highatomic:0KB active_anon:168kB inactive_anon:524161416kB
active_file:242140kB inactive_file:905732kB unevictable:1696kB
writepending:8kB present:536866816kB managed:528427664kB
mlocked:1588kB bounce:0kB free_pcp:253500kB local_pcp:12kB
free_cma:0kB
lowmem_reserve[]: 0 0 0 0
Node 0 Crash: 0*4kB 0*8kB 1*16kB (M) 0*32kB 0*64kB 0*128kB 0*256kB
0*512kB 0*1024kB 0*2048kB 0*4096kB = 16kB
Node 0 DMA32: 80*4kB (UME) 74*8kB (UE) 23*16kB (UME) 21*32kB (UME)
40*64kB (UE) 35*128kB (UME) 3*256kB (UE) 9*512kB (UME) 13*1024kB (UM)
19*2048kB (UME) 0*4096kB = 66592kB
Node 0 Normal: 1999*4kB (UE) 259*8kB (UM) 465*16kB (UM) 114*32kB (UE)
54*64kB (UME) 14*128kB (U) 74*256kB (UME) 128*512kB (UE) 96*1024kB (U)
56*2048kB (U) 46*4096kB (U) = 512292kB
Node 1 Normal: 2280*4kB (UM) 12667*8kB (UM) 8859*16kB (UME) 5221*32kB
(UME) 1631*64kB (UME) 899*128kB (UM) 330*256kB (UME) 0*512kB 0*1024kB
0*2048kB 0*4096kB = 723208kB
Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0
hugepages_size=1048576kB
Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0
hugepages_size=2048kB
Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0
hugepages_size=1048576kB
Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0
hugepages_size=2048kB
420675 total pagecache pages
0 pages in swap cache
Swap cache stats: add 0, delete 0, find 0/0
Free swap = 268435456kB
Total swap = 268435456kB
---8<---

Free memory in both the Node 0 and Node 1 Normal zones is below the
respective min watermark (513048kB < 513192kB and 723460kB < 755656kB),
while 790040kB+253500kB ~= 1GiB of memory is sitting on pcp lists.

With this patch, the GFP_HIGHUSER_MOVABLE allocation, with its
unrestricted mems_allowed, would have been able to use all of that
memory, very likely avoiding the oom.
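
To make the arithmetic concrete, here is a rough sketch that redoes the
comparison using only the numbers from the Normal zone lines of the
report above. It is a simplification and an assumption on my part, not
the kernel's actual watermark check: it compares free against min
directly and ignores lowmem_reserve, allocation order and any other
adjustments. The names ZONES and summarize are made up for illustration.

```python
# Values in kB, copied from the "Node N Normal" lines of the OOM report.
ZONES = {
    "Node 0 Normal": {"free": 513048, "min": 513192, "free_pcp": 790040},
    "Node 1 Normal": {"free": 723460, "min": 755656, "free_pcp": 253500},
}


def summarize(zones):
    """Return (name, below_min, free_after_drain_kb, above_min_after_drain).

    Simplified model: a zone is "below min" when free < min, and draining
    is modelled as moving all of free_pcp onto the zone's free count.
    """
    rows = []
    for name, z in zones.items():
        below_min = z["free"] < z["min"]
        after_drain = z["free"] + z["free_pcp"]
        rows.append((name, below_min, after_drain, after_drain >= z["min"]))
    return rows


# Total memory parked on pcp lists across the two Normal zones.
total_pcp_kb = sum(z["free_pcp"] for z in ZONES.values())

for name, below_min, after_drain, ok in summarize(ZONES):
    print(f"{name}: below min={below_min}, "
          f"free after drain={after_drain}kB, above min={ok}")
print(f"pcp total: {total_pcp_kb}kB ({total_pcp_kb / 2**20:.2f} GiB)")
```

Under this simplified model, both zones start below min and both end up
above min once the pcp pages are counted as free.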

> [..] There were recent changes to scale
> the pcp pages and it would be good to know whether they work reasonably
> well even under memory pressure.

I'm not familiar with these changes, but a quick check of recent
activity points to v6.7 commit fa8c4f9a665b ("mm: fix draining remote
pageset"). Is this what you are referring to?

Thanks, and have a great day,
Zach



>
> I am not objecting to the patch discussed here but it would be really
> good to understand the underlying problem and the scale of it.
>
> Thanks!
> --
> Michal Hocko
> SUSE Labs