Re: [PATCHv5] mm: skip CMA pages when they are not available

From: Zhaoyang Huang
Date: Mon Jun 12 2023 - 05:52:39 EST


On Mon, Jun 12, 2023 at 5:29 PM David Hildenbrand <david@xxxxxxxxxx> wrote:
>
> On 10.06.23 00:35, Andrew Morton wrote:
> > On Wed, 31 May 2023 10:51:01 +0800 "zhaoyang.huang" <zhaoyang.huang@xxxxxxxxxx> wrote:
> >
> >> From: Zhaoyang Huang <zhaoyang.huang@xxxxxxxxxx>
> >>
> >> This patch fixes unproductive reclaiming of CMA pages by skipping them when they
> >> are not available in the current context. It arises from the OOM issue below, which
> >> is caused by a large proportion of MIGRATE_CMA pages among the free pages.
> >>
> >> [ 36.172486] [03-19 10:05:52.172] ActivityManager: page allocation failure: order:0, mode:0xc00(GFP_NOIO), nodemask=(null),cpuset=foreground,mems_allowed=0
> >> [ 36.189447] [03-19 10:05:52.189] DMA32: 0*4kB 447*8kB (C) 217*16kB (C) 124*32kB (C) 136*64kB (C) 70*128kB (C) 22*256kB (C) 3*512kB (C) 0*1024kB 0*2048kB 0*4096kB = 35848kB
> >> [ 36.193125] [03-19 10:05:52.193] Normal: 231*4kB (UMEH) 49*8kB (MEH) 14*16kB (H) 13*32kB (H) 8*64kB (H) 2*128kB (H) 0*256kB 1*512kB (H) 0*1024kB 0*2048kB 0*4096kB = 3236kB
> >> ...
> >> [ 36.234447] [03-19 10:05:52.234] SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC)
> >> [ 36.234455] [03-19 10:05:52.234] cache: ext4_io_end, object size: 64, buffer size: 64, default order: 0, min order: 0
> >> [ 36.234459] [03-19 10:05:52.234] node 0: slabs: 53,objs: 3392, free: 0
> >>
> >
> > We saw plenty of feedback for earlier versions, but now silence. Does
> > this mean we're all OK with v5?
>
> The logic kind-of makes sense to me (but the kswapd special-casing
> already shows that it might be a bit fragile for future use), but I did
> not yet figure out if this actually fixes something or is a pure
> performance improvement.
>
> As we phrased it in the comment "It is waste of effort", but in the
> patch description "This patch fixes unproductive reclaiming" + a scary
> dmesg.
>
> Am I correct that this is a pure performance optimization (and the issue
> revealed itself in that OOM report), or does this actually *fix* something?
>
> If it's a performance improvement, it would be good to show that it is
> an actual improvement worth the churn ...
Sorry for the confusion. As for the OOM issue, the previous commit
(https://lkml.kernel.org/r/1683782550-25799-1-git-send-email-zhaoyang.huang@xxxxxxxxxx)
decreases the failure rate from 12/20 to 2/20, and it drops to 0 with
this patch applied.
>
> --
> Cheers,
>
> David / dhildenb
>