Re: [External] : Re: [PATCH] mm, oom: Add lru_add_drain() in __oom_reap_task_mm()

From: Michal Hocko
Date: Fri Jan 12 2024 - 03:49:23 EST


On Thu 11-01-24 16:08:57, Jianfeng Wang wrote:
>
>
> On 1/11/24 1:54 PM, Andrew Morton wrote:
> > On Thu, 11 Jan 2024 10:54:45 -0800 Jianfeng Wang <jianfeng.w.wang@xxxxxxxxxx> wrote:
> >
> >>
> >>> Unless you can show any actual runtime effect of this patch then I think
> >>> it shouldn't be merged.
> >>>
> >>
> >> Thanks for raising your concern.
> >> I'd call it a trade-off rather than "not really correct". Look at
> >> unmap_region() / free_pages_and_swap_cache() written by Linus. These are in
> >> favor of this pattern, which indicates that the trade-off (i.e. draining
> >> local CPU or draining all CPUs or no draining at all) had been made in the
> >> same way in the past. I don't have a specific runtime effect to provide,
> >> except that it will free 10s kB pages immediately during OOM.

You are missing an important point. Those two call sites are quite
different from this one. oom_reaper unmaps memory after all the reclaim
attempts have failed. Those attempts include draining all sorts of caches
on the way, including the LRU pcp cache (look for lru_add_drain_all in
the reclaim path).

> > I don't think it's necessary to run lru_add_drain() for each vma. Once
> > we've done it once, it can be skipped for additional vmas.
> >
> Agreed.
>
> > That's pretty minor because the second and successive calls will be
> > cheap. But it becomes much more significant if we switch to
> > lru_add_drain_all(), which sounds like what we should be doing here.
> > Is it possible?
> >
> What do you both think of adding lru_add_drain_all() prior to the for loop?

lru_add_drain_all relies on WQs, and we absolutely do not want to get
oom_reaper stuck just because the WQs are jammed. So no, this is
actually actively harmful!
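
To make the concern concrete, here is a userspace toy model, not kernel
code. Every name in it (toy_cpu, toy_drain_local, toy_drain_all,
worker_can_run) is hypothetical and only sketches the difference: a local
drain runs synchronously on the calling CPU, while a drain-all depends on
a work item running on every CPU that has something batched. The model
returns a count of stuck CPUs where a real caller would have to wait.

```c
/* Userspace toy model, NOT kernel code. All names are hypothetical. */
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 4

struct toy_cpu {
	int batched_pages;	/* pages sitting in this CPU's local batch */
	bool worker_can_run;	/* false models a jammed workqueue on this CPU */
};

static int lru_pages;		/* pages already visible on the shared LRU */

/* Models a local drain: runs synchronously, touches one CPU only. */
static void toy_drain_local(struct toy_cpu *cpus, int this_cpu)
{
	lru_pages += cpus[this_cpu].batched_pages;
	cpus[this_cpu].batched_pages = 0;
}

/*
 * Models a drain of all CPUs: each CPU with batched pages needs its own
 * work item to run. Returns how many CPUs could not be drained because
 * their worker is jammed; a real caller would wait on those instead.
 */
static int toy_drain_all(struct toy_cpu *cpus)
{
	int stuck = 0;

	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		if (cpus[cpu].batched_pages == 0)
			continue;	/* nothing to drain, no work queued */
		if (!cpus[cpu].worker_can_run) {
			stuck++;	/* work item queued but never executed */
			continue;
		}
		toy_drain_local(cpus, cpu);
	}
	return stuck;
}
```

With one jammed CPU that still holds batched pages, toy_drain_all()
reports one stuck CPU, which is exactly the situation in which a waiting
oom_reaper would not make progress. toy_drain_local() never depends on
any worker.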

All that being said, I stand by my previous statement that this patch is
not doing anything measurably useful. Prove me wrong; otherwise I am
against merging a "just for consistency" patch. Really, we should go and
re-evaluate the existing local LRU draining callers. I wouldn't be
surprised if we removed some of them.

--
Michal Hocko
SUSE Labs