Re: [patch 0/2] mm: too_many_isolated can stall due to out of sync VM counters

From: Michal Hocko
Date: Wed Nov 22 2023 - 08:56:30 EST


On Wed 22-11-23 08:26:02, Marcelo Tosatti wrote:
[...]
> Michal,
>
> Let me know if you have any objections to the patch, thanks.

I do not think you have explained how the patch helps, nor have you
shown that it fixes the described problem. You seem to be very focused on
the specific snapshot, which I agree shows that the data is out of
sync and that throttling happens when, strictly speaking, it
should not. But (let me repeat) those discrepancies are so small that
concurrent reclaimers are very likely to be stalled anyway (it takes just
one of them isolating those pages). Maybe the patch leads to an earlier OOM
killer invocation, as unthrottled reclaimers will be able to conclude
that there is no progress rather than being throttled in direct reclaim.

That being said, I am not saying the patch is incorrect. Nevertheless, I
do not think we want to merge it without a better understanding of
what is going on in your specific case and what runtime difference the
patch makes there. From your previous email, it seems the actual case
is mostly a memory stress test that manages to fill up memory and push
out almost all of the file LRU while the anon LRU is not reclaimable for
some reason. That shouldn't be terribly hard to reproduce.

--
Michal Hocko
SUSE Labs