Re: mm: 5.16 regression: reclaim_throttle leads to stall in near-OOM conditions

From: Sultan Alsawaf
Date: Mon Dec 20 2021 - 03:50:40 EST


On Fri, Nov 26, 2021 at 04:24:16PM +0000, Mel Gorman wrote:
> It's somewhat expected. If the system is able to make some sort of
> progress and kswapd is active, it'll throttle until progress is
> impossible. It'll be somewhat variable how long it can keep making
> progress be it discarding page cache or writing to swap but it'll only
> OOM when the system is truly OOM.
>
> Might be worth trying the patch below on top. It will delay throttling
> for longer with the caveat that CPU usage due to reclaim when very low
> on memory may be excessive.

Mel,

Perhaps my old submission [1] could be helpful here? I could send a refreshed
version if you're interested. Using wall time to throttle reclaim seems quite
catastrophic IMO, given the inherent assumptions it makes about the running
system's performance characteristics and its workloads.

My patch tackles the issue from the opposite direction: rather than throttling
when there's no reclaim progress to be made, my approach stops kswapd early when
there is no longer any need for reclaim, which conveniently doesn't require any
sort of tunable or heuristic since kswapd can just be immediately woken up again
right after if needed.
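
As a rough illustration of the idea only, here is a toy userspace model
in C. It is not the patch from [1] and it is not kernel code: the
struct, the "waiters" counter, the function names, and the rule that
each reclaimed batch satisfies one waiter are all hypothetical,
chosen just to contrast "reclaim up to the high watermark" with "stop
once nothing is waiting for reclaim".

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model, not kernel code. Every name below is hypothetical. */
struct toy_node {
	long free_pages;   /* pages currently free */
	long high_wmark;   /* usual reclaim target for the kswapd-like loop */
	long reclaimable;  /* pages that reclaim could still free */
	int waiters;       /* allocations currently waiting on reclaim */
};

/* Free up to 'batch' pages; returns how many pages were actually freed. */
long toy_reclaim_batch(struct toy_node *n, long batch)
{
	long freed = n->reclaimable < batch ? n->reclaimable : batch;

	n->reclaimable -= freed;
	n->free_pages += freed;
	return freed;
}

/*
 * One kswapd-like pass. Returns the number of batches reclaimed.
 * With stop_early set, the pass ends as soon as no allocation is
 * waiting, even if free_pages is still below high_wmark.
 */
int toy_kswapd_pass(struct toy_node *n, long batch, bool stop_early)
{
	int batches = 0;

	assert(batch > 0);

	while (n->free_pages < n->high_wmark) {
		if (stop_early && n->waiters == 0)
			break;	/* nobody needs reclaim right now */
		if (toy_reclaim_batch(n, batch) == 0)
			break;	/* no further progress is possible */
		batches++;
		/* Modelling assumption: each batch satisfies one waiter. */
		if (n->waiters > 0)
			n->waiters--;
	}
	return batches;
}
```

In this model, a node with two waiters stops after two batches when
stop_early is set, whereas the same node keeps reclaiming until
free_pages reaches high_wmark when it is not. A later allocation that
needs memory would simply start another pass, which is the "woken up
again" step in the model.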

Looking back, it seems your chief complaint was that my patch may stop kswapd
before it could reclaim up to the high watermark, which could thereby introduce
stalls; however, I've never run into any such issue in my testing, and neither
have the several people who use my patch under a wide range of setups.

[1] https://lore.kernel.org/linux-mm/20200219182522.1960-1-sultan@xxxxxxxxxxxxxxx/

Sultan