Re: [patch] mm, oom: stop reclaiming if GFP_ATOMIC will start failing soon

From: Michal Hocko
Date: Tue Apr 28 2020 - 03:43:12 EST


On Mon 27-04-20 16:35:58, Andrew Morton wrote:
[...]
> No consumer of GFP_ATOMIC memory should consume an unbounded amount of
> it.
> Subsystems such as networking will consume a certain amount and
> will then start recycling it. The total amount in-flight will vary
> over the longer term as workloads change. A dynamically tuning
> threshold system will need to adapt rapidly enough to sudden load
> shifts, which might require unreasonable amounts of headroom.

I do agree. __GFP_HIGH/__GFP_ATOMIC are bound by the size of the
reserves under memory pressure. Then allocatios start failing very
quickly and users have to cope with that, usually by deferring to a
sleepable context. Tuning reserves dynamically for heavy reserves
consumers would be possible but I am worried that this is far from
trivial.

We definitely need to understand what is going on here. Why doesn't
kswapd + N*direct reclaimers do not provide enough memory to satisfy
both N threads + reserves consumers? How many times those direct
reclaimers have to retry?

We used to have the allocation stall warning as David mentioned in the
patch description and I have seen it triggering without heavy reserves
consumers (aka reported free pages corresponded to the min watermark).
The underlying problem was usually kswapd being stuck on some FS locks,
direct reclaimers stuck in shrinkers or way too overloaded system with
dozens if not hundreds of processes stuck in the page allocator each
racing with the reclaim and betting on luck. The last problem was the
most annoying because it is really hard to tune for.
--
Michal Hocko
SUSE Labs