Re: [RFC 0/4] Introduce unbalance proactive reclaim

From: Michal Hocko
Date: Tue Nov 14 2023 - 05:04:43 EST


On Mon 13-11-23 09:54:55, Huan Yang wrote:
>
> On 2023/11/10 20:32, Michal Hocko wrote:
> > On Fri 10-11-23 14:21:17, Huan Yang wrote:
> > [...]
> > > > BTW: how do you know the number of pages to be reclaimed proactively in
> > > > memcg proactive reclaiming based solution?
> > > One point here is that we are not sure how long the application will
> > > stay frozen before it is reopened; it could be 10 minutes, an hour, or
> > > even days. So we need to predict and try, gradually reclaiming anonymous
> > > pages in proportion, preferably based on the LRU algorithm. For example,
> > > if the application has been frozen for 10 minutes, reclaim 5% of
> > > anonymous pages; 30 min: 25%, 1 hour: 75%, 1 day: 100%. It is even more
> > > complicated as it requires adding a mechanism for predicting failure
> > > penalties.
> > Why would you make your reclaiming decisions based on time rather than
> > the actual memory demand? I can see how pro-active reclaim could make
> > headroom for unexpected memory pressure, but applying more pressure
> > just because of inactivity sounds rather dubious to me TBH. Why can't
> > you simply wait for the external memory pressure (e.g. from kswapd) to
> > deal with that based on the demand?
> Because the current kswapd and direct memory reclamation are passive,
> watermark-based memory reclamation, and when these reclamation paths are
> triggered, the smoothness of the phone application cannot be
> guaranteed.

OK, so you are worried about latencies during memory usage spikes.

> (We often observe that when the above reclamation is triggered, there
> is a delay in the application startup, usually accompanied by block
> I/O, and some concurrency issues caused by lock design.)

Does that mean you do not have enough headroom for kswapd to keep up with
the memory demand? It is really hard to discuss this without some actual
numbers or more specifics.

> To ensure the smoothness of application startup, we have a module in
> Android called LMKD (formerly known as lowmemorykiller). Based on a
> certain algorithm, LMKD detects if application startup may be delayed
> and proactively kills inactive applications. (For example, based on
> factors such as refault IO and swap usage.)
>
> However, this behavior may cause the applications we want to protect
> to be killed, which will result in users having to wait for them to
> restart when they are reopened, which may affect the user
> experience. (For example, if the user wants to reopen the application
> interface they are working on, or re-enter the order interface they
> were viewing.)

This suggests that your LMKD doesn't pick up the right victim to kill.
And I suspect this is a fundamental problem of those pro-active oom
killer solutions.

> Therefore, the above proactive reclamation interface is designed to
> compress memory types with minimal cost for upper-layer applications
> based on reasonable strategies, in order to avoid triggering LMKD or
> memory reclamation as much as possible, even if it is not balanced.

This would suggest that MADV_PAGEOUT is really what you are looking for.
If you really aim at compressing a specific type of memory then tweaking
reclaim to achieve that sounds like a shortcut because a madvise-based
solution is more involved. But that is not a solid justification for
adding a new interface.
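
For reference, a minimal sketch of what the madvise-based approach looks
like from inside a process. The helper names (pageout_range, pageout_demo)
and the 64 KiB buffer are made up for illustration and are not taken from
the patch series; the fallback define assumes the Linux uapi value of
MADV_PAGEOUT. Whether anonymous pages actually end up in swap or zram
depends on the system configuration, so the sketch only checks that the
hint is accepted and that the data is still readable afterwards.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

#ifndef MADV_PAGEOUT
#define MADV_PAGEOUT 21	/* Linux uapi value; fallback for older libc headers */
#endif

/*
 * Hypothetical helper (name is illustrative): hint that [addr, addr + len)
 * should be reclaimed right away. Returns 0 on success or the errno value
 * reported by madvise(2), e.g. EINVAL on kernels without MADV_PAGEOUT.
 */
int pageout_range(void *addr, size_t len)
{
	return madvise(addr, len, MADV_PAGEOUT) == 0 ? 0 : errno;
}

/*
 * Hypothetical demo: dirty an anonymous mapping, request pageout, and check
 * that the data is still readable afterwards. Returns -1 if mmap fails,
 * otherwise the result of pageout_range().
 */
int pageout_demo(void)
{
	const size_t len = 16 * 4096;	/* 64 KiB, assumed page-size multiple */
	unsigned char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
				  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	int err;

	if (buf == MAP_FAILED)
		return -1;

	memset(buf, 0x5a, len);		/* fault the pages in */
	err = pageout_range(buf, len);

	/* The hint is non-destructive: touching the range faults data back in. */
	assert(buf[0] == 0x5a && buf[len - 1] == 0x5a);

	munmap(buf, len);
	return err;
}
```

The above only covers a process advising on its own address space. For a
frozen application managed by another process, process_madvise(2) accepts
the same MADV_PAGEOUT advice against a pidfd, which is probably the closer
fit for the Android case.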
--
Michal Hocko
SUSE Labs