Re: [RFC 0/4] Introduce unbalance proactive reclaim

From: Michal Hocko
Date: Thu Nov 09 2023 - 08:46:56 EST


On Thu 09-11-23 21:07:29, Huan Yang wrote:
[...]
> > > Of course, as you suggested, madvise can also achieve this, but
> > > implementing it in the agent may be more complex.(In terms of
> > > achieving the same goal, using memcg to group all the processes of an
> > > APP and perform proactive reclamation is simpler than using madvise
> > > and scanning multiple processes of an application using an agent?)
> > It might be more involved but the primary question is whether it is
> > usable for the specific use case. Madvise interface is not LRU aware but
> > you are not really talking about that to be a requirement? So it would
> > really help if you go deeper into details on how is the interface
> > actually supposed to be used in your case.
> In mobile field, we usually configure zram to compress anonymous page.
> We can approximate to expand memory usage with limited hardware memory
> by using zram.
>
> With proper strategies, an 8GB RAM phone can approximate the usage of a 12GB
> phone
> (or more).
>
> In our strategy, we group memcg by application. When the agent detects that
> an
> application has entered the background, then frozen, and has not been used
> for a long time,
> the agent will slowly issue commands to reclaim the anonymous page of that
> application.
>
> With this interface, `echo memory anon > memory.reclaim`

This doesn't really answer my questions above.

> > Also make sure to exaplain why you cannot use other existing interfaces.
> > For example, why you simply don't decrease the limit of the frozen
> > cgroup and rely on the normal reclaim process to evict the most cold
> This is a question of reclamation tendency, and simply decreasing the limit
> of the frozen cgroup cannot achieve this.

Why?

> > memory? What are you basing your anon vs. file proportion decision on?
> When zram is configured and anonymous pages are reclaimed proactively, the
> refault
> probability of anonymous pages is low when an application is frozen and not
> reopened.
> Also, the cost of refaulting from zram is relatively low.
>
> However, file pages usually have shared properties, so even if an
> application is frozen,
> other processes may still access the file pages. If a limit is set and the
> reclamation encounters
> file pages, it will cause a certain amount of refault I/O, which is costly
> for mobile devices.

Two points here (and the reason why I am repeatedly asking for some
data) 1) are you really seeing shared and actively used page cache pages
being reclaimed? 2) Is the refault IO really a problem. What kind of
storage those phone have that this is more significant than potentially
GB of compressed anonymous memory which would need CPU to refaulted
back. I mean do you have any actual numbers to show that the default
reclaim strategy would lead to a less utilized or less performant
system?
--
Michal Hocko
SUSE Labs