Re: [RFC 0/4] Introduce unbalance proactive reclaim

From: Huan Yang
Date: Fri Nov 10 2023 - 15:47:37 EST



On 2023/11/10 12:00, Huang, Ying wrote:

Huan Yang <link@xxxxxxxx> writes:

On 2023/11/10 9:19, Huang, Ying wrote:

Huan Yang <link@xxxxxxxx> writes:

On 2023/11/9 18:39, Michal Hocko wrote:

On Thu 09-11-23 18:29:03, Huan Yang wrote:
Hi Michal Hocko,

Thanks for your suggestion.

On 2023/11/9 17:57, Michal Hocko wrote:

On Thu 09-11-23 11:38:56, Huan Yang wrote:
[...]
If so, is it better only to reclaim private anonymous pages explicitly?
Yes, in practice, we only proactively compress anonymous pages and do not
want to touch file pages.
If that is the case and this is mostly application centric (which you
seem to be suggesting) then why don't you use madvise(MADV_PAGEOUT)
instead.
IMO, madvise may not be applicable in this scenario.

This feature is aimed at a core goal, which is to compress the anonymous
pages of frozen applications.

How to detect that an application is frozen and determine which pages can be
safely reclaimed is the responsibility of the policy part.

Setting madvise for an application is an active behavior, while the above
policy is a passive approach. (If I have misunderstood, please let me know
if there is a better way to set madvise.)
You are proposing an extension to the pro-active reclaim interface so
this is an active behavior pretty much by definition. So I am really not
following you here. Your agent can simply scan the address space of the
application it is going to "freeze" and call pidfd_madvise(MADV_PAGEOUT)
on the private memory if that is really what you want/need.
There is a key point here. We want to use the grouping policy of memcg to
perform proactive reclamation with a particular bias (for example,
anonymous pages only). Your suggestion is to reclaim memory by scanning the
address space of each task. However, in the mobile field, memory is usually
viewed at the granularity of an app.

Therefore, after an app is frozen, we hope to reclaim memory uniformly
across its pre-grouped processes.

Of course, as you suggested, madvise can also achieve this, but
implementing it in the agent may be more complex. (To achieve the same
goal, isn't using memcg to group all the processes of an app and reclaim
proactively simpler than having an agent scan the multiple processes of an
application and call madvise?)
I still think that it's not too complex to use process_madvise() to do
this. For each process of the application, the agent can read
/proc/PID/maps to get all anonymous address ranges, then call
process_madvise(MADV_PAGEOUT) to reclaim pages. This can even filter
out shared anonymous pages. Does this work for you?
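
To make the /proc/PID/maps step concrete, here is a minimal sketch of how an agent might pick out private anonymous ranges. The function name and the filtering rule (private permission bit, inode 0, and either no pathname or a [heap], [stack] or [anon:...] name) are assumptions for illustration, not something specified in this thread. The actual process_madvise(MADV_PAGEOUT) call on the resulting ranges is deliberately left out.

```python
def private_anon_ranges(maps_text):
    """Return (start, length) pairs for private anonymous mappings.

    maps_text is the content of /proc/PID/maps. The filter below is an
    assumed heuristic: it ignores anonymous (COW) pages that may live
    inside private file-backed mappings.
    """
    ranges = []
    for line in maps_text.splitlines():
        # Fields: address perms offset dev inode [pathname]
        fields = line.split(maxsplit=5)
        if len(fields) < 5:
            continue
        addr, perms, _offset, _dev, inode = fields[:5]
        path = fields[5] if len(fields) > 5 else ""
        # Skip shared mappings ('s') and anything backed by an inode.
        if len(perms) < 4 or perms[3] != "p" or inode != "0":
            continue
        # Skip special kernel regions such as [vdso] or [vvar].
        if path and path not in ("[heap]", "[stack]") and not path.startswith("[anon:"):
            continue
        start_s, end_s = addr.split("-")
        start, end = int(start_s, 16), int(end_s, 16)
        ranges.append((start, end - start))
    return ranges
```

Shared anonymous mappings carry the 's' permission bit in this file, which is why the filter above skips them.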
Thanks for this suggestion. This approach avoids touching shared anonymous
pages, which is nice. But I have some doubts about it: CPU resources are
usually limited on embedded devices, and power consumption must also be
taken into consideration.

If this approach is adopted, the agent needs to periodically scan frozen
applications and issue pageout for their address spaces. Isn't such
frequent active scanning more complex, and less suitable for embedded
devices, than reclamation based on memcg grouping?
In memcg based solution, when will you start the proactive reclaiming?
You can just replace the reclaiming part of the solution from memcg
proactive reclaiming to process_madvise(MADV_PAGEOUT). Because you can
get PIDs in a memcg. Is it possible?
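
As a rough illustration of getting the PIDs in a memcg, the sketch below reads the cgroup.procs file of a cgroup v2 directory. The function name and the idea of passing the cgroup directory as an argument are assumptions; where an app's cgroup actually lives on a given device is not stated in this thread.

```python
from pathlib import Path


def pids_in_cgroup(cgroup_dir):
    """Return the PIDs listed in <cgroup_dir>/cgroup.procs.

    cgroup_dir is a hypothetical path to the app's cgroup v2 directory;
    the file holds one PID per line.
    """
    procs_file = Path(cgroup_dir) / "cgroup.procs"
    return [int(token) for token in procs_file.read_text().split()]
```

An agent could then loop over these PIDs and handle each process's address space in turn.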

In addition, without the LRU, it is difficult to reclaim only the cold part
of a frozen application's anonymous pages. For example, if I only want to
proactively reclaim 100MB of anonymous pages, the proactive reclamation
interface can use the LRU to reclaim only 100MB of cold anonymous pages.
However, this cannot be achieved through madvise. (If I have misunderstood
something, please correct me.)
IIUC, it should be OK to reclaim all private anonymous pages of an
application in your specific use case?
This is a gradual process; it will not reclaim all anonymous pages at once.
If you really want to restrict the number of pages reclaimed, it's possible
too. You can restrict the size of the address range passed to
process_madvise(MADV_PAGEOUT), and check the RSS of the application. The
accuracy of the number reclaimed isn't good. But I think that it should be
OK in practice?
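
The idea of restricting the address range size could look roughly like the sketch below, which trims a list of (start, length) ranges to a byte budget before they would be handed to process_madvise(MADV_PAGEOUT). The function name, the 4096-byte default page size and the first-come ordering are assumptions. As noted above, the result is only an upper bound on what gets reclaimed, since not every page in a range is resident, and the selection does not follow LRU order.

```python
def cap_ranges(ranges, budget_bytes, page_size=4096):
    """Trim (start, length) ranges so their total size fits budget_bytes.

    Lengths are rounded down to whole pages. Ranges are consumed in the
    order given, which is an arbitrary choice and not coldness-based.
    """
    remaining = (budget_bytes // page_size) * page_size
    capped = []
    for start, length in ranges:
        if remaining <= 0:
            break
        take = min(length, remaining)
        take -= take % page_size  # keep whole pages only
        if take == 0:
            continue
        capped.append((start, take))
        remaining -= take
    return capped
```

After issuing the calls, the agent would compare the application's RSS before and after to see how much was really paged out.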
If you only want to reclaim all anonymous memory, this can indeed be done,
and fast. :)

BTW: how do you know the number of pages to be reclaimed proactively in
memcg proactive reclaiming based solution?
One point here is that we are not sure how long it will be before the
frozen application is opened again; it could be 10 minutes, an hour, or
even days. So we need to predict and try, gradually reclaiming anonymous
pages in proportion, preferably based on the LRU algorithm.
For example, if the application has been frozen for 10 minutes, reclaim 5%
of its anonymous pages; after 30 minutes, 25%; after 1 hour, 75%; after
1 day, 100%.
It is even more complicated, as it requires adding a mechanism for
predicting failure penalties.
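
The time-based example above can be written down as a small lookup, sketched here under assumptions: the percentages are treated as a cumulative share of the app's anonymous memory, nothing is reclaimed before the 10 minute mark, and the names are hypothetical. It does not model the failure-penalty prediction mentioned above.

```python
# (seconds frozen, cumulative percent of anonymous memory to reclaim),
# taken from the example figures: 10min 5%, 30min 25%, 1h 75%, 1 day 100%.
SCHEDULE = [
    (10 * 60, 5),
    (30 * 60, 25),
    (60 * 60, 75),
    (24 * 60 * 60, 100),
]


def target_percent(frozen_seconds):
    """Return the percent of anonymous memory to have reclaimed by now."""
    percent = 0
    for threshold, value in SCHEDULE:
        if frozen_seconds >= threshold:
            percent = value
    return percent


def target_bytes(anon_bytes, frozen_seconds):
    """Return the cumulative reclaim target in bytes (integer math)."""
    return anon_bytes * target_percent(frozen_seconds) // 100
```

For instance, an app with 200MB of anonymous memory that has been frozen for 45 minutes would get a cumulative target of 50MB under these assumptions.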

--
Best Regards,
Huang, Ying

--
Thanks,
Huan Yang