Re: [PATCH] mm/damon: Make the sampling more accurate

From: Baolin Wang
Date: Fri Mar 18 2022 - 10:11:13 EST




On 3/18/2022 8:15 PM, sj@xxxxxxxxxx wrote:
On Fri, 18 Mar 2022 19:58:07 +0800 Baolin Wang <baolin.wang@xxxxxxxxxxxxxxxxx> wrote:



On 3/18/2022 6:49 PM, sj@xxxxxxxxxx wrote:
On Fri, 18 Mar 2022 18:01:19 +0800 Baolin Wang <baolin.wang@xxxxxxxxxxxxxxxxx> wrote:


On 3/18/2022 5:40 PM, sj@xxxxxxxxxx wrote:
Hi Baolin,

On Fri, 18 Mar 2022 17:23:13 +0800 Baolin Wang <baolin.wang@xxxxxxxxxxxxxxxxx> wrote:

When I tried sampling physical addresses with DAMON to migrate pages
on a tiered memory system, I found that it mistakenly demotes some
regions that it wrongly treats as cold. Currently we choose a physical
address in the region at random, but if its corresponding page is not
an online LRU page, we ignore the access status in this sampling cycle,
and the region is effectively treated as non-accessed. Suppose a region
includes some non-LRU pages: it will be treated as a cold region with
high probability, and may be merged with adjacent cold regions, even
though it may contain accessed pages that we missed.

So instead of ignoring the access status of the region when the current
sampling address does not yield a valid page, we can use the last valid
sampling address to make the sampling more accurate, so that we can make
a better decision.
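
A minimal userspace sketch of the fallback idea described above, not the
actual patch. The names region_model, page_valid_fn, choose_sampling_addr
and demo_valid are hypothetical and do not exist in the kernel; the
validity check is a stand-in for "maps to an online LRU page". It also
assumes the remembered address is still valid when reused, which real
code would have to re-check.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a monitored region; not struct damon_region. */
struct region_model {
	unsigned long start;
	unsigned long end;         /* exclusive */
	unsigned long last_valid;  /* last address that passed the check */
	bool has_last_valid;
};

/* Hypothetical predicate: does addr map to an online LRU page? */
typedef bool (*page_valid_fn)(unsigned long addr);

/*
 * Pick the address to check in this sampling cycle.
 * Returns true and stores the address in *out if one is usable,
 * false if the cycle has to be skipped for this region.
 */
bool choose_sampling_addr(struct region_model *r, unsigned long candidate,
			  page_valid_fn valid, unsigned long *out)
{
	assert(candidate >= r->start && candidate < r->end);

	if (valid(candidate)) {
		/* Remember it so a later invalid pick can fall back to it. */
		r->last_valid = candidate;
		r->has_last_valid = true;
		*out = candidate;
		return true;
	}
	if (r->has_last_valid) {
		/* Fallback: reuse the last valid sampling address. */
		*out = r->last_valid;
		return true;
	}
	return false;
}

/* Demo predicate (assumption): only addresses below 0x3000 are valid. */
bool demo_valid(unsigned long addr)
{
	return addr < 0x3000UL;
}
```

Without the fallback branch, every cycle whose random pick is invalid
would leave the region looking non-accessed, which is the behavior the
patch description complains about.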

Well... Offlined pages are also a valid part of the memory region, so
treating those as not accessed, and thereby making the memory region
containing the offlined pages look colder, seems legitimate to me. IOW,
this approach could make memory regions containing many non-online-LRU
pages look hot.

IMO this is not a problem: if a region containing many non-online-LRU
pages is treated as hot, that means some of its pages are hot, right?
We can find them and promote them to fast memory (or apply other
schemes). Meanwhile, we can filter out the non-online-LRU pages and do
nothing for them, since we cannot get a valid page struct for them.

For some of the DAMOS actions that you mentioned, that could make sense.
However, it wouldn't make much sense for some other cases, especially
for manual DAMON-based access pattern profiling.

I am not sure about this case. Could you elaborate on how this can
worsen the case you mentioned?

For example, let's suppose a user is using DAMON to find the working set
size of the system. And further suppose there is a region containing
many offlined pages and one online hot page. With this patch, once DAMON
has sampled the one hot page, the entire region will be reported as hot,
though the other, offlined pages have not been accessed. As a result,
the user will think the working set size is bigger than it really is.
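
To put rough numbers on that example, here is a small sketch. The 4 KiB
page size, the 1024-page region, and the helper names are assumptions
for illustration only, not DAMON interfaces.

```c
#include <stddef.h>

#define PAGE_SIZE_BYTES 4096UL	/* assumed page size */

/* Working set size when a hot region is counted as a whole. */
unsigned long wss_region_granularity(size_t nr_pages_in_region,
				     int region_reported_hot)
{
	return region_reported_hot ? nr_pages_in_region * PAGE_SIZE_BYTES : 0UL;
}

/* Working set size counting only the pages that were really accessed. */
unsigned long wss_actual(size_t nr_hot_pages)
{
	return nr_hot_pages * PAGE_SIZE_BYTES;
}
```

Under these assumptions, a 1024-page region with a single online hot
page would be reported as 4 MiB of working set, while only 4 KiB was
actually accessed.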

OK, sounds reasonable. It seems I need to add a flag to indicate whether
we should ignore offline or non-LRU pages when monitoring for some
schemes, which can help make a better decision.