Re: [PATCH 1/2] mm/damon/sysfs: Implement recording feature

From: cuiyangpei
Date: Mon Jan 22 2024 - 00:46:45 EST


On Sun, Dec 03, 2023 at 07:37:45PM +0000, SeongJae Park wrote:
> On 2023-12-03T13:43:13+08:00 cuiyangpei <cuiyangpei@xxxxxxxxx> wrote:
>
> > On Fri, Dec 01, 2023 at 05:31:12PM +0000, SeongJae Park wrote:
> > > Hi Cuiyangpei,
> > >
> > > On Fri, 1 Dec 2023 20:25:07 +0800 cuiyangpei <cuiyangpei@xxxxxxxxx> wrote:
> > >
> > > > On Thu, Nov 30, 2023 at 07:44:20PM +0000, SeongJae Park wrote:
> > > > > Hi Cuiyangpei,
> > > > >
> > > > > On Thu, 30 Nov 2023 17:14:26 +0800 cuiyangpei <cuiyangpei@xxxxxxxxx> wrote:
> > > > >
> > > > > > Hi SeongJae,
> > > > > >
> > > > > > We also investigated the operation schemes you mentioned, but we don't
> > > > > > think it can fit our needs.
> > > > > >
> > > > > > On android, user will open many apps and switch between these apps as
> > > > > > needs. We hope to monitor apps' memory access only when they are on
> > > > > > foreground and record the memory access pattern when they are switched
> > > > > > to the background.
> > > > > >
> > > > > > > When available memory reaches a threshold, we will use these access
> > > > > > patterns with some strategies to recognize those memory that will have
> > > > > > little impact on user experience and to reclaim them proactively.
> > > > > >
> > > > > > I'm not sure I have clarified it clearly, if you still have questions
> > > > > > on this, please let us know.
> > > > >
> > > > > So, to my understanding, you expect applications may keep similar access
> > > > > pattern when they are in foreground, but have a different, less aggressive
> > > > > access pattern in background, and therefore reclaim memory based on the
> > > > > foreground-access pattern, right?
> > > > >
> > > >
> > > > Different apps may have different access pattern. On android, the apps will
> > > > join in freeze cgroup and be frozen after switch to the background. So we
> > > > monitor apps' memory access only when they are in foreground.
> > >
> > > Thank you for this enlightening me :)
> > >
> > > > > Very interesting idea, thank you for sharing!
> > > > >
> > > > > > Then, yes, I agree current DAMOS might not be that helpful for the situation, and
> > > > > this record feature could be useful for your case.
> > > > >
> > > > > That said, do you really need full recording of the monitoring results? If
> > > > > not, DAMOS provides DAMOS tried regions feature[1], which allows users get the
> > > > > monitoring results snapshot that include both frequency and recency of all
> > > > > regions in an efficient way. If single snapshot is not having enough
> > > > > information for you, you could collect multiple snapshots.
> > > > >
> > > > > You mentioned absence of Python on Android as a blocker of DAMOS use on the
> > > > > > previous reply[2], but DAMOS tried regions feature does not depend on tracepoints
> > > > > or Python.
> > > > >
> > > > > Of course, I think you might already surveyed it but found some problems.
> > > > > Could you please share that in detail if so?
> > > > >
> > > > DAMOS tried regions feature you mentioned is not fully applicable. It needs to
> > > > apply schemes on regions. There is no available scheme we can use for our use
> > > > case. What we need is to return regions with access frequency and recency to
> > > > userspace for later use.
> > >
> > >
> > > Thank you for the answer, I understand your concern. One of the available
> > > DAMOS action is 'stat'[1], which does nothing but just count the statistic.
> > > Using DAMOS scheme for any access pattern with 'stat' action, you can extract
> > > the access pattern via DAMOS tried regions feature of DAMON sysfs interface,
> > > without making any unnecessary impact to the workload.
> > >
> > > Quote from [2]:
> > >
> > > The expected usage of this directory is investigations of schemes' behaviors,
> > > and query-like efficient data access monitoring results retrievals. For the
> > > latter use case, in particular, users can set the action as stat and set the
> > > access pattern as their interested pattern that they want to query.
> > >
> > > For example, you could
> > >
> > > # cd /sys/kernel/mm/damon/admin
> > > #
> > > # # populate directories
> > > # echo 1 > kdamonds/nr_kdamonds; echo 1 > kdamonds/0/contexts/nr_contexts;
> > > # echo 1 > kdamonds/0/contexts/0/schemes/nr_schemes
> > > # cd kdamonds/0/contexts/0/schemes/0
> > > #
> > > # # set the access pattern for any case (max as 2**64 - 1), and action as stat
> > > # echo 0 > access_pattern/sz/min
> > > # echo 18446744073709551615 > access_pattern/sz/max
> > > # echo 0 > access_pattern/nr_accesses/min
> > > # echo 18446744073709551615 > access_pattern/nr_accesses/max
> > > # echo 0 > access_pattern/age/min
> > > # echo 18446744073709551615 > access_pattern/age/max
> > > # echo stat > action
> > >
> > > And this is how DAMON user-space tool is getting the snapshot with 'damo show'
> > > command[3].
> > >
> > > Could this be used for your case? Please ask any question if you have :)
> > >
> > > [1] https://docs.kernel.org/admin-guide/mm/damon/usage.html#schemes-n
> > > > [2] https://docs.kernel.org/admin-guide/mm/damon/usage.html#schemes-n-tried-regions
> > > [3] https://github.com/awslabs/damo/blob/next/USAGE.md#damo-show
> >
> > Thank you for your detailed response, it is very helpful to us. We will look into it
> > and contact you if we have any questions.
>
> So glad to hear this. Please let me know if you have any questions or need any
> help :)
>
>
> Thanks,
> SJ
>
> >
> > >
> > >
> > > Thanks,
> > > SJ
> > >
> > > > > [1] https://docs.kernel.org/admin-guide/mm/damon/usage.html#schemes-n-tried-regions
> > > > > [2] https://lore.kernel.org/damon/20231129131315.GB12957@cuiyangpei/
> > > > >
> > > > >
> > > > > Thanks,
> > > > > SJ
> > > > >
> > > > > >
> > > > > > Thanks.

Hi SeongJae,

We set 'access_pattern' and the 'stat' action in the schemes while apps are
in the foreground, and record the apps' memory access pattern when they are
switched to the background by writing 'update_schemes_tried_regions' to the
state file. But the snapshot is only captured after the next aggregation
interval. DAMON keeps sampling between the moment the app switches to the
background and the next aggregation, which can change the value of "age".
The sampling results from this period do not accurately reflect the app's
foreground access pattern.
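
For reference, the sequence can be sketched roughly as below. This is only an
illustration, not our actual code: it assumes kdamond 0, context 0 and scheme 0
as in your example, and DAMON_ADMIN and snapshot_tried_regions are hypothetical
names (DAMON_ADMIN only lets the function be pointed at another directory).

```shell
#!/bin/sh
# Sketch only: request a tried-regions snapshot and print one line per region.
# DAMON_ADMIN is a hypothetical override; it defaults to the DAMON sysfs root.

snapshot_tried_regions() {
    admin="${DAMON_ADMIN:-/sys/kernel/mm/damon/admin}"
    kdamond="$admin/kdamonds/0"
    regions="$kdamond/contexts/0/schemes/0/tried_regions"

    # Ask the kdamond to refresh the tried_regions/ directory.
    echo update_schemes_tried_regions > "$kdamond/state" || return 1

    # Each tried_regions/<N> directory holds start, end, nr_accesses and age.
    for r in "$regions"/[0-9]*; do
        [ -d "$r" ] || continue
        printf '%s-%s nr_accesses=%s age=%s\n' \
            "$(cat "$r/start")" "$(cat "$r/end")" \
            "$(cat "$r/nr_accesses")" "$(cat "$r/age")"
    done
}
```

We would call snapshot_tried_regions at the moment the app is switched to the
background, so any delay before the directory is refreshed shows up directly
in the "age" and "nr_accesses" values that it prints.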

Is there any way to catch the sampling result immediately after setting the
"update_schemes_tried_regions" state? Alternatively, could the
"last_nr_accesses" and "last_age" values be exposed in the tried_regions/<N>
directory?

Do you have any other suggestions?

Thanks.