Re: [RFC PATCH] mm/damon/core: use number of passed access sampling as a timer

From: SeongJae Park
Date: Tue Aug 29 2023 - 21:52:18 EST


On Sun, 27 Aug 2023 00:37:27 +0000 SeongJae Park <sj@xxxxxxxxxx> wrote:

> DAMON sleeps for sampling interval after each sampling, and check if
> it's time for further doing aggregation and ops updating using
> ktime_get_coarse_ts64() and baseline timestamps for the two periodic
> operations. That's for making the operations occur at deterministic
> timing. However, it turned out it could still result in indeterministic
> and even not-that-intuitive results.
>
> After all, timer functions, and especially sleep functions that DAMON
> uses to wait for specific timing, could contain some errors. Those
> errors are legal, so no problem. However, depending on such legal
> timing errors, the nr_accesses can be larger than aggregation interval
> divided by sampling interval. For example, with the default setting (5
> ms sampling interval and 100 ms aggregation interval) we frequently show
> regions having nr_accesses larger than 20. Also, if the execution of a
> DAMOS scheme takes a long time, next aggregation could happen before
> enough number of samples are collected.
>
> Since access check sampling is the smallest unit work of DAMON, using
> the number of passed sampling intervals as the DAMON-internal timer can
> easily avoid these problems. That is, convert aggregation and ops
> update intervals to numbers of sampling intervals that need to be passed
> before those operations be executed, count the number of passed sampling
> intervals, and invoke the operations as soon as the specific amount of
> sampling intervals passed. Make the change.
>
> Signed-off-by: SeongJae Park <sj@xxxxxxxxxx>
> ---
> include/linux/damon.h | 14 ++++++--
> mm/damon/core.c | 84 +++++++++++++++++++------------------------
> 2 files changed, 48 insertions(+), 50 deletions(-)
>
> diff --git a/include/linux/damon.h b/include/linux/damon.h
> index ab3089de1478..9a32b8fd0bd3 100644
> --- a/include/linux/damon.h
> +++ b/include/linux/damon.h
> @@ -524,8 +524,18 @@ struct damon_ctx {
> struct damon_attrs attrs;
>
> /* private: internal use only */
> - struct timespec64 last_aggregation;
> - struct timespec64 last_ops_update;
> + /* number of sample intervals that passed since this context started */
> + unsigned long passed_sample_intervals;
> + /*
> + * number of sample intervals that should be passed before next
> + * aggregation
> + */
> + unsigned long next_aggregation_sis;
> + /*
> + * number of sample intervals that should be passed before next ops
> + * update
> + */
> + unsigned long next_ops_update_sis;
>
> /* public: */
> struct task_struct *kdamond;
> diff --git a/mm/damon/core.c b/mm/damon/core.c
> index 988dc39e44b1..83af336bb0e6 100644
> --- a/mm/damon/core.c
> +++ b/mm/damon/core.c
> @@ -456,8 +456,11 @@ struct damon_ctx *damon_new_ctx(void)
> ctx->attrs.aggr_interval = 100 * 1000;
> ctx->attrs.ops_update_interval = 60 * 1000 * 1000;
>
> - ktime_get_coarse_ts64(&ctx->last_aggregation);
> - ctx->last_ops_update = ctx->last_aggregation;
> + ctx->passed_sample_intervals = 0;
> + ctx->next_aggregation_sis = ctx->attrs.aggr_interval /
> + ctx->attrs.sample_interval;
> + ctx->next_ops_update_sis = ctx->attrs.ops_update_interval /
> + ctx->attrs.sample_interval;
>
> mutex_init(&ctx->kdamond_lock);
>
> @@ -577,6 +580,9 @@ static void damon_update_monitoring_results(struct damon_ctx *ctx,
> */
> int damon_set_attrs(struct damon_ctx *ctx, struct damon_attrs *attrs)
> {
> + unsigned long sample_interval;
> + unsigned long remaining_interval_us;
> +
> if (attrs->min_nr_regions < 3)
> return -EINVAL;
> if (attrs->min_nr_regions > attrs->max_nr_regions)
> @@ -584,6 +590,20 @@ int damon_set_attrs(struct damon_ctx *ctx, struct damon_attrs *attrs)
> if (attrs->sample_interval > attrs->aggr_interval)
> return -EINVAL;
>
> + sample_interval = attrs->sample_interval ? attrs->sample_interval : 1;
> +
> + /* adjust next_aggregation_sis */
> + remaining_interval_us = ctx->attrs.sample_interval *
> + (ctx->next_aggregation_sis - ctx->passed_sample_intervals);
> + ctx->next_aggregation_sis = ctx->passed_sample_intervals +
> + remaining_interval_us / sample_interval;
> +
> + /* adjust next_ops_update_sis */
> + remaining_interval_us = ctx->attrs.sample_interval *
> + (ctx->next_ops_update_sis - ctx->passed_sample_intervals);
> + ctx->next_ops_update_sis = ctx->passed_sample_intervals +
> + remaining_interval_us / sample_interval;
> +
> damon_update_monitoring_results(ctx, attrs);
> ctx->attrs = *attrs;
> return 0;
> @@ -757,38 +777,6 @@ int damon_stop(struct damon_ctx **ctxs, int nr_ctxs)
> return err;
> }

So, the new timer variables are initialized in damon_new_ctx() as the default
(5ms sampling interval, 100ms aggregation interval, and 1s ops update interval)
and adjusted in damon_set_attrs(). It means the first intervals will be the
default ones always. I will make those initialized at the beginning of
kdamond_fn() from the next spin.

[...]
> @@ -1436,6 +1412,8 @@ static int kdamond_fn(void *data)
> sz_limit = damon_region_sz_limit(ctx);

As mentioned abovely, the timer variables will be initialized around here (at
the beggining of kdamond_fn(), before going into the loop), in the next spin.


Thanks,
SJ

>
> while (!kdamond_need_stop(ctx)) {
> + unsigned long sample_interval;
> +
> if (kdamond_wait_activation(ctx))
> break;
>
> @@ -1446,11 +1424,17 @@ static int kdamond_fn(void *data)
> break;
>
> kdamond_usleep(ctx->attrs.sample_interval);
> + ctx->passed_sample_intervals++;
>
> if (ctx->ops.check_accesses)
> max_nr_accesses = ctx->ops.check_accesses(ctx);
>
> - if (kdamond_aggregate_interval_passed(ctx)) {
> + sample_interval = ctx->attrs.sample_interval ?
> + ctx->attrs.sample_interval : 1;
> + if (ctx->passed_sample_intervals ==
> + ctx->next_aggregation_sis) {
> + ctx->next_aggregation_sis +=
> + ctx->attrs.aggr_interval / sample_interval;
> kdamond_merge_regions(ctx,
> max_nr_accesses / 10,
> sz_limit);
> @@ -1465,7 +1449,11 @@ static int kdamond_fn(void *data)
> ctx->ops.reset_aggregated(ctx);
> }
>
> - if (kdamond_need_update_operations(ctx)) {
> + if (ctx->passed_sample_intervals ==
> + ctx->next_ops_update_sis) {
> + ctx->next_ops_update_sis +=
> + ctx->attrs.ops_update_interval /
> + sample_interval;
> if (ctx->ops.update)
> ctx->ops.update(ctx);
> sz_limit = damon_region_sz_limit(ctx);
> --
> 2.25.1
>
>