Re: [PATCH v22 10/18] mm/damon: Implement a debugfs-based user space interface

From: Shakeel Butt
Date: Wed Nov 25 2020 - 10:31:11 EST


On Tue, Oct 20, 2020 at 2:06 AM SeongJae Park <sjpark@xxxxxxxxxx> wrote:
>
> From: SeongJae Park <sjpark@xxxxxxxxx>
>
> DAMON is designed to be used by kernel space code such as the memory
> management subsystems, and therefore it provides only kernel space API.
> That said, letting the user space control DAMON could provide some
> benefits to them. For example, it will allow user space to analyze
> their specific workloads and make their own special optimizations.
>
> For such cases, this commit implements a simple DAMON application kernel
> module, namely 'damon-dbgfs', which merely wraps the DAMON api and
> exports those to the user space via the debugfs.
>
> 'damon-dbgfs' exports three files, ``attrs``, ``target_ids``, and
> ``monitor_on`` under its debugfs directory, ``<debugfs>/damon/``.
>
> Attributes
> ----------
>
> Users can read and write the ``sampling interval``, ``aggregation
> interval``, ``regions update interval``, and min/max number of
> monitoring target regions by reading from and writing to the ``attrs``
> file. For example, below commands set those values to 5 ms, 100 ms,
> 1,000 ms, 10, 1000 and check it again::
>
> # cd <debugfs>/damon
> # echo 5000 100000 1000000 10 1000 > attrs
> # cat attrs
> 5000 100000 1000000 10 1000
>
> Target IDs
> ----------
>
> Some types of address spaces support multiple monitoring targets. For
> example, the virtual memory address spaces monitoring can have multiple
> processes as the monitoring targets. Users can set the targets by
> writing relevant id values of the targets to, and get the ids of the
> current targets by reading from the ``target_ids`` file. In case of the
> virtual address spaces monitoring, the values should be pids of the
> monitoring target processes. For example, below commands set processes
> having pids 42 and 4242 as the monitoring targets and check it again::
>
> # cd <debugfs>/damon
> # echo 42 4242 > target_ids
> # cat target_ids
> 42 4242
>
> Note that setting the target ids doesn't start the monitoring.
>
> Turning On/Off
> --------------
>
> Setting the files as described above has no effect unless you
> explicitly start the monitoring. You can start, stop, and check the
> current status of the monitoring by writing to and reading from the
> ``monitor_on`` file. Writing ``on`` to the file starts the monitoring
> of the targets with the attributes. Writing ``off`` to the file stops
> those. DAMON also stops if all targets are invalidated (in case of
> the virtual memory monitoring, target processes are invalidated when
> terminated). Below example commands turn on, off, and check the status
> of DAMON::
>
> # cd <debugfs>/damon
> # echo on > monitor_on
> # echo off > monitor_on
> # cat monitor_on
> off
>
> Please note that you cannot write to the above-mentioned debugfs files
> while the monitoring is turned on. If you write to the files while
> DAMON is running, an error code such as ``-EBUSY`` will be returned.
>
> Signed-off-by: SeongJae Park <sjpark@xxxxxxxxx>
> Reviewed-by: Leonard Foerster <foersleo@xxxxxxxxx>
> ---
> include/linux/damon.h | 2 +
> mm/damon/Kconfig | 9 +
> mm/damon/Makefile | 1 +
> mm/damon/core.c | 48 +++++
> mm/damon/dbgfs.c | 428 ++++++++++++++++++++++++++++++++++++++++++
> 5 files changed, 488 insertions(+)
> create mode 100644 mm/damon/dbgfs.c
>
> diff --git a/include/linux/damon.h b/include/linux/damon.h
> index 70cc4b54212e..d675ea908a02 100644
> --- a/include/linux/damon.h
> +++ b/include/linux/damon.h
> @@ -226,6 +226,8 @@ void damon_free_target(struct damon_target *t);
> void damon_destroy_target(struct damon_target *t);
> unsigned int damon_nr_regions(struct damon_target *t);
>
> +struct damon_ctx *damon_new_ctx(void);
> +void damon_destroy_ctx(struct damon_ctx *ctx);
> int damon_set_targets(struct damon_ctx *ctx,
> unsigned long *ids, ssize_t nr_ids);
> int damon_set_attrs(struct damon_ctx *ctx, unsigned long sample_int,
> diff --git a/mm/damon/Kconfig b/mm/damon/Kconfig
> index 63b9c905b548..e38f95d28f74 100644
> --- a/mm/damon/Kconfig
> +++ b/mm/damon/Kconfig
> @@ -22,4 +22,13 @@ config DAMON_PRIMITIVES
> The primitives support only virtual address spaces. If this cannot
> cover your use case, you can implement and use your own primitives.
>
> +config DAMON_DBGFS
> + bool "DAMON debugfs interface"
> + depends on DAMON_PRIMITIVES && DEBUG_FS
> + help
> + This builds the debugfs interface for DAMON. The user space admins
> + can use the interface for arbitrary data access monitoring.
> +
> + If unsure, say N.
> +
> endmenu
> diff --git a/mm/damon/Makefile b/mm/damon/Makefile
> index 2f3235a52e5e..2295deb2fe0e 100644
> --- a/mm/damon/Makefile
> +++ b/mm/damon/Makefile
> @@ -2,3 +2,4 @@
>
> obj-$(CONFIG_DAMON) := core.o
> obj-$(CONFIG_DAMON_PRIMITIVES) += primitives.o
> +obj-$(CONFIG_DAMON_DBGFS) += dbgfs.o
> diff --git a/mm/damon/core.c b/mm/damon/core.c
> index d7957d8ff530..47baf859d7d9 100644
> --- a/mm/damon/core.c
> +++ b/mm/damon/core.c
> @@ -135,6 +135,40 @@ unsigned int damon_nr_regions(struct damon_target *t)
> return nr_regions;
> }
>
> +struct damon_ctx *damon_new_ctx(void)
> +{
> + struct damon_ctx *ctx;
> +
> + ctx = kzalloc(sizeof(*ctx), GFP_KERNEL);
> + if (!ctx)
> + return NULL;
> +
> + ctx->sample_interval = 5 * 1000;
> + ctx->aggr_interval = 100 * 1000;
> + ctx->regions_update_interval = 1000 * 1000;
> + ctx->min_nr_regions = 10;
> + ctx->max_nr_regions = 1000;
> +
> + ktime_get_coarse_ts64(&ctx->last_aggregation);
> + ctx->last_regions_update = ctx->last_aggregation;
> +
> + mutex_init(&ctx->kdamond_lock);
> +
> + INIT_LIST_HEAD(&ctx->targets_list);
> +
> + return ctx;
> +}
> +
> +void damon_destroy_ctx(struct damon_ctx *ctx)
> +{
> + struct damon_target *t, *next_t;
> +
> + damon_for_each_target_safe(t, next_t, ctx)
> + damon_destroy_target(t);
> +
> + kfree(ctx);
> +}
> +
> /**
> * damon_set_targets() - Set monitoring targets.
> * @ctx: monitoring context
> @@ -204,6 +238,20 @@ int damon_set_attrs(struct damon_ctx *ctx, unsigned long sample_int,
> return 0;
> }
>
> +/**
> + * damon_nr_running_ctxs() - Return number of currently running contexts.
> + */
> +int damon_nr_running_ctxs(void)
> +{
> + int nr_ctxs;
> +
> + mutex_lock(&damon_lock);
> + nr_ctxs = nr_running_ctxs;
> + mutex_unlock(&damon_lock);
> +

Why not READ_ONCE() here instead of taking the mutex?
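
For illustration, here is a user-space sketch of the lockless variant.
The READ_ONCE() definition below is only a stand-in assumed for this
example (it mimics the kernel macro with a volatile access), and the
writers are assumed to keep updating the counter under damon_lock:

```c
/*
 * User-space stand-in for the kernel's READ_ONCE(); an assumption made
 * for this sketch, not the kernel's definition.
 */
#define READ_ONCE(x) (*(const volatile __typeof__(x) *)&(x))

/* Mirrors the counter in mm/damon/core.c; writers still hold damon_lock. */
static int nr_running_ctxs;

/* Read the counter once, without taking any lock. */
int damon_nr_running_ctxs(void)
{
	return READ_ONCE(nr_running_ctxs);
}
```

The value can change right after the mutex is dropped anyway, so for a
single int read the lock probably buys very little.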

> + return nr_ctxs;
> +}
> +
> /* Returns the size upper limit for each monitoring region */
> static unsigned long damon_region_sz_limit(struct damon_ctx *ctx)
> {
> diff --git a/mm/damon/dbgfs.c b/mm/damon/dbgfs.c
> new file mode 100644
> index 000000000000..6316d4cae2a4
> --- /dev/null
> +++ b/mm/damon/dbgfs.c
> @@ -0,0 +1,428 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * DAMON Debugfs Interface
> + *
> + * Author: SeongJae Park <sjpark@xxxxxxxxx>
> + */
> +
> +#define pr_fmt(fmt) "damon-dbgfs: " fmt
> +
> +#include <linux/damon.h>
> +#include <linux/debugfs.h>
> +#include <linux/file.h>
> +#include <linux/mm.h>
> +#include <linux/module.h>
> +#include <linux/page_idle.h>
> +#include <linux/slab.h>
> +
> +static struct damon_ctx **dbgfs_ctxs;
> +static int dbgfs_nr_ctxs = 1;
> +static int dbgfs_nr_terminated_ctxs;
> +static struct dentry **dbgfs_dirs;
> +static DEFINE_MUTEX(damon_dbgfs_lock);
> +
> +/*
> + * Returns non-empty string on success, negative error code otherwise.
> + */
> +static char *user_input_str(const char __user *buf, size_t count, loff_t *ppos)
> +{
> + char *kbuf;
> + ssize_t ret;
> +
> + /* We do not accept continuous write */
> + if (*ppos)
> + return ERR_PTR(-EINVAL);
> +
> + kbuf = kmalloc(count + 1, GFP_KERNEL);
> + if (!kbuf)
> + return ERR_PTR(-ENOMEM);
> +
> + ret = simple_write_to_buffer(kbuf, count + 1, ppos, buf, count);
> + if (ret != count) {
> + kfree(kbuf);
> + return ERR_PTR(-EIO);
> + }
> + kbuf[ret] = '\0';
> +
> + return kbuf;
> +}
> +
> +static ssize_t dbgfs_attrs_read(struct file *file,
> + char __user *buf, size_t count, loff_t *ppos)
> +{
> + struct damon_ctx *ctx = file->private_data;
> + char kbuf[128];
> + int ret;
> +
> + mutex_lock(&ctx->kdamond_lock);
> + ret = scnprintf(kbuf, ARRAY_SIZE(kbuf), "%lu %lu %lu %lu %lu\n",
> + ctx->sample_interval, ctx->aggr_interval,
> + ctx->regions_update_interval, ctx->min_nr_regions,
> + ctx->max_nr_regions);
> + mutex_unlock(&ctx->kdamond_lock);
> +
> + return simple_read_from_buffer(buf, count, ppos, kbuf, ret);
> +}
> +
> +static ssize_t dbgfs_attrs_write(struct file *file,
> + const char __user *buf, size_t count, loff_t *ppos)
> +{
> + struct damon_ctx *ctx = file->private_data;
> + unsigned long s, a, r, minr, maxr;
> + char *kbuf;
> + ssize_t ret = count;
> + int err;
> +
> + kbuf = user_input_str(buf, count, ppos);
> + if (IS_ERR(kbuf))
> + return PTR_ERR(kbuf);
> +
> + if (sscanf(kbuf, "%lu %lu %lu %lu %lu",
> + &s, &a, &r, &minr, &maxr) != 5) {
> + ret = -EINVAL;
> + goto out;
> + }
> +
> + mutex_lock(&ctx->kdamond_lock);
> + if (ctx->kdamond) {
> + ret = -EBUSY;
> + goto unlock_out;
> + }
> +
> + err = damon_set_attrs(ctx, s, a, r, minr, maxr);
> + if (err)
> + ret = err;
> +unlock_out:
> + mutex_unlock(&ctx->kdamond_lock);
> +out:
> + kfree(kbuf);
> + return ret;
> +}
> +
> +#define targetid_is_pid(ctx) \
> + (ctx->primitive.target_valid == damon_va_target_valid)
> +
> +static ssize_t sprint_target_ids(struct damon_ctx *ctx, char *buf, ssize_t len)
> +{
> + struct damon_target *t;
> + unsigned long id;
> + int written = 0;
> + int rc;
> +
> + damon_for_each_target(t, ctx) {
> + id = t->id;
> + if (targetid_is_pid(ctx))
> + /* Show pid numbers to debugfs users */
> + id = (unsigned long)pid_vnr((struct pid *)id);
> +
> + rc = scnprintf(&buf[written], len - written, "%lu ", id);
> + if (!rc)
> + return -ENOMEM;
> + written += rc;
> + }
> + if (written)
> + written -= 1;
> + written += scnprintf(&buf[written], len - written, "\n");
> + return written;
> +}
> +
> +static ssize_t dbgfs_target_ids_read(struct file *file,
> + char __user *buf, size_t count, loff_t *ppos)
> +{
> + struct damon_ctx *ctx = file->private_data;
> + ssize_t len;
> + char ids_buf[320];
> +
> + mutex_lock(&ctx->kdamond_lock);
> + len = sprint_target_ids(ctx, ids_buf, 320);
> + mutex_unlock(&ctx->kdamond_lock);
> + if (len < 0)
> + return len;
> +
> + return simple_read_from_buffer(buf, count, ppos, ids_buf, len);
> +}
> +
> +/*
> + * Converts a string into an array of unsigned long integers
> + *
> + * Returns an array of unsigned long integers if the conversion succeeds, or
> + * NULL otherwise.
> + */
> +static unsigned long *str_to_target_ids(const char *str, ssize_t len,
> + ssize_t *nr_ids)
> +{
> + unsigned long *ids;
> + const int max_nr_ids = 32;
> + unsigned long id;
> + int pos = 0, parsed, ret;
> +
> + *nr_ids = 0;
> + ids = kmalloc_array(max_nr_ids, sizeof(id), GFP_KERNEL);
> + if (!ids)
> + return NULL;
> + while (*nr_ids < max_nr_ids && pos < len) {
> + ret = sscanf(&str[pos], "%lu%n", &id, &parsed);
> + pos += parsed;
> + if (ret != 1)
> + break;
> + ids[*nr_ids] = id;
> + *nr_ids += 1;
> + }
> +
> + return ids;
> +}
> +
> +/* Returns pid for the given pidfd if it's valid, or NULL otherwise. */
> +static struct pid *damon_get_pidfd_pid(unsigned int pidfd)
> +{
> + struct fd f;
> + struct pid *pid;
> +
> + f = fdget(pidfd);
> + if (!f.file)
> + return NULL;
> +
> + pid = pidfd_pid(f.file);
> + if (!IS_ERR(pid))
> + get_pid(pid);
> + else
> + pid = NULL;
> +
> + fdput(f);
> + return pid;
> +}
> +
> +static ssize_t dbgfs_target_ids_write(struct file *file,
> + const char __user *buf, size_t count, loff_t *ppos)
> +{
> + struct damon_ctx *ctx = file->private_data;
> + char *kbuf, *nrs;
> + bool received_pidfds = false;
> + unsigned long *targets;
> + ssize_t nr_targets;
> + ssize_t ret = count;
> + int i;
> + int err;
> +
> + kbuf = user_input_str(buf, count, ppos);
> + if (IS_ERR(kbuf))
> + return PTR_ERR(kbuf);
> +
> + nrs = kbuf;
> +
> + if (!strncmp(kbuf, "pidfd ", 6)) {
> + received_pidfds = true;

I am inclined toward accepting plain pids here instead of pidfds, which
is basically what cgroup and resctrl do.
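
To show roughly how small the input handling becomes with pids only,
here is a user-space sketch. parse_pids() is a hypothetical helper
invented for this example, not something in the patch; the in-kernel
handler would still have to turn each number into a struct pid (for
example with find_get_pid(), as the patch already does):

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical helper: parse whitespace-separated decimal pids from
 * @str into @ids.  Returns the number of pids parsed, or -1 if the
 * input is malformed, out of range, or holds more than @max_ids pids.
 */
static int parse_pids(const char *str, unsigned long *ids, int max_ids)
{
	int nr = 0;
	char *end;

	for (;;) {
		/* Skip separators between numbers, including a trailing newline. */
		while (isspace((unsigned char)*str))
			str++;
		if (*str == '\0')
			return nr;
		/* Anything that is not a digit (such as "pidfd") is rejected. */
		if (!isdigit((unsigned char)*str) || nr == max_ids)
			return -1;
		errno = 0;
		ids[nr] = strtoul(str, &end, 10);
		if (errno)
			return -1;
		nr++;
		str = end;
	}
}
```

With this, "42 4242\n" yields two ids and any "pidfd " prefix is simply
an invalid write.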


> + nrs = &kbuf[6];
> + }
> +
> + targets = str_to_target_ids(nrs, ret, &nr_targets);
> + if (!targets) {
> + ret = -ENOMEM;
> + goto out;
> + }
> +
> + if (received_pidfds) {
> + for (i = 0; i < nr_targets; i++)
> + targets[i] = (unsigned long)damon_get_pidfd_pid(
> + (unsigned int)targets[i]);
> + } else if (targetid_is_pid(ctx)) {
> + for (i = 0; i < nr_targets; i++)
> + targets[i] = (unsigned long)find_get_pid(
> + (int)targets[i]);
> + }
> +
> + mutex_lock(&ctx->kdamond_lock);
> + if (ctx->kdamond) {
> + ret = -EINVAL;
> + goto unlock_out;
> + }
> +
> + err = damon_set_targets(ctx, targets, nr_targets);

Hmm, this leaks the pid references held by the previous targets.
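
A toy user-space model of the ownership rule I have in mind, assuming
the targets own a reference on each pid. All fake_* names are made up
for this sketch; in the patch the references are taken by the caller
via find_get_pid()/get_pid() rather than inside the setter, so the real
fix would likely be a put_pid() on each old target:

```c
#include <assert.h>
#include <stddef.h>

#define MAX_TARGETS 32

/* Toy stand-in for struct pid: only the reference count matters here. */
struct fake_pid {
	int refcount;
};

/* Toy stand-in for the monitoring context's target list. */
struct fake_ctx {
	struct fake_pid *targets[MAX_TARGETS];
	size_t nr_targets;
};

static void fake_get_pid(struct fake_pid *p)
{
	p->refcount++;
}

static void fake_put_pid(struct fake_pid *p)
{
	assert(p->refcount > 0);
	p->refcount--;
}

/* Replace the targets, dropping the references the old ones held. */
static int fake_set_targets(struct fake_ctx *ctx, struct fake_pid **pids,
			    size_t nr)
{
	size_t i;

	if (nr > MAX_TARGETS)
		return -1;

	/* Take the new references first, so a pid in both sets never hits zero. */
	for (i = 0; i < nr; i++)
		fake_get_pid(pids[i]);

	/* This is the step the patch appears to be missing. */
	for (i = 0; i < ctx->nr_targets; i++)
		fake_put_pid(ctx->targets[i]);

	for (i = 0; i < nr; i++)
		ctx->targets[i] = pids[i];
	ctx->nr_targets = nr;
	return 0;
}
```

After two consecutive calls with different pids, the first pid's count
is back to where it started instead of staying elevated.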

> + if (err)
> + ret = err;
> +unlock_out:
> + mutex_unlock(&ctx->kdamond_lock);
> + kfree(targets);
> +out:
> + kfree(kbuf);
> + return ret;
> +}
> +

Still looking.