Re: [PATCH] module: add debugging auto-load duplicate module support

From: Greg KH
Date: Wed Apr 19 2023 - 03:15:22 EST


On Tue, Apr 18, 2023 at 01:46:36PM -0700, Luis Chamberlain wrote:
> This adds debugging support to the kernel module auto-loader to
> easily detect and deal with duplicate module requests. To aid
> with possible bootup failure issues it will supress the waste
> in virtual memory when races happen before userspace loads a
> module and the kernel is still issuing requests for the same
> module.
>
> Folks debugging virtual memory abuse on bootup can and should
> enable this to see what WARN()s come on, to see if module
> auto-loading is to blame for their woes.

You get 72 columns for changelog text, so you can use it :)

>
> Signed-off-by: Luis Chamberlain <mcgrof@xxxxxxxxxx>
> ---
>
> Changes on this patch since the last RFC:
>
> o dropped the kernel_read*() patch from this series moving to
> punt the issues as a udev issue now that we have proof auto-loading
> is not the issue
> o some spell checks
>
> kernel/module/Kconfig | 40 +++++++
> kernel/module/Makefile | 1 +
> kernel/module/dups.c | 234 +++++++++++++++++++++++++++++++++++++++
> kernel/module/internal.h | 15 +++
> kernel/module/kmod.c | 23 +++-
> 5 files changed, 309 insertions(+), 4 deletions(-)
> create mode 100644 kernel/module/dups.c
>
> diff --git a/kernel/module/Kconfig b/kernel/module/Kconfig
> index e6df183e2c80..cc146ef4a6ac 100644
> --- a/kernel/module/Kconfig
> +++ b/kernel/module/Kconfig
> @@ -59,6 +59,46 @@ config MODULE_STATS
>
> If unsure, say N.
>
> +config MODULE_AUTOLOAD_SUPRESS_DUPS

MODULE_DEBUG_DUPLICATE perhaps? It has nothing to do with autoloading
(other than that is what userspace is doing) and you aren't suppressing
anything except throwing up warnings, right?

> + bool "Debug duplicate modules with auto-loading"
> + help
> + Module autoloading allows in-kernel code to request modules through
> + the *request_module*() API calls. This in turn just calls userspace
> + modprobe. Although modprobe checks to see if a module is already
> + loaded before trying to load a module there is a small time window in
> + which multiple duplicate requests can end up in userspace and multiple
> + modprobe calls race calling finit_module() around the same time for
> + duplicate modules. The finit_module() system call can consume in the
> + worst case more than twice the respective module size in virtual
> + memory for each duplicate module requests. Although duplicate module
> + requests are non-fatal virtual memory is a limited resource and each
> + duplicate module request ends up just wasting virtual memory.

It's not "wasted", as it is returned when the module is determined to be
a duplicate. Otherwise everyone will want this enabled as they think it
will actually save memory.

> + This debugging facility will create WARN() splats for duplicate module
> + requests to help identify if module auto-loading is the culprit to your
> + woes. Since virtual memory abuse caused by duplicate module requests
> + could render a system unusable this functionality will also suppresses
> + the waste in virtual memory caused by duplicate requests by sharing
> + races in requests for the same module to a single unified request.
> + Once a non-wait request_module() call completes a module should be
> + loaded and modprobe should simply not allow new finit_module() calls.
> +
> + Enable this functionality to try to debug virtual memory abuse during
> + boot on systems and identify if the abuse was due to module
> + auto-loading.
> +
> + If the first module request used request_module_nowait() we cannot
> + use that as the anchor to wait for duplicate module requests, since
> + users of request_module() do want a proper return value. If a call
> + for the same module happened earlier with request_module() though,
> + then a duplicate request_module_nowait() would be detected.
> +
> + You want to enable this if you want to debug and see if duplicate
> + module auto-loading might be causing virtual memory abuse during
> + bootup. A kernel trace will be provided for each duplicate request.
> +
> + Disable this if you are on production.

"on production"? That does not translate well, how about:
Only enable this for debugging system functionality, never have
it enabled on real systems.
or something like that?


> +
> endif # MODULE_DEBUG
>
> config MODULE_FORCE_LOAD
> diff --git a/kernel/module/Makefile b/kernel/module/Makefile
> index 52340bce497e..e8b121ac39cf 100644
> --- a/kernel/module/Makefile
> +++ b/kernel/module/Makefile
> @@ -10,6 +10,7 @@ KCOV_INSTRUMENT_module.o := n
> obj-y += main.o
> obj-y += strict_rwx.o
> obj-y += kmod.o
> +obj-$(CONFIG_MODULE_AUTOLOAD_SUPRESS_DUPS) += dups.o
> obj-$(CONFIG_MODULE_DECOMPRESS) += decompress.o
> obj-$(CONFIG_MODULE_SIG) += signing.o
> obj-$(CONFIG_LIVEPATCH) += livepatch.o
> diff --git a/kernel/module/dups.c b/kernel/module/dups.c
> new file mode 100644
> index 000000000000..903ab7c7e8f4
> --- /dev/null
> +++ b/kernel/module/dups.c
> @@ -0,0 +1,234 @@
> +/* SPDX-License-Identifier: GPL-2.0-or-later */
> +/*
> + * kmod dups - the kernel module autoloader duplicate suppressor
> + *
> + * Copyright (C) 2023 Luis Chamberlain <mcgrof@xxxxxxxxxx>
> + */
> +
> +#include <linux/module.h>
> +#include <linux/sched.h>
> +#include <linux/sched/task.h>
> +#include <linux/binfmts.h>
> +#include <linux/syscalls.h>
> +#include <linux/unistd.h>
> +#include <linux/kmod.h>
> +#include <linux/slab.h>
> +#include <linux/completion.h>
> +#include <linux/cred.h>
> +#include <linux/file.h>
> +#include <linux/fdtable.h>
> +#include <linux/workqueue.h>
> +#include <linux/security.h>
> +#include <linux/mount.h>
> +#include <linux/kernel.h>
> +#include <linux/init.h>
> +#include <linux/resource.h>
> +#include <linux/notifier.h>
> +#include <linux/suspend.h>
> +#include <linux/rwsem.h>
> +#include <linux/ptrace.h>
> +#include <linux/async.h>
> +#include <linux/uaccess.h>
> +
> +DEFINE_MUTEX(kmod_dup_mutex);

static?

> +static LIST_HEAD(dup_kmod_reqs);
> +
> +struct kmod_dup_req {
> + struct list_head list;
> + char name[MODULE_NAME_LEN];
> + struct completion first_req_done;
> + struct work_struct complete_work;
> + struct delayed_work delete_work;
> + int dup_ret;
> +};
> +
> +static struct kmod_dup_req *kmod_dup_request_lookup(char *module_name)

Lock requirements? You should document that here.

> +{
> + struct kmod_dup_req *kmod_req;
> +
> + list_for_each_entry_rcu(kmod_req, &dup_kmod_reqs, list,
> + lockdep_is_held(&kmod_dup_mutex)) {
> + if (strlen(kmod_req->name) == strlen(module_name) &&
> + !memcmp(kmod_req->name, module_name, strlen(module_name))) {
> + return kmod_req;
> + }
> + }
> +
> + return NULL;
> +}
> +
> +static void kmod_dup_request_delete(struct work_struct *work)
> +{
> + struct kmod_dup_req *kmod_req;
> + kmod_req = container_of(to_delayed_work(work), struct kmod_dup_req, delete_work);
> +
> + /*
> + * The typical situation is a module successully loaded. In that
> + * situation the module will be present already in userspace. If
> + * new requests come in after that, userspace will already know the
> + * module is loaded so will just return 0 right away. There is still
> + * a small chance right after we delete this entry new request_module()
> + * calls may happen after that, they can happen. These heuristics
> + * are to protect finit_module() abuse for auto-loading, if modules
> + * are still tryign to auto-load even if a module is already loaded,
> + * that's on them, and those inneficiencies should not be fixed by
> + * kmod. The inneficies there are a call to modprobe and modprobe
> + * just returning 0.
> + */
> + mutex_lock(&kmod_dup_mutex);
> + list_del_rcu(&kmod_req->list);
> + synchronize_rcu();
> + mutex_unlock(&kmod_dup_mutex);
> + kfree(kmod_req);
> +}
> +
> +static void kmod_dup_request_complete(struct work_struct *work)
> +{
> + struct kmod_dup_req *kmod_req;
> +
> + kmod_req = container_of(work, struct kmod_dup_req, complete_work);
> +
> + /*
> + * This will ensure that the kernel will let all the waiters get
> + * informed its time to check the return value. It's time to
> + * go home.
> + */
> + complete_all(&kmod_req->first_req_done);
> +
> + /*
> + * Now that we have allowed prior request_module() calls to go on
> + * with life, let's schedule deleting this entry. We don't have
> + * to do it right away, but we *eventually* want to do it so to not
> + * let this linger forever as this is just a boot optimization for
> + * possible abuses of vmalloc() incurred by finit_module() thrashing.
> + */
> + queue_delayed_work(system_wq, &kmod_req->delete_work, 60 * HZ);
> +}
> +
> +bool kmod_dup_request_exists_wait(char *module_name, bool wait, int *dup_ret)
> +{
> + struct kmod_dup_req *kmod_req, *new_kmod_req;
> + int ret;
> +
> + /*
> + * Pre-allocate the entry in case we have to use it later
> + * to avoid contention with the mutex.
> + */
> + new_kmod_req = kzalloc(sizeof(*new_kmod_req), GFP_KERNEL);
> + if (!new_kmod_req)
> + return false;
> +
> + memcpy(new_kmod_req->name, module_name, strlen(module_name));
> + INIT_WORK(&new_kmod_req->complete_work, kmod_dup_request_complete);
> + INIT_DELAYED_WORK(&new_kmod_req->delete_work, kmod_dup_request_delete);
> + init_completion(&new_kmod_req->first_req_done);
> +
> + mutex_lock(&kmod_dup_mutex);
> +
> + kmod_req = kmod_dup_request_lookup(module_name);
> + if (!kmod_req) {
> + /*
> + * If the first request that came through for a module
> + * was with request_module_nowait() we cannot wait for it
> + * and share its return value with other users which may
> + * have used request_module() and need a proper return value
> + * so just skip using them as an anchor.
> + *
> + * If a prior request to this one came through with
> + * request_module() though, then a request_module_nowait()
> + * would benefit from duplicate detection.
> + */
> + if (!wait) {
> + kfree(new_kmod_req);
> + pr_warn("New request_module_nowait() for %s -- cannot track duplicates for this request\n", module_name);

pr_dbg() as this is debugging?

And did you set your pr_fmt value to make it obvious where these
messages are coming from?

> + mutex_unlock(&kmod_dup_mutex);
> + return false;
> + }
> +
> + /*
> + * There was no duplicate, just add the request so we can
> + * keep tab on duplicates later.
> + */
> + pr_info("New request_module() for %s\n", module_name);

Why not pr_dbg()?

> + list_add_rcu(&new_kmod_req->list, &dup_kmod_reqs);
> + mutex_unlock(&kmod_dup_mutex);
> + return false;
> + }
> + mutex_unlock(&kmod_dup_mutex);
> +
> + /* We are dealing with a duplicate request now */
> +

No blank line needed.

> + kfree(new_kmod_req);
> +
> + /*
> + * To fix these try to use try_then_request_module() instead as that
> + * will check if the component you are looking for is present or not.
> + * You could also just queue a single request to load the module once,
> + * instead of having each and everything you need try to request for
> + * the module.
> + *
> + * Duplicate request_module() calls can cause quite a bit of wasted
> + * vmalloc() space when racing with userspace.
> + */
> + WARN(1, "module-autoload: duplicate request for module %s\n", module_name);

Remember, this will trigger syzbot and any system that has
panic-on-warn, so be prepared for that mess. Especially the syzbot
reports, that's going to cause lots of issues for you if you don't tell
them to always disable this option.

thanks,

greg k-h