Re: [PATCH] module: print module name on refcount error

From: Jean Delvare
Date: Tue Jul 04 2023 - 08:43:21 EST


Hi Michal,

On Wed, 28 Jun 2023 12:30:35 +0200, Michal Hocko wrote:
> On Mon 26-06-23 12:32:52, Jean Delvare wrote:
> > If module_put() triggers a refcount error, include the culprit
> > module name in the warning message, to ease further investigation of
> > the issue.
> >
> > Signed-off-by: Jean Delvare <jdelvare@xxxxxxx>
> > Suggested-by: Michal Hocko <mhocko@xxxxxxxx>
> > Cc: Luis Chamberlain <mcgrof@xxxxxxxxxx>
> > ---
> > kernel/module/main.c | 4 +++-
> > 1 file changed, 3 insertions(+), 1 deletion(-)
> >
> > --- linux-6.3.orig/kernel/module/main.c
> > +++ linux-6.3/kernel/module/main.c
> > @@ -850,7 +850,9 @@ void module_put(struct module *module)
> >  	if (module) {
> >  		preempt_disable();
> >  		ret = atomic_dec_if_positive(&module->refcnt);
> > -		WARN_ON(ret < 0);	/* Failed to put refcount */
> > +		WARN(ret < 0,
> > +		     KERN_WARNING "Failed to put refcount for module %s\n",
> > +		     module->name);
>
> Would it make sense to also print the refcnt here? In our internal bug
> report it has turned out that this was an overflow (put missing) rather
> than an underflow (too many put calls). Seeing the value could give a
> clue about that. We had to configure panic_on_warn to capture a dump and
> learn more, which is rather impractical.

Well, other calls to module_put() or try_module_get() could happen in
parallel, so at the time we print refcnt, its value could be different
from the one which triggered the WARN.
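
To make the race concrete, here is a rough userspace sketch (C11
<stdatomic.h>, not kernel code) that replays one interleaving by hand.
dec_if_positive() is a hypothetical stand-in for the kernel's
atomic_dec_if_positive(), which returns the old value minus one whether
or not it actually decremented; replay_racy_report() is likewise made up
for illustration, and the counter values are only illustrative.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical model of atomic_dec_if_positive(): decrement only if the
 * result stays >= 0, and return old value minus one either way. The
 * subtraction is unsigned to avoid signed-overflow UB at INT_MIN. */
int dec_if_positive(atomic_int *v)
{
	int c = atomic_load(v);
	int dec;

	do {
		dec = (int)((unsigned int)c - 1u);
		if (dec < 0)
			break;	/* would go negative: leave *v untouched */
	} while (!atomic_compare_exchange_weak(v, &c, dec));
	return dec;
}

/* Replays one interleaving: CPU 0 fails to put a reference, CPU 1 takes
 * a new one, then CPU 0 re-reads the counter to print it. Stores the
 * dec_if_positive() result in *ret and returns the value CPU 0 prints. */
int replay_racy_report(int *ret)
{
	atomic_int refcnt = 0;	/* no reference left to drop */

	*ret = dec_if_positive(&refcnt);	/* CPU 0: one put too many */
	assert(*ret < 0);	/* the WARN(ret < 0, ...) condition holds */
	atomic_fetch_add(&refcnt, 1);	/* CPU 1: a reference is taken */
	return atomic_load(&refcnt);	/* CPU 0: value the message shows */
}
```

In this replay, ret is -1 while the value re-read for the message is 1,
so the report would not describe the state that triggered the WARN.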

Additionally, catching an overflow in module_put() is counterintuitive:
it only works by accident, because the counter wraps around to negative
values. If we really want to reliably report overflows as such, then we
should add a dedicated WARN to try_module_get(). That doesn't look
trivial, though.
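
A rough userspace sketch (C11 atomics, not kernel code) of why an
overflow only shows up in module_put() by accident: missing puts push
the counter up until it wraps to negative values, and only then does
the next put trip the check. All names here are hypothetical;
get_checked() is just one possible shape of a dedicated check on the
get side, not something try_module_get() currently does.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of atomic_dec_if_positive(); the unsigned
 * subtraction avoids signed-overflow UB (GCC and Clang convert the
 * result back to int modulo 2^N). */
int dec_if_positive(atomic_int *v)
{
	int c = atomic_load(v);
	int dec;

	do {
		dec = (int)((unsigned int)c - 1u);
		if (dec < 0)
			break;
	} while (!atomic_compare_exchange_weak(v, &c, dec));
	return dec;
}

/* Unchecked get: C11 atomic arithmetic on signed types wraps silently. */
void get_unchecked(atomic_int *v)
{
	atomic_fetch_add(v, 1);
}

/* Hypothetical get with a dedicated overflow check: it refuses to wrap,
 * so the path that leaks references is the one that gets reported. */
bool get_checked(atomic_int *v)
{
	int c = atomic_load(v);

	do {
		if (c == INT_MAX)
			return false;
	} while (!atomic_compare_exchange_weak(v, &c, c + 1));
	return true;
}

/* Leaked references wrap the counter; the next put then sees a negative
 * value and warns, although that put is not the one at fault. */
int put_after_wrap(void)
{
	atomic_int refcnt = INT_MAX - 1;

	for (int i = 0; i < 3; i++)
		get_unchecked(&refcnt);	/* INT_MAX, INT_MIN, INT_MIN + 1 */
	int ret = dec_if_positive(&refcnt);
	assert(ret < 0);	/* the WARN in module_put() fires */
	return ret;
}
```

Here put_after_wrap() returns INT_MIN, while get_checked() returns false
instead of incrementing a counter already at INT_MAX.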

With my proposed implementation, I don't think it's necessary to turn
on panic_on_warn to debug further. Once you know which module is the
culprit, enabling tracing for this specific module should give you all
the details you need to figure out what's going on.

--
Jean Delvare
SUSE L3 Support