Re: kmod: add a sanity check on module loading

From: Jessica Yu
Date: Fri Jan 06 2017 - 16:55:03 EST


+++ Luis R. Rodriguez [06/01/17 21:36 +0100]:
On Tue, Jan 03, 2017 at 10:34:53AM +1030, Rusty Russell wrote:
"Luis R. Rodriguez" <mcgrof@xxxxxxxxxx> writes:
> Right, out of ~350 request_module() calls (not included try requests)
> only ~46 check the return value. Hence a validation check, and come to
> think of it, *this* was the issue that originally had me believing
> that in some places we might end up in a null deref --if those open
> coded request_module() calls assume the driver is loaded there could
> be many places where a NULL is inevitable.

Yes, assuming success == module loade is simply a bug. I wrote
try_then_request_module() to attempt to encapsulate the correct logic
into a single place; maybe we need other helpers to cover (most of?) the
remaining cases?

I see...

OK so indeed we have a few possible changes to kernel given the above:

a) Add SmPL rule to nag about incorrect uses of request_module() which
never check for the return value, and fix 86% of calls (304 call sites)
which are buggy

b) Add a new API call, perhaps request_module_assert() which would
BUG_ON() if the requested module didn't load, and change the callers
which do not check for the return value to this.

It is probably not a good idea to panic/BUG() because a requested
module didn't load. IMO callers should already be accounting for the
fact that request_module() doesn't provide these guarantees. I haven't
looked yet to see if the majority of these callers actually do the the
responsible thing, though.

Make request_module() do the assert and changing all proper callers of
request_module() to a new API call which *does* let you check for the
return value is another option but tasteless.

b) seems to be what you allude to, and while it may seem also of bad taste,
in practice it may be hard to get callers to properly check for the return
value. I actually just favor a) even though its more work.

> Granted, I agree they
> should be fixed, we could add a grammar rule to start nagging at
> driver developers for started, but it does beg the question also of
> what a tightly knit validation for modprobe might look like, and hence
> this patch and now the completed not-yet-posted alias work.

I really think aliases-in-kernel is too heavy a hammer, but a warning
when modprobe "succeeds" and the module still isn't found would be
a Good Thing.

OK -- such a warning can really only happen if we had alias support though.
So one option is to add this and alias parsing support as a debug option.

Hm, I see what you're saying..

To clarify the problem (if anyone was confused, as I was..): we can
verify a module is loaded by using find_module_all() and looking at
its state. However, find_module_all() operates on real module names,
and we can't verify a module has successfully loaded if all we have is
the name of the alias (eg, "fs-*" aliases in get_fs_type), because we
have no alias->real_module_name mappings in the kernel.

However, in Rusty's sample get_fs_type WARN() code, we indirectly
validated request_module()'s work by verifying that the
file_system_type has actually registered, which is what should happen
if a filesystem module successfully loads. So in this case, the caller
(get_fs_type) indirectly checks if the service it requested is now
available, which is what I *thought* callers were supposed to do in
the first place (and we didn't need the help of aliases to do that).
I think the main question we have to answer is, should the burden of
validation be on the callers, or on request_module? I am currently
leaning towards the former, but I'm still thinking.

> Would it be worthy as a kconfig kmod debugging aide for now? I can
> follow up with a semantic patch to nag about checking the return value
> of request_module(), and we can have 0-day then also complain about
> new invalid uses.

Yeah, a warning about this would be win for sure.

OK will work on such SmPL patch into the next patch series for this patch set.

BTW, I wrote the original "check-for-module-before-loading" in
module-init-tools, but I'm starting to wonder if it was a premature
optimization. Have you thought about simply removing it and always
trying to load the module? If it doesn't slow things down, perhaps
simplicity FTW?

I've given this some thought as I tried to blow up request_module() with
the new kmod stress test driver and given the small changes I made -- I'm of the
mind set it should be based on numbers: if a change improves the time it takes
to load modules while also not regressing all the other test cases then we
should go with it. The only issue is we don't yet have enough test cases
to cover the typical distribution setup: load tons of modules, and only
sometimes try to load a few of the same modules.

The early module-init-tools check seems fair gain to me given a bounce back to
the kernel and back to userspace should incur a bit more work than just checking
for a few files on the filesystem. As I noted though, I can't prove this for most
cases for now, but its a hunch.

So I'd advocate leaving the "check-for-module-before-loading" on kmod for now.

Luis