Re: [PATCH] module: add debugging auto-load duplicate module support

From: Lucas De Marchi
Date: Fri Apr 21 2023 - 12:43:19 EST


On Fri, Apr 21, 2023 at 05:12:51PM +0200, Greg KH wrote:
On Thu, Apr 20, 2023 at 02:03:32PM -0700, Luis Chamberlain wrote:
On Thu, Apr 20, 2023 at 07:32:10AM +0200, Greg KH wrote:
> On Wed, Apr 19, 2023 at 04:32:30PM -0700, Luis Chamberlain wrote:
> > > It's not "wasted", as it is returned when the module is determined to be
> > > a duplicate. Otherwise everyone will want this enabled as they think it
> > > will actually save memory.
> >
> > I'll change the language to be clear the issue is memory pressure early
> > on boot. I'll also add a bit of language to help at least guide people
> > to realize that the real value-add for this, ie, I'll have to mention we
> > suspect issue is udev and not module auto-loading and that this however
> > may still help find a few cases we can optimize for.
>
> This isn't udev's "problem", all it is doing is what the kernel asked it
> to do. The kernel said "Here's a new device I found, load a module for
> it please!"

If you believe that then the issue is still a kernel issue, and the
second part to that sentence "load a module for it" was done without
consideration of the implications, or without optimizations in mind.
Given the implications were perhaps not well understood it is unfair
for us to be hard on ourselves on that. But now we know, ideally if we
could we *should* only issue a request for a module *once* during boot.

But there is no mapping between devices and modules other than what is
exported in the module info and that is up to userspace to handle.

Where does the kernel actually say "load a module"?

The driver core says "hey a new device is now present!"

Userspace takes that message and calls kmod with the device information
which then determines what module to load by looking at the device
aliases.

Isn't that just an implied gesture?

Yes.

> And it's the kmod code here, not udev itself doing all of this.

Yes, IMHO kmod could and *should* be enhanced to share a loading context
during boot so to avoid duplicates too and then udev would have to
embrace such functionality. That's going to take time to propagate, as
you can imagine.

udev is just the transport to kmod here, it's not in the job of
filtering duplicate messages.

udev nowadays use *lib*kmod. It's udev who has the
context it can operate on.

Also, those module loads will not use the path this patch is changing
call_modprobe is not the path that triggers udev to load modules.
/me confused

What can be done from userspace in the udev path

1) udev to do the ratelimit'ing. Define a time window,
filter out uevents in systemd/src/udev/udev-builtin-kmod.c

2) libkmod to do the ratelimit'ing with a similar approach, but udev
needs to tell libkmod what is the window it wants to use

3) libkmod to act on the context it has from the *kernel*. It used
to be cheap with the call simply blocking early on the syscall in
a mutex... or we didn't have that many calls. So libkmod
simply calls [f]init_module() again regardless of the module's
state being in a "coming" state. Is this the case here? I haven't
seen this data. This is done to avoid a) the toctou implied and b) to
provide the correct return for that call - libkmod can't know if the
previous call will succeed or fail.

libkmod only skips the call if the module is already in
the live state. It seems systemd-udev also duplicates the check
in src/shared/module-util.c:module_load_and_warn()

Note that libkmod already spares loading the module multiple times from
disk as it uses a memory pool for the modules. It reuses one iff it
comes from the same context (i.e. it's only udev involved and not a
bunch of parallel calls to modprobe).

4) If all the calls are coming from the same context and it is udev...
I'm not sure this is actually the problem - the udev's kmod builtin
handler is single-threaded and will handle one request at a time.
I don't see any data to confirm it's coming from a single source or
multiple sources. Could you get a trace containing [f]init_module and
the trace_module_request(), together with a verbose udev log?

If this is all coming from a synthetic use case with thousands of
modprobe execs, I'm not sure there is much to do on the userspace side.


> Why not
> just rate-limit it in userspace if your system can't handle 10's of
> thousands of kmod calls all at once? I think many s390 systems did this
> decades ago when they were controlling 10's of thousands of scsi devices
> and were hit with "device detection storms" at boot like this.

Boot is a special context and in this particular case I agree userspace
kmod could/should be extended to avoid duplicate module requests in that

see above

context. But likewise the kernel should only have to try to issue a
request for a single module once, if it could easily do that.

Are you sure that this is happening at boot in a way that userspace
didn't just trigger it on its own after init started up? That happens
as a "coldboot" walk of the device tree and all uevent are regenerated.
That is userspace asking for this, so there's nothing that the kernel
can do.

This does beg the question, why force userspace to rate limit if we
can do better in the kernel? Specially if *we're the ones*, as you say,
that are hinting to userspace to shoot back loading modules for us and we
know we're just going to drop duplicates?

Maybe error out of duplicate module loading earlier? I don't know,
sorry.

I still don't see what's the source of the problem from the data
available. Is the kernel issuing multiple request_module()? Or is the
kernel sending multiple udev event for userspace to map the alias to the
module and load it? The mapping alias -> module currently belongs in
userspace so if you are de-duplicating, it can't be only on the module
name.


> What specific devices and bus types are the problem here for these systems?

My best assessment of the situation is that each CPU in udev ends up triggering
a load of duplicate set of modules, not just one, but *a lot*. Not sure
what heuristics udev uses to load a set of modules per CPU.

Again, finding which device and bus is causing the problem is going to
be key here to try to solve the issue. Are you logging duplicate module

agreed.

If the info I requested above is available on other threads, could you
point me to those?

thanks
Lucas De Marchi

loads by name as well?

thanks,

greg k-h