Re: [PATCH] driver core: Disable driver deferred probe timeout by default

From: Greg Kroah-Hartman
Date: Mon Nov 14 2022 - 06:38:17 EST


On Mon, Nov 14, 2022 at 12:13:15PM +0100, Javier Martinez Canillas wrote:
> Hello Greg,
>
> Thanks a lot for your feedback.
>
> On 11/14/22 11:54, Greg Kroah-Hartman wrote:
>
> [...]
>
> >>
> >> This default value of 0 was reverted again by commit f516d01b9df2 ("Revert
> >> "driver core: Set default deferred_probe_timeout back to 0."") and set to
> >> 10 seconds instead, which was still less than the 30 seconds that was set
> >> at some point to allow systems with drivers built as modules and loaded by
> >> user-land to probe drivers that were previously deferred.
> >>
> >> The 10-second timeout isn't enough for the mentioned systems. For example,
> >> general purpose distributions attempt to build all the possible drivers as
> >> modules to keep the Linux kernel image small. But that means that it is
> >> very likely that the probe deferral mechanism will time out and drivers
> >> won't be probed correctly.
> >
> > What specific "mentioned systems" have deferred probe drivers that are
>
> The "mentioned systems" are the ones mentioned in the paragraph above:
>
> "to allow systems with drivers built as modules and loaded by user-land to
> probe drivers that were previously deferred."
>
> I even gave an example about general purpose distributions that build as
> much as possible as a module. What more info do you think is missing?

Exact systems that this is failing on would be great to have.

> > failing on the current value? What drivers are causing the long delay
> > here? No one should be having to wait 10 seconds for a deferred delay
> > on a real system as that feels really wrong.
> >
>
> Not really, it depends on whether the drivers are built-in, built as
> modules, in the initramfs or in the rootfs. 10 seconds might not be enough
> if these modules are in the root partition and need to wait for it to be
> mounted and for udev to load the modules, etc.

How does it take 10 seconds to load the initramfs for a system that
requires deferred probe devices? What type of hardware is this?

> Also, it may even be that the module alias is not correct and then users
> have to load the modules explicitly with /etc/modules-load.d/ configs and
> so on.

Then that's a totally different issue and the module alias needs to be
fixed and is not relevant here.

> > Why not fix the drivers that are causing this delay and maybe move them
> > to be async so as to not slow down the whole boot process?
> >
>
> Yes, these drivers could be fixed to report a proper module alias or the
> dependencies can be built-in or added to the initramfs, but that does not
> change the fact that by default the kernel shouldn't make assumptions
> about when it is safe to just time out instead of returning -EPROBE_DEFER.

Please let me know which drivers these are that are causing problems so
we can fix them.

> Because with modules the kernel has no way to know whether all the modules
> have already been loaded by user-space or whether more drivers are going
> to be registered in the future.

Of course that is true, so we guess, and so far, 10 seconds is a good
enough guess for normal systems out there that use deferred probe. What
exact system and drivers do not work with this today?

> Also, that's how probe deferral has always worked since the mechanism was
> introduced. It's only recently that the behavior was changed to time out.
>
> A nice feature of the probe deferral mechanism is that it was simple and
> reliable. Adding a timeout makes it non-deterministic and more fragile IMO.

Deferred probe was never simple, reliable, or deterministic. It was a
hack we had to implement to handle complex hardware situations and
loadable drivers. Let's not try to paper over driver bugs here by
making the timeout "forever" but rather fix the root problem in the
broken drivers.

So, what drivers do we need to fix up?

thanks,

greg k-h