Re: [PATCH 5.17 127/298] driver core: Fix wait_for_device_probe() & deferred_probe_timeout interaction

From: Greg Kroah-Hartman
Date: Wed Aug 16 2023 - 11:04:29 EST


On Wed, Aug 16, 2023 at 03:09:27PM +0530, Shreeya Patel wrote:
> On 13/06/22 15:40, Greg Kroah-Hartman wrote:
> > From: Saravana Kannan <saravanak@xxxxxxxxxx>
> >
> > [ Upstream commit 5ee76c256e928455212ab759c51d198fedbe7523 ]
> >
> > Mounting NFS rootfs was timing out when deferred_probe_timeout was
> > non-zero [1]. This was because ip_auto_config() initcall times out
> > waiting for the network interfaces to show up when
> > deferred_probe_timeout was non-zero. While ip_auto_config() calls
> > wait_for_device_probe() to make sure any currently running deferred
> > probe work or asynchronous probe finishes, that wasn't sufficient to
> > account for devices being deferred until deferred_probe_timeout.
> >
> > Commit 35a672363ab3 ("driver core: Ensure wait_for_device_probe() waits
> > until the deferred_probe_timeout fires") tried to fix that by making
> > sure wait_for_device_probe() waits for deferred_probe_timeout to expire
> > before returning.
> >
> > However, if wait_for_device_probe() is called from the kernel_init()
> > context:
> >
> > - Before deferred_probe_initcall() [2], it causes the boot process to
> > hang due to a deadlock.
> >
> > - After deferred_probe_initcall() [3], it blocks kernel_init() from
> > continuing till deferred_probe_timeout expires and beats the point of
> > deferred_probe_timeout that's trying to wait for userspace to load
> > modules.
> >
> > Neither of these is good. So revert the changes to
> > wait_for_device_probe().
> >
> > [1] - https://lore.kernel.org/lkml/TYAPR01MB45443DF63B9EF29054F7C41FD8C60@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/
> > [2] - https://lore.kernel.org/lkml/YowHNo4sBjr9ijZr@dev-arch.thelio-3990X/
> > [3] - https://lore.kernel.org/lkml/Yo3WvGnNk3LvLb7R@xxxxxxxxxxxxx/
>
> Hi Saravana, Greg,
>
>
> KernelCI found this patch causes the baseline.bootrr.deferred-probe-empty test to fail on r8a77960-ulcb,
> see the following details for more information.
>
> KernelCI dashboard link:
> https://linux.kernelci.org/test/plan/id/64d2a6be8c1a8435e535b264/
>
> Error messages from the logs :-
>
> + UUID=11236495_1.5.2.4.5
> + set +x
> + export 'PATH=/opt/bootrr/libexec/bootrr/helpers:/lava-11236495/1/../bin:/sbin:/usr/sbin:/bin:/usr/bin'
> + cd /opt/bootrr/libexec/bootrr
> + sh helpers/bootrr-auto
> e6800000.ethernet
> e6700000.dma-controller
> e7300000.dma-controller
> e7310000.dma-controller
> ec700000.dma-controller
> ec720000.dma-controller
> fea20000.vsp
> feb00000.display
> fea28000.vsp
> fea30000.vsp
> fe9a0000.vsp
> fe9af000.fcp
> fea27000.fcp
> fea2f000.fcp
> fea37000.fcp
> sound
> ee100000.mmc
> ee140000.mmc
> ec500000.sound
> /lava-11236495/1/../bin/lava-test-case
> <8>[ 17.476741] <LAVA_SIGNAL_TESTCASE TEST_CASE_ID=deferred-probe-empty RESULT=fail>
>
> Test case failing :-
> Baseline Bootrr deferred-probe-empty test - https://github.com/kernelci/bootrr/blob/main/helpers/bootrr-generic-tests
>
> Regression Reproduced :-
>
> Lava job after reverting the commit 5ee76c256e92
> https://lava.collabora.dev/scheduler/job/11292890
>
>
> Bisection report from KernelCI can be found at the bottom of the email.
>
> Thanks,
> Shreeya Patel
>
> #regzbot introduced: 5ee76c256e92
> #regzbot title: KernelCI: Multiple devices deferring on r8a77960-ulcb
>
> ---------------------------------------------------------------------------------------------------------------------------------------------------
>
> * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
> * If you do send a fix, please include this trailer: *
> * Reported-by: "kernelci.org bot" <bot@...> *
> * *
> * Hope this helps! *
> * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
>
> stable-rc/linux-5.10.y bisection: baseline.bootrr.deferred-probe-empty on
> r8a77960-ulcb

You are testing 5.10.y, yet the subject says 5.17?

Which is it here?

confused,

greg k-h