Re: [PATCH v3] driver core: Break infinite loop when deferred probe can't be satisfied

From: Andy Shevchenko
Date: Thu Mar 26 2020 - 07:57:34 EST


On Thu, Mar 26, 2020 at 09:39:40AM +0100, Rafael J. Wysocki wrote:
> On Wed, Mar 25, 2020 at 11:09 PM Saravana Kannan <saravanak@xxxxxxxxxx> wrote:
> > On Wed, Mar 25, 2020 at 5:51 AM Andy Shevchenko
> > <andriy.shevchenko@xxxxxxxxxxxxxxx> wrote:

> > > Yes, it's (unlikely) possible (*), but it will give one more iteration per such
> > > case. It's definitely better than infinite loop. Do you agree?
> >
> > Sorry I wasn't being clear (I was in a rush). I'm saying this patch
> > can reintroduce the bug where the deferred probe isn't triggered when
> > it should be.
> >
> > Let's take a simple execution flow.
> >
> > probe_okay is at 10.
> >
> > Thread-A
> > really_probe(Device-A)
> > local_probe_okay_count = 10
> > Device-A probe function is running...
> >
> > Thread-B
> > really_probe(Device-B)
> > Device-B probes successfully.
> > probe_okay incremented to 11
> >
> > Thread-C
> > Device-C (which had bound earlier) is unbound (say module is
> > unloaded or a million other reasons).
> > probe_okay is decremented to 10.
> >
> > Thread-A continues
> > Device-A probe function returns -EPROBE_DEFER
> > driver_deferred_probe_add_trigger() doesn't do anything because
> > local_probe_okay_count == probe_okay
> > But Device-A might have deferred probe waiting on Device-B.
> > Device-A never probes.
> >
> > > *) It means during probe you have _intensive_ removing, of course you may keep
> > > kernel busy with iterations, but it has no practical sense. DoS attacks more
> > > effective in different ways.
> >
> > I wasn't worried about DoS attacks. More of a functional correctness
> > issue what I explained above.
>
> The code is functionally incorrect as is already AFAICS.
>
> > Anyway, if your issue and similar issues can be handles in driver core
> > in a clean way without breaking other cases, I don't have any problem
> > with that. Just that, I think the current solution breaks other cases.
>
> OK, so the situation right now is that commit 58b116bce136 has
> introduced a regression and so it needs to be fixed or reverted. The
> cases that were previously broken and were unbroken by that commit
> don't matter here, so you cannot argue that they would be "broken".
>
> It looks to me like the original issue fixed by the commit in question
> needs to be addressed differently, so I would vote for reverting it
> and starting over.

I think Saravana's example is not fully correct as I had responded to his mail.
I would like to hear Grant, but seems he is busy with something and didn't reply.

> > As an alternate solution, assuming "linux,extcon-name" is coming
> > from some firmware, you might want to look into the fw_devlink
> > feature.
>
> That would be a workaround for a driver core issue, though, wouldn't it?

As I explained to him, this issue is not limited to USB case.

--
With Best Regards,
Andy Shevchenko