Re: [PATCH v3] driver core: Break infinite loop when deferred probe can't be satisfied

From: Grant Likely
Date: Thu Mar 26 2020 - 09:48:50 EST




On 26/03/2020 11:57, Andy Shevchenko wrote:
On Thu, Mar 26, 2020 at 09:39:40AM +0100, Rafael J. Wysocki wrote:
On Wed, Mar 25, 2020 at 11:09 PM Saravana Kannan <saravanak@xxxxxxxxxx> wrote:
On Wed, Mar 25, 2020 at 5:51 AM Andy Shevchenko
<andriy.shevchenko@xxxxxxxxxxxxxxx> wrote:

Yes, it's (unlikely) possible (*), but it will give one more iteration per such
case. It's definitely better than infinite loop. Do you agree?

Sorry I wasn't being clear (I was in a rush). I'm saying this patch
can reintroduce the bug where the deferred probe isn't triggered when
it should be.

Let's take a simple execution flow.

probe_okay is at 10.

Thread-A
really_probe(Device-A)
local_probe_okay_count = 10
Device-A probe function is running...

Thread-B
really_probe(Device-B)
Device-B probes successfully.
probe_okay incremented to 11

Thread-C
Device-C (which had bound earlier) is unbound (say module is
unloaded or a million other reasons).
probe_okay is decremented to 10.

Thread-A continues
Device-A probe function returns -EPROBE_DEFER
driver_deferred_probe_add_trigger() doesn't do anything because
local_probe_okay_count == probe_okay
But Device-A might have deferred probe waiting on Device-B.
Device-A never probes.

*) It means during probe you have _intensive_ removing, of course you may keep
kernel busy with iterations, but it has no practical sense. DoS attacks more
effective in different ways.

I wasn't worried about DoS attacks. More of a functional correctness
issue what I explained above.

The code is functionally incorrect as is already AFAICS.

Anyway, if your issue and similar issues can be handles in driver core
in a clean way without breaking other cases, I don't have any problem
with that. Just that, I think the current solution breaks other cases.

OK, so the situation right now is that commit 58b116bce136 has
introduced a regression and so it needs to be fixed or reverted. The
cases that were previously broken and were unbroken by that commit
don't matter here, so you cannot argue that they would be "broken".

It looks to me like the original issue fixed by the commit in question
needs to be addressed differently, so I would vote for reverting it
and starting over.

I think Saravana's example is not fully correct as I had responded to his mail.
I would like to hear Grant, but seems he is busy with something and didn't reply.

Sadly I don't look much like a kernel developer these days. The last code change I committed to the kernel was over 4 years ago.

g.