Re: [RESEND] drivercore: deferral race condition fix

From: Grant Likely
Date: Tue Apr 08 2014 - 08:47:54 EST


On Tue, Apr 8, 2014 at 3:27 AM, Grant Likely <grant.likely@xxxxxxxxxxxx> wrote:
> On Thu, 3 Apr 2014 10:40:59 +0100, Mark Brown <broonie@xxxxxxxxxx> wrote:
>> On Thu, Apr 03, 2014 at 10:12:07AM +0300, Peter Ujfalusi wrote:
>> > When the kernel is built with CONFIG_PREEMPT, it is possible to reach a state
>> > where all modules are loaded but some drivers are still stuck in the deferred
>> > list, and an external event is needed to kick the deferred queue to probe
>> > these drivers.
>>
>> Acked-by: Mark Brown <broonie@xxxxxxxxxx>
>
> It's a pretty crude solution though. The problem is any "in-flight"
> probes that are going to defer will not get added to the active list.
> Rerunning the entire active list is a bit much (but it does have the
> advantage of still being conceptually simple). I think we can do better.
>
> Instead of running the entire list, we could add a check to
> driver_deferred_probe_add() that adds the device to the active list
> instead of pending list on the condition that another driver probe
> completed while the deferred probe was in-flight.
>
> I'm playing with a solution now. I'll email a proposal shortly.

Thinking out loud now...

The race can occur whenever a probe in another thread completes
successfully while the current probe is in-flight. If that has
happened, then the defer condition may be resolved and the driver
should be scheduled for retry immediately. If the core code can check
for that condition, then we can add the driver directly to the active
list and kick the workqueue.
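
To make the interleaving concrete, here is a toy userspace replay of it.
This is only a sketch, not kernel code: every toy_* name is made up for
illustration, the two "threads" are replayed sequentially in the order that
triggers the race, and the lists are reduced to a per-device state.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative stand-ins for the pending and active deferred-probe lists. */
enum toy_list { TOY_NONE, TOY_PENDING, TOY_ACTIVE };

struct toy_dev {
	const char *name;
	enum toy_list list;
};

/* Models a successful probe kicking the queue: pending moves to active. */
static void toy_trigger(struct toy_dev *devs, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (devs[i].list == TOY_PENDING)
			devs[i].list = TOY_ACTIVE;
}

/* Replays the racy ordering; returns where the deferring device ends up. */
static enum toy_list toy_replay_race(void)
{
	struct toy_dev devs[2] = {
		{ "consumer", TOY_NONE },	/* depends on "supplier" */
		{ "supplier", TOY_NONE },
	};

	/*
	 * 1. Thread A: the consumer probe is in flight. It has already seen
	 *    that the supplier is missing and will return -EPROBE_DEFER,
	 *    but it has not been queued anywhere yet.
	 */

	/* 2. Thread B: the supplier probe succeeds and kicks the queue. */
	toy_trigger(devs, 2);
	assert(devs[0].list == TOY_NONE);	/* nothing was pending to move */

	/* 3. Thread A: the consumer is only now parked on the pending list. */
	devs[0].list = TOY_PENDING;

	/* No further trigger is coming, so the consumer is stuck. */
	return devs[0].list;
}
```

In this replay the consumer ends on the pending list even though its
dependency is now available, which is the "stuck until an external event"
state described in the original report.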

The problem is that we don't currently have an easy way to test if a
probe has completed in another thread. This patch handles it with a
single flag that gets set whenever a probe completes while another
probe is executing. I was worried that this approach would be racy,
but after running through the scenarios I can't find a situation where
the deferred device wouldn't get added to the active list. The only
concern I have remaining with this approach is that it will trigger
unnecessary retries, but even that isn't really a problem because the
pending list will have been moved to the active list *anyway*. It isn't
even a retry of the whole list that's happening, because most likely
the only device on the pending list will be the one that completed
with -EPROBE_DEFER.
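
A rough userspace sketch of the single-flag idea, under stated assumptions:
the toy_* names are hypothetical and are not the identifiers used in the
patch, the in-flight counter is just my way of modelling "while another
probe is executing", and the point where the flag is cleared is a guess.
Threads are again replayed sequentially.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum toy_list { TOY_NONE, TOY_PENDING, TOY_ACTIVE };

struct toy_dev {
	const char *name;
	enum toy_list list;
};

/* Hypothetical core state; the real patch may track this differently. */
static int toy_probes_in_flight;
static bool toy_completed_during_flight;	/* the "single flag" */

static void toy_probe_begin(void)
{
	toy_probes_in_flight++;
}

/* A probe succeeded: flag any overlapping probe, then move pending to active. */
static void toy_probe_success(struct toy_dev *devs, size_t n)
{
	toy_probes_in_flight--;
	if (toy_probes_in_flight > 0)
		toy_completed_during_flight = true;
	for (size_t i = 0; i < n; i++)
		if (devs[i].list == TOY_PENDING)
			devs[i].list = TOY_ACTIVE;
}

/* A probe returned -EPROBE_DEFER: pick the list based on the flag. */
static void toy_probe_defer(struct toy_dev *dev)
{
	toy_probes_in_flight--;
	/* Flag set: retry right away (real code would also kick the workqueue). */
	dev->list = toy_completed_during_flight ? TOY_ACTIVE : TOY_PENDING;
	/* Assumption: clear the flag once nothing is in flight any more. */
	if (toy_probes_in_flight == 0)
		toy_completed_during_flight = false;
}

static void toy_reset(void)
{
	toy_probes_in_flight = 0;
	toy_completed_during_flight = false;
}

/* Racy ordering: the supplier completes while the consumer is in flight. */
static enum toy_list toy_replay_with_flag(void)
{
	struct toy_dev devs[2] = { { "consumer", TOY_NONE }, { "supplier", TOY_NONE } };

	toy_reset();
	toy_probe_begin();		/* thread A: consumer probe starts */
	toy_probe_begin();		/* thread B: supplier probe starts */
	toy_probe_success(devs, 2);	/* thread B completes first */
	toy_probe_defer(&devs[0]);	/* thread A defers afterwards */
	return devs[0].list;
}

/* No overlap: a lone deferral waits on the pending list as before. */
static enum toy_list toy_replay_no_overlap(void)
{
	struct toy_dev dev = { "consumer", TOY_NONE };

	toy_reset();
	toy_probe_begin();
	toy_probe_defer(&dev);
	return dev.list;
}
```

With the racy ordering the consumer lands on the active list, and with no
overlapping probe it still goes to the pending list, so in this model the
extra retries only happen when another probe really did finish in the
meantime.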

So, I actually think this is the right approach now. I'll reply to the
patch itself and make some comments on the code.

g.