Re: [PATCH] VFS: br_write_lock locks on possible CPUs other thanonline CPUs

From: Srivatsa S. Bhat
Date: Tue Dec 20 2011 - 06:08:31 EST


On 12/20/2011 04:06 PM, Srivatsa S. Bhat wrote:

> On 12/20/2011 03:07 PM, mengcong wrote:
>
>> On Tue, 2011-12-20 at 12:58 +0530, Srivatsa S. Bhat wrote:
>>> On 12/20/2011 11:57 AM, Al Viro wrote:
>>>
>>>> On Tue, Dec 20, 2011 at 10:26:05AM +0530, Srivatsa S. Bhat wrote:
>>>>> Oh, right, that has to be handled as well...
>>>>>
>>>>> Hmmm... How about registering a CPU hotplug notifier callback during lock init
>>>>> time, and then for every cpu that gets onlined (after we took a copy of the
>>>>> cpu_online_mask to work with), we see if that cpu is different from the ones
>>>>> we have already locked, and if it is, we lock it in the callback handler and
>>>>> update the locked_cpu_mask appropriately (so that we release the locks properly
>>>>> during the unlock operation).
>>>>>
>>>>> Handling the newly introduced race between the callback handler and lock-unlock
>>>>> code must not be difficult, I believe..
>>>>>
>>>>> Any loopholes in this approach? Or is the additional complexity just not worth
>>>>> it here?
>>>>
>>>> To summarize the modified variant of that approach hashed out on IRC:
>>>>
>>>> * lglock grows three extra things: spinlock, cpu bitmap and cpu hotplug
>>>> notifier.
>>>> * foo_global_lock_online starts with grabbing that spinlock and
>>>> loops over the cpus in that bitmap.
>>>> * foo_global_unlock_online loops over the same bitmap and then drops
>>>> that spinlock
>>>> * callback of the notifier is going to do all bitmap updates. Under
>>>> that spinlock. Events that need handling definitely include the things like
>>>> "was going up but failed", since we need the bitmap to contain all online CPUs
>>>> at all time, preferably without too much junk beyond that. IOW, we need to add
>>>> it there _before_ low-level __cpu_up() calls set_cpu_online(). Which means
>>>> that we want to clean up on failed attempt to up it. Taking a CPU down is
>>>> probably less PITA; just clear bit on the final "the sucker's dead" event.
>>>> * bitmap is initialized once, at the same time we set the notifier
>>>> up. Just grab the spinlock and do
>>>> for_each_online_cpu(N)
>>>> add N to bitmap
>>>> then release the spinlock and let the callbacks handle all updates.
>>>>
>>>> I think that'll work with relatively little pain, but I'm not familiar enough
>>>> with the cpuhotplug notifiers, so I'd rather have the folks familiar with those
>>>> to supply the set of events to watch for...
>>>>
>>>
>>>
>>> We need not watch out for "up failed" events. It is enough if we handle
>>> CPU_ONLINE and CPU_DEAD events. Because, these 2 events are triggered only
>>> upon successful online or offline operation, and these notifications are
>>> more than enough for our purpose (to update our bitmaps). Also, those cpus
>>> which came online wont start running until these "success notifications"
>>> are all done, which is where we do our stuff in the callback (ie., try
>>> grabbing the spinlock..).
>>>
>>> Of course, by doing this (only looking out for CPU_ONLINE and CPU_DEAD
>>> events), our bitmap will probably be one step behind cpu_online_mask
>>> (which means, we'll still have to take the snapshot of cpu_online_mask and
>>> work with it instead of using for_each_online_cpu()).
>>> But that doesn't matter, as long as:
>>> * we don't allow the newly onlined CPU to start executing code (this
>>> is achieved by taking the spinlock in the callback)
>>
>> I think cpu notifier callback doesn't always run on the UPing cpu.
>> Actually, it rarely runs on the UPing cpu.
>> If I was wrong about the above thought, there is still a chance that lg-lock
>> operations are scheduled on the UPing cpu before calling the callback.
>>
>
>
> I wasn't actually banking on that, but you have raised a very good point.
> The scheduler uses its own set of cpu hotplug callback handlers to start
> using the newly added cpu (see the set of callbacks in kernel/sched.c)
>
> So, now we have a race between our callback and the scheduler's callbacks.
> ("Placing" our callback appropriately in a safe position using priority
> for notifiers doesn't appeal to me that much, since it looks like too much
> hackery. It should probably be our last resort).
>


Hey, actually there is a simple solution: just nip it (or rather delay it)
in the bud ;) That is, we watch out for CPU_UP_PREPARE event and lock it
up there itself using our spinlock.. that way, that cpu will not come up
until we are done executing br_write_unlock(). In fact, we can even fail
the onlining of that cpu by returning appropriate value from our callback,
but that would be too harsh.. so we can settle for delaying the cpu online
operation :)

And we do the same thing for CPU_DOWN_PREPARE event. And we can forget about
taking snapshot of cpu_online_mask, since cpu_online_mask is guaranteed to
be unmodified until we are done with br_write_unlock().

Regards,
Srivatsa S. Bhat

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/