Re: [PATCH] intel_rapl: Correct hotplug correction

From: Peter Zijlstra
Date: Thu May 22 2014 - 08:33:08 EST


On Thu, May 22, 2014 at 05:24:33PM +0530, Srivatsa S. Bhat wrote:
> Yeah, its complicated and perhaps we can do much better than that. But I'll
> try to explain why there are so many different locks in the existing code.
>
> get/put_online_cpus() uses cpu_hotplug.lock underneath. Although that's just
> a mutex, its not used in the usual way (more about that below).
>
> cpu_maps_update_begin(), cpu_notifier_register_begin/done,
> register/unregister_cpu_notifier -- all of these use the cpu_add_remove_lock.
>
> The fundamental difference between these 2 categories is the concurrency
> allowed with the lock :-
> get_online_cpus() is like a read_lock(). That is, it allows any number
> of concurrent CPU hotplug readers (tasks that want to block hotplug), but it
> ensures that a writer won't run concurrently with any reader.
>
> Hence, this won't work for notifier registrations because register/unregister
> has to _mutate_ the notifier linked-list, and obviously we can't have multiple
> tasks trying to do this at the same time. So we need a proper mutex that
> allows only 1 task at a time into the critical section. That's why the
> cpu_add_remove_lock is used in all the register/unregister paths.
>
> Now, let's look at why the CPU hotplug writer path (the task that actually
> does hotplug) takes cpu_add_remove_lock in addition to the cpu_hotplug.lock.
> First reason is simple: you don't want tasks that are doing notifier
> [un]registrations to race with hotplug. But the second, although subtle reason
> is that put_online_cpus() and cpu_hotplug_writer_begin()/done() require that
> there is only 1 CPU hotplug writer at a time. The cpu_add_remove_lock provides
> this guarantee, since it is taken in the writer path. (The long comment above
> cpu_hotplug_begin() mentions this as well).
>
> And then there is powerpc which uses cpu_maps_update_begin/done() to protect
> dlpar operations (operations that change the cpu_present_mask etc). But ideally
> it should have just used cpu_hotplug_begin() itself, like what driver/acpi/
> processor_driver.c does, but then it would have to hold cpu_add_remove_lock
> as well, due to the current design :(
>
> That was just me trying to explain the current mess, not justifying it! :-/

So I think we can reduce it to just the one rwsem (with recursion) if we
shoot CPU_POST_DEAD in the head.

Because currently we cannot take the rwsem in exclusive mode over the
whole thing because of POST_DEAD.

Once we kill that, the hotplug lock's exclusive mode can cover the
entire hotplug operation.

For (un)registrer we can also use the exclusive lock, (un)register of
notifiers should not happen often and should equally not be performance
critical, so using the exclusive lock should be just fine.

That means we can then remove cpu_add_remove_lock from both the register
and hotplug ops proper. (un)register_cpu_notifier() should get an
assertion that we hold the hotplug lock in exclusive mode.

That leaves the non-exclusive lock to guard against hotplug happening.

Now, last time Linus said he would like that to be a non-lock, and have
it weakly serialized, RCU style. Not sure we can fully pull that off,
haven't throught that through yet.

> I think Oleg had a proposed patch to use per-cpu rwsem in CPU hotplug to
> drastically simplify this whole locking scheme. I think we could look at
> that again.

I don't think that was to simplify things, the hotplug lock is basically
an open coded rw lock already, so that was to make it reuse the per-cpu
rwsem code.

Attachment: pgpJNdE2mfgUp.pgp
Description: PGP signature