Re: [PATCH v4 11/18] PM: EM: Add runtime update interface to modify EM power

From: Lukasz Luba
Date: Mon Oct 02 2023 - 10:09:23 EST




On 9/29/23 14:18, Rafael J. Wysocki wrote:
On Fri, Sep 29, 2023 at 11:59 AM Lukasz Luba <lukasz.luba@xxxxxxx> wrote:


[snip]


It's done above, next to '!dev || !dev->em_pd'

Yes, it is, I meant something like this:

if (!cb || !cb->update_power || !dev)
return -EINVAL;

mutex_lock(&em_pd_mutex);

pd = dev->em_pd;
if (!pd) {
ret = -EINVAL; /* or perhaps -ENODATA */
goto unlock_em;
}



OK, I see what you mean. Let me change that.


+
+ runtime_table = kzalloc(sizeof(*runtime_table), GFP_KERNEL);
+ if (!runtime_table) {
+ ret = -ENOMEM;
+ goto unlock_em;
+ }
+
+ runtime_table->state = kcalloc(pd->nr_perf_states,
+ sizeof(struct em_perf_state),
+ GFP_KERNEL);
+ if (!runtime_table->state) {
+ ret = -ENOMEM;
+ goto free_runtime_table;
+ }

The above allocations can be merged into one and allocating memory
under the mutex is questionable.

So how to make sure that there is no 2 callers trying to update the
same EM or unregistration is not in the background?

You can allocate memory upfront and take the mutex before accessing
the shared data structures. If there's an error in the code running
under the mutex, release it and then free the memory.

Allocating memory is one operation, updating the shared data
structures to use it is another one. The former doesn't affect the
shared state in any way, so why do it under the mutex?

Yes, make sense. I will shrink that critical section. Good catch,
thanks!


[snip]


@@ -501,9 +598,23 @@ void em_dev_unregister_perf_domain(struct device *dev)

runtime_table = pd->runtime_table;

+ /*
+ * Safely destroy runtime modifiable EM. By using the call
+ * synchronize_rcu() we make sure we don't progress till last user
+ * finished the RCU section and our update got applied.
+ */
rcu_assign_pointer(pd->runtime_table, NULL);
synchronize_rcu();

+ /*
+ * After the sync no updates will be in-flight, so free the
+ * memory allocated for runtime table (if there was such).
+ */
+ if (runtime_table != pd->default_table) {
+ kfree(runtime_table->state);
+ kfree(runtime_table);
+ }

Can't this race with the RCU callback freeing the runtime table?

That's why there is this 'synchronize_rcu()' above and the mutex. The
updating caller if finished the update, would unlock the mutex and this
unregister code can go. Here we call the synchronize_rcu() so we assure
the callback has finished for the update path and than we explicitly
free the saved 'runtime_table' here. So all data should be freed and
code serialized in those two paths.

This doesn't quite agree with my understanding of what synchronize_rcu() does.

IIUC, RCU callbacks can run as soon as the grace period has elapsed
and they need not wait for synchronize_rcu() to return. Conversely,
synchronize_rcu() doesn't wait for all of the RCU callbacks to
complete.

Now, em_destroy_rt_table_rcu() doesn't actually use the mutex, so how
exactly is it protected against racing with this code?


I'll try to draw in on some pictures...

(previous instance )
+---------------------+
| |
| runtime table #1 |
| |
+---------------------+


(next instance )
+---------------------+
| |
| runtime table #2 |
| |
+---------------------+


(not possible new instance)
+.....................+
. .
. runtime table #3 .
. .
+.....................+



cpu A - "updater" | cpu B - "remover"
|
------------------------------|------------------------------
lock em_pd_mutex |
|
alloc runtime table #2 | lock em_pd_mutex
| (waiting)
async free instance #1 | .
| .
unlock em_pd_mutex | .
| (enter critical section)
|
lock em_pd_mutex | set NULL to runtime table ptr
(waiting) |
(wanted to create #3 inst) | synchronize rcu to make it is visible
. |
. | implicit free instance #2
. |
. | free the rest of EM and EM
. |
. | unlock em_pd_mutex
(enter critical section) |
!dev->em_pd so |
unlock & exit |
|
|


This should clean all involved memory and also prevent
of allocating new instance, when we unregister EM.