Re: [PATCH v4 10/18] PM: EM: Add RCU mechanism which safely cleans the old data

From: Rafael J. Wysocki
Date: Fri Sep 29 2023 - 09:00:10 EST


On Fri, Sep 29, 2023 at 11:36 AM Lukasz Luba <lukasz.luba@xxxxxxx> wrote:
>
>
>
> On 9/26/23 20:26, Rafael J. Wysocki wrote:
> > On Mon, Sep 25, 2023 at 10:11 AM Lukasz Luba <lukasz.luba@xxxxxxx> wrote:
> >>
> >> The EM is going to support runtime modifications of the power data.
> >> Introduce RCU safe mechanism to clean up the old allocated EM data.
> >
> > "RCU-based" probably and "to clean up the old EM data safely".
>
> Yes, thanks
>
> >
> >> It also adds a mutex for the EM structure to serialize the modifiers.
> >
> > This part doesn't match the code changes in the patch.
>
> Good catch. It was left over from some older version. We use the
> existing em_pd_mutex.
>
> >
> >> Signed-off-by: Lukasz Luba <lukasz.luba@xxxxxxx>
> >> ---
> >> kernel/power/energy_model.c | 29 +++++++++++++++++++++++++++++
> >> 1 file changed, 29 insertions(+)
> >>
> >> diff --git a/kernel/power/energy_model.c b/kernel/power/energy_model.c
> >> index 5b40db38b745..2345837bfd2c 100644
> >> --- a/kernel/power/energy_model.c
> >> +++ b/kernel/power/energy_model.c
> >> @@ -23,6 +23,9 @@
> >> */
> >> static DEFINE_MUTEX(em_pd_mutex);
> >>
> >> +static void em_cpufreq_update_efficiencies(struct device *dev,
> >> + struct em_perf_state *table);
> >> +
> >> static bool _is_cpu_device(struct device *dev)
> >> {
> >> return (dev->bus == &cpu_subsys);
> >> @@ -104,6 +107,32 @@ static void em_debug_create_pd(struct device *dev) {}
> >> static void em_debug_remove_pd(struct device *dev) {}
> >> #endif
> >>
> >> +static void em_destroy_rt_table_rcu(struct rcu_head *rp)
> >
> > Adding static functions without callers will obviously cause the
> > compiler to complain, which is one of the reasons to avoid doing that.
> > The other is that it is hard to say how these functions are going to
> > be used without reviewing multiple patches simultaneously, which is a
> > pain as far as I'm concerned.
>
> It is used in this patch, but inside the call_rcu() as 2nd arg.

I missed that, sorry for the noise.

> I have marked that below. The compiler didn't complain IIRC.
>
> >
> >> +{
> >> + struct em_perf_table *runtime_table;
> >> +
> >> + runtime_table = container_of(rp, struct em_perf_table, rcu);
> >> + kfree(runtime_table->state);
> >> + kfree(runtime_table);
> >
> > If runtime_table and its state were allocated in one go, it would be
> > possible to free them in one go as well.
> >
> > For some reason, you don't seem to want to do that, but why?
>
> We had a few internal reviews and there were voices saying that
> it's better to have 2 identical tables: 'default_table' and
> 'runtime_table' to make sure it's visible everywhere when it's used.
> That created the need to also have the 'state' table inside.
> I don't see it as a big problem, though.

What I'm trying to say is that you can allocate runtime_table along
with the table pointed to by its state field in one invocation of
kzalloc() (say).

Having just one memory region to free eventually instead of two of
them would help to avoid some complexity, especially in the next
patch.
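
To illustrate the single-allocation idea, here is a minimal userspace
sketch. It assumes the states are stored in a flexible array member at the
end of the table, which is only one way of doing this. The names
perf_table, perf_state, perf_table_alloc() and perf_table_free() are
hypothetical stand-ins rather than anything from the patch, and calloc()
plays the role of kzalloc(). In the kernel the size would more likely be
computed with struct_size().

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct perf_state {
	unsigned long frequency;
	unsigned long power;
};

struct perf_table {
	int nr_states;
	/* Flexible array member: the states share the table's allocation. */
	struct perf_state state[];
};

/* Allocate the header and all of the states with one zeroing allocation. */
static struct perf_table *perf_table_alloc(int nr_states)
{
	struct perf_table *table;

	assert(nr_states > 0);

	table = calloc(1, sizeof(*table) +
			  (size_t)nr_states * sizeof(table->state[0]));
	if (!table)
		return NULL;

	table->nr_states = nr_states;
	return table;
}

/* Only one region exists, so the cleanup path is a single free(). */
static void perf_table_free(struct perf_table *table)
{
	free(table);
}
```

With a layout like this, the RCU callback would reduce to a single free of
the containing structure, with no separate free of the state array.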

> >
> >> +}
> >> +
> >> +static void em_perf_runtime_table_set(struct device *dev,
> >> + struct em_perf_table *runtime_table)
> >> +{
> >> + struct em_perf_domain *pd = dev->em_pd;
> >> + struct em_perf_table *tmp;
> >> +
> >> + tmp = pd->runtime_table;
> >> +
> >> + rcu_assign_pointer(pd->runtime_table, runtime_table);
> >> +
> >> + em_cpufreq_update_efficiencies(dev, runtime_table->state);
> >> +
> >> + /* Don't free default table since it's used by other frameworks. */
> >
> > Apparently, some frameworks are only going to use the default table
> > while the runtime-updatable table will be used somewhere else at the
> > same time.
> >
> > I'm not really sure if this is a good idea.
>
> Runtime table is only for driving the task placement in the EAS.
>
> The thermal gov IPA won't make better decisions because it already
> has the mechanism to accumulate the error that it made.
>
> The same applies to DTPM, which works in a more 'configurable' way,
> rather than as a hard optimization mechanism (like EAS).

My understanding of the above is that the other EM users don't really
care that much so they can get away with using the default table all
the time, but EAS needs more accuracy, so the table used by it needs
to be adjusted in certain situations.

Fair enough, I'm assuming that you've done some research around it.
Still, this is rather confusing.

> >
> >> + if (tmp != pd->default_table)
> >> + call_rcu(&tmp->rcu, em_destroy_rt_table_rcu);
>
> The em_destroy_rt_table_rcu() is used here ^^^^^^
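
For reference, the pattern in question can be sketched in userspace as
follows. This is not real RCU: cb_head, table, table_destroy_cb() and
fake_call_rcu() are hypothetical names, and fake_call_rcu() runs the
callback immediately, whereas call_rcu() defers it until a grace period has
elapsed. The sketch only shows that taking the address of a static function
counts as a use, so the compiler has nothing to warn about, and how
container_of() gets from the embedded head back to the enclosing object.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Userspace stand-in for the kernel's container_of(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-in for struct rcu_head. */
struct cb_head {
	void (*func)(struct cb_head *head);
};

struct table {
	int id;
	struct cb_head cb;
};

/* Counts how many tables the callback has released. */
static int freed_count;

/* Referenced only through a function pointer, like the patch's callback. */
static void table_destroy_cb(struct cb_head *head)
{
	struct table *t = container_of(head, struct table, cb);

	free(t);
	freed_count++;
}

/*
 * Hypothetical stand-in for call_rcu(): invokes the callback right away
 * instead of waiting for a grace period.
 */
static void fake_call_rcu(struct cb_head *head,
			  void (*func)(struct cb_head *head))
{
	assert(head != NULL && func != NULL);

	head->func = func;
	head->func(head);
}
```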