Re: [PATCH 0/8] cpufreq: Auto-register with energy model

From: Quentin Perret
Date: Wed Aug 11 2021 - 04:38:08 EST


On Wednesday 11 Aug 2021 at 10:48:59 (+0530), Viresh Kumar wrote:
> On 10-08-21, 13:35, Quentin Perret wrote:
> > On Tuesday 10 Aug 2021 at 13:06:47 (+0530), Viresh Kumar wrote:
> > > Provide a cpufreq driver flag so drivers can ask the cpufreq core to register
> > > with the EM core on their behalf.
> >
> > Hmm, that's not quite what this does. This asks the cpufreq core to
> > use *PM_OPP* to register an EM, which I think is kinda wrong to do from
> > there IMO. The decision to use PM_OPP or another mechanism to register
> > an EM belongs to platform specific code (drivers), so it is odd for the
> > PM_OPP registration to have its own cpufreq flag but not the other ways.
> >
> > As mentioned in another thread, the very reason to have PM_EM is to not
> > depend on PM_OPP, so I'm worried about the direction of travel with this
> > series TBH.
>
> I had to use the pm-opp version, since almost everyone was using that.
>
> On the other hand, there isn't a lot of OPP specific stuff in
> dev_pm_opp_of_register_em(). It just uses dev_pm_opp_get_opp_count(),
> that's all. This ended up in the OPP core, nothing else. Maybe we can
> now move it back to the EM core and name it differently ?

Well it also uses dev_pm_opp_find_freq_ceil() and
dev_pm_opp_get_voltage(), so not sure how easy it will be to move, but
if it is possible no objection from me.

> > > This allows us to get rid of duplicated code
> > > in the drivers and fix the unregistration part as well, which none of the
> > > drivers have done until now.
> >
> > This series adds more code than it removes,
>
> Sadly yes :(
>
> > and the unregistration is
> > not a fix as we don't ever remove the EM tables by design, so not sure
> > either of these points are valid arguments.
>
> I think that design needs to be looked over again, it looks broken to
> me everytime I land onto this code. I wonder why we don't unregister
> stuff.
>
> Lets say, I am working on the cpufreq driver and I want to test that
> on my ARM machine. Rebooting a simpler board to test stuff out is
> easy, but if I am working on an ARM server which is running lots of
> other userspace stuff as well, I won't want to reboot the machine just
> to test a different versions of the driver. I will rather want to
> build the driver as module and insert/remove it again and again.
>
> If the frequency table changes in between versions, this just breaks
> as EM won't be updated again.
>
> This breaks one of the most basic rules of Linux Kernel. Inserting a
> module should have exactly the same final behavior every single time.
> This model doesn't guarantee it. It simply looks broken.

Right but the EM is a description of the hardware, so it seemed fair
to assume this wouldn't change across the lifetime of the OS, similar
to the DT which we can't reload at run-time. Yes it can be a little odd
if you load/unload your driver module, but note that you generally can't
load two completely different drivers on a single system. You'll just
load the same one again and the hardware hasn't changed in the meantime,
so the previously loaded EM will still be correct. I hear your argument
about cpufreq driver development, but the locking involved to allow
'just' that is pretty involved, and nobody has complained about this
specific issue so far, so that didn't seem worth it. If we do have good
reasons to change the EM at runtime, then yes I think we should do it,
it just didn't seem like that was the case until now.