Re: [RFT][PATCH 0/3] cpufreq / PM: QoS: Introduce frequency QoS and use it in cpufreq

From: Sudeep Holla
Date: Thu Oct 17 2019 - 12:42:45 EST


On Thu, Oct 17, 2019 at 06:34:28PM +0200, Rafael J. Wysocki wrote:
> On Thu, Oct 17, 2019 at 12:00 PM Sudeep Holla <sudeep.holla@xxxxxxx> wrote:
> >
> > On Thu, Oct 17, 2019 at 03:27:25PM +0530, Viresh Kumar wrote:
> > > On 16-10-19, 15:23, Sudeep Holla wrote:
> > > > Thanks for the spinning these patches so quickly.
> > > >
> > > > I did give it a spin, but unfortunately it doesn't fix the bug I reported.
> > > > So I looked at my bug report in detail and looks like the cpufreq_driver
> > > > variable is set to NULL at that point and it fails to dereference it
> > > > while trying to execute:
> > > > ret = cpufreq_driver->verify(new_policy);
> > > > (Hint verify is at offset 0x1c/28)
> > > >
> > > > So I suspect some race as this platform with bL switcher tries to
> > > > unregister and re-register the cpufreq driver during the boot.
> > > >
> > > > I need to spend more time on this as reverting the initial PM QoS patch
> > > > to cpufreq.c makes the issue disappear.
>
> I guess you mean commit 67d874c3b2c6 ("cpufreq: Register notifiers
> with the PM QoS framework")?
>

Correct.

> That would make sense, because it added the cpufreq_notifier_min() and
> cpufreq_notifier_max() that trigger handle_update() via
> schedule_work().
>

Yes, it was not clear as I didn't trace to handle_update earlier. After
looking at depth today afternoon, I arrived at the same conclusion.

> [BTW, Viresh, it looks like cpufreq_set_policy() should still ensure
> that the new min is less than the new max, because the QoS doesn't do
> that.]
>
> > > Is this easily reproducible ? cpufreq_driver == NULL shouldn't be the case, it
> > > get updated only once while registering/unregistering cpufreq drivers. That is
> > > the last thing which can go wrong from my point of view :)
> > >
> >
> > Yes, if I boot my TC2 with bL switcher enabled, it always crashes on boot.
>
> It does look like handle_update() races with
> cpufreq_unregister_driver() and cpufreq_remove_dev (called from there
> indirectly) does look racy.

Indeed, we just crossed the mails. I just found that and posted a patch.

--
Regards,
Sudeep