Re: [Patch v3 3/6] cpufreq: qcom-cpufreq-hw: Add dcvs interrupt support

From: Thara Gopinath
Date: Mon Jul 12 2021 - 21:18:31 EST




On 7/12/21 12:41 AM, Viresh Kumar wrote:
On 09-07-21, 11:37, Thara Gopinath wrote:
On 7/9/21 2:46 AM, Viresh Kumar wrote:
@@ -389,6 +503,10 @@ static int qcom_cpufreq_hw_cpu_exit(struct cpufreq_policy *policy)
dev_pm_opp_remove_all_dynamic(cpu_dev);
dev_pm_opp_of_cpumask_remove_table(policy->related_cpus);
+ if (data->lmh_dcvs_irq > 0) {
+ devm_free_irq(cpu_dev, data->lmh_dcvs_irq, data);

Why using devm variants here and while requesting the irq ?

Missed this one ?

Yep. I just replied to Bjorn's email on this. I will move to non devm version.



+ cancel_delayed_work_sync(&data->lmh_dcvs_poll_work);
+ }

Please move this to qcom_cpufreq_hw_lmh_exit() or something.

Ok.


Now with sequence of disabling interrupt, etc, I see a potential
problem.

CPU0                                    CPU1

qcom_cpufreq_hw_cpu_exit()
-> devm_free_irq();
                                        qcom_lmh_dcvs_poll()
                                        -> qcom_lmh_dcvs_notify()
                                          -> enable_irq()

-> cancel_delayed_work_sync();


What will happen if enable_irq() gets called after freeing the irq ?
Not sure, but it looks like you will hit this then from manage.c:

WARN(!desc->irq_data.chip,
     KERN_ERR "enable_irq before setup/request_irq: irq %u\n", irq))

?

You got a chicken n egg problem :)

Yes indeed! But also it is a very rare chicken and egg problem.
The scenario here is that the cpus are busy and running load causing a
thermal overrun and lmh is engaged. At the same time for this issue to be
hit the cpu is trying to exit/disable cpufreq.

Yes, it is a very specific case but it needs to be resolved anyway. You don't
want to get this ever :)

Calling cancel_delayed_work_sync first could solve this issue, right ?
cancel_delayed_work_sync guarantees the work not to be pending even if
it requeues itself on return. So once the delayed work is cancelled, the
interrupts can be safely disabled. Thoughts ?

I don't think even that would provide such guarantees to you here, as there is
a chance the work gets queued again because of an interrupt that triggers right
after you cancel the work.

The basic way of solving such issues is that once you cancel something, you need
to guarantee that it doesn't get triggered again, no matter what.

The problem here I see is with your design itself, both delayed work and irq can
enable each other, so no matter which one you disable first, won't be
sufficient. You need to fix that design somehow.

So I really need the interrupt to fire and then the timer to kick in and
take over the monitoring. I can think of introducing a variable,
is_disabled, which is updated and read under a spinlock.
qcom_cpufreq_hw_cpu_exit() can take the spinlock and set is_disabled to
true before cancelling the delayed work or disabling the interrupt.
qcom_lmh_dcvs_notify() can then read and check is_disabled before
re-enabling the interrupt or re-queuing the work.

But does this problem not exist in target_index, fast_switch, etc. as
well? One CPU can be disabling while the other one is updating the
target, right?
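
To make the is_disabled idea concrete, here is a rough userspace model of
it, not driver code. Everything in it is hypothetical: struct lmh_model,
the lmh_model_*() helpers and the still_throttled parameter are invented
for illustration, a pthread mutex stands in for the spinlock, and two
booleans stand in for the interrupt and the delayed work.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical model: pthread_mutex_t stands in for the kernel spinlock. */
struct lmh_model {
	pthread_mutex_t lock;
	bool is_disabled;	/* set once by the exit path, never cleared */
	bool irq_enabled;	/* models the enable_irq() state */
	bool work_queued;	/* models a pending delayed work item */
};

static void lmh_model_init(struct lmh_model *m)
{
	pthread_mutex_init(&m->lock, NULL);
	m->is_disabled = false;
	m->irq_enabled = false;
	m->work_queued = false;
}

/*
 * Models the tail of qcom_lmh_dcvs_notify(): either requeue the polling
 * work or hand monitoring back to the interrupt, but only if teardown
 * has not started. Returns true if something was re-armed.
 */
static bool lmh_model_rearm(struct lmh_model *m, bool still_throttled)
{
	bool armed = false;

	pthread_mutex_lock(&m->lock);
	if (!m->is_disabled) {
		if (still_throttled)
			m->work_queued = true;
		else
			m->irq_enabled = true;
		armed = true;
	}
	pthread_mutex_unlock(&m->lock);

	return armed;
}

/*
 * Models the exit path: publish is_disabled first so that no later
 * re-arm attempt can succeed, then tear down. In a driver the sleeping
 * calls (cancel_delayed_work_sync(), free_irq()) would have to run after
 * the lock is dropped; the model only clears its flags.
 */
static void lmh_model_exit(struct lmh_model *m)
{
	pthread_mutex_lock(&m->lock);
	m->is_disabled = true;
	m->work_queued = false;
	m->irq_enabled = false;
	pthread_mutex_unlock(&m->lock);
}
```

In this model, any lmh_model_rearm() call that runs after
lmh_model_exit() returns false and leaves both flags clear. For the real
driver, the interrupt handler would presumably need the same check before
it queues the work, otherwise the interrupt could still re-arm the
polling after the flag is set.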



--
Warm Regards
Thara (She/Her/Hers)