Re: 回复: [PATCH] Performance: devfreq: avoid devfreq delay work re-init before

From: Mukesh Ojha
Date: Wed Nov 15 2023 - 08:25:19 EST




On 11/15/2023 5:25 PM, 黄再漾(Joyyoung Huang) wrote:
Hi Mukesh
I try the below patch on my side,issue still happened ;but I am working on kernel 5.10.
https://lore.kernel.org/all/1699957648-31299-1-git-send-email-quic_mojha@xxxxxxxxxxx/

I will try to debug the patch on my side , I will update you when there is new progress.

There still seems to be problem with the patch [1], work is queued to some cpu but not executing where there can be parallel call can come for devfreq_monitor_start() and that is not being checked and same can come
during cancel_delayed_work_sync() call. We can protect the same with below patch applied on top of the patch given on [1].

[1]
https://lore.kernel.org/all/1699957648-31299-1-git-send-email-quic_mojha@xxxxxxxxxxx/

diff --git a/drivers/devfreq/devfreq.c b/drivers/devfreq/devfreq.c
index 09b93104521b..a25c74fc31d7 100644
--- a/drivers/devfreq/devfreq.c
+++ b/drivers/devfreq/devfreq.c
@@ -488,6 +488,9 @@ void devfreq_monitor_start(struct devfreq *devfreq)
return;

mutex_lock(&devfreq->lock);
+ if (delayed_work_pending(&devfreq->work))
+ goto out;
+
switch (devfreq->profile->timer) {
case DEVFREQ_TIMER_DEFERRABLE:
INIT_DEFERRABLE_WORK(&devfreq->work, devfreq_monitor);
@@ -503,8 +506,8 @@ void devfreq_monitor_start(struct devfreq *devfreq)
queue_delayed_work(devfreq_wq, &devfreq->work,
msecs_to_jiffies(devfreq->profile->polling_ms));

- devfreq->stop_polling = false;
out:
+ devfreq->stop_polling = false;
mutex_unlock(&devfreq->lock);
}
EXPORT_SYMBOL(devfreq_monitor_start);
@@ -529,8 +532,8 @@ void devfreq_monitor_stop(struct devfreq *devfreq)
}

devfreq->stop_polling = true;
- mutex_unlock(&devfreq->lock);
cancel_delayed_work_sync(&devfreq->work);
+ mutex_unlock(&devfreq->lock);

-Mukesh


-----邮件原件-----
发件人: 黄再漾(Joyyoung Huang)
发送时间: 2023年11月14日 21:37
收件人: Mukesh Ojha <quic_mojha@xxxxxxxxxxx>; myungjoo.ham@xxxxxxxxxxx; kyungmin.park@xxxxxxxxxxx; cw00.choi@xxxxxxxxxxx
抄送: linux-pm@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx; huangzaiyang <joyyoung.wang@xxxxxxxxx>
主题: 回复: [PATCH] Performance: devfreq: avoid devfreq delay work re-init before


On 11/10/2023 3:04 PM, huangzaiyang wrote:
From: huangzaiyang <joyyoung.wang@xxxxxxxxx>

There is a timer_list race condition when executing the following test shell script:
'''
while true
do
echo "simple_ondemand" > /sys/class/devfreq/1d84000.ufshc/governor
echo "performance" >
/sys/class/devfreq/1d84000.ufshc/governor
done
'''

[13511.214366][ C3] Unable to handle kernel paging request at virtual address dead00000000012a
[13511.214393][ C3] Mem abort info:
[13511.214398][ C3] ESR = 0x96000044
[13511.214404][ C3] EC = 0x25: DABT (current EL), IL = 32 bits
[13511.214409][ C3] SET = 0, FnV = 0
[13511.214414][ C3] EA = 0, S1PTW = 0
[13511.214417][ C3] Data abort info:
[13511.214422][ C3] ISV = 0, ISS = 0x00000044
[13511.214427][ C3] CM = 0, WnR = 1
[13511.214432][ C3] [dead00000000012a] address between user and kernel address ranges
[13511.214439][ C3] Internal error: Oops: 96000044 [#1] PREEMPT SMP
[13511.215449][ C3] CPU: 3 PID: 0 Comm: swapper/3 Tainted: G S W O 5.10.168-android12-9-o-g63cc297a7b34 #1
[13511.215454][ C3] Hardware name: Qualcomm Technologies, Inc. Cape MTP, Whiteswan (DT)
[13511.215460][ C3] pstate: 82400085 (Nzcv daIf +PAN -UAO +TCO BTYPE=--)
[13511.215472][ C3] pc : expire_timers+0x9c/0x428
[13511.215478][ C3] lr : __run_timers+0x1f0/0x328
[13511.215483][ C3] sp : ffffffc00801bdd0
[13511.215487][ C3] x29: ffffffc00801bdf0 x28: ffffffdb87b31698
[13511.215493][ C3] x27: ffffffdb87999e58 x26: ffffffdb87966008
[13511.215499][ C3] x25: 0000000000000001 x24: ffffff8001734a00
[13511.215506][ C3] x23: 00000000000000e0 x22: dead000000000122
[13511.215512][ C3] x21: 000000010032658e x20: ffffff89f7a9ae80
[13511.215518][ C3] x19: ffffffc00801be50 x18: ffffffc00801d038
[13511.215525][ C3] x17: 0000000000000240 x16: 0000000000000201
[13511.215532][ C3] x15: ffffffffffffffff x14: ffffff89f7a9aef8
[13511.215538][ C3] x13: 0000000000000240 x12: ffffff89f7a9aea8
[13511.215544][ C3] x11: 0000000000000021 x10: 000000014032658e
[13511.215550][ C3] x9 : ffffffc00801be50 x8 : dead000000000122
[13511.215556][ C3] x7 : ffff71646c68735e x6 : ffffff89f7aaae58
[13511.215563][ C3] x5 : 0000000000000000 x4 : 0000000000000101
[13511.215569][ C3] x3 : ffffff89f7a9aef0 x2 : ffffff89f7a9aef0
[13511.215575][ C3] x1 : ffffffc00801be50 x0 : ffffff8045804428
[13511.215581][ C3] Call trace:
[13511.215586][ C3] expire_timers+0x9c/0x428
[13511.215591][ C3] __run_timers+0x1f0/0x328
[13511.215596][ C3] run_timer_softirq+0x28/0x58
[13511.215602][ C3] efi_header_end+0x168/0x5ec
[13511.215610][ C3] __irq_exit_rcu+0x108/0x124
[13511.215617][ C3] __handle_domain_irq+0x118/0x1e4
[13511.215625][ C3] gic_handle_irq.31230+0x6c/0x250
[13511.215630][ C3] el1_irq+0xe4/0x1c0
[13511.215638][ C3] cpuidle_enter_state+0x3a4/0xa04
[13511.215644][ C3] do_idle+0x308/0x574
[13511.215649][ C3] cpu_startup_entry+0x84/0x90
[13511.215656][ C3] secondary_start_kernel+0x204/0x274
[13511.215664][ C3] Code: d503201f a9402408 f9000128 b4000048 (f9000509)
[13511.215670][ C3] ---[ end trace 5100bad72a35d566 ]---
[13511.215676][ C3] Kernel panic - not syncing: Oops: Fatal exception in interrupt

This is because when switching the governor through the sys node, the
devfreq_monitor_start function will re-initialize the delayed work
task, which will cause the delay work pending flag to become invalid,
and the timer pending judgment contained in the delayed work will also become invalid, and then the pending interception will be executed when the queue is executed.

So we remove the delay work'initialization work to the
devfreq_add_device and timer_store functions, and the delay work pending judgment is performed before the devfreq_monitor_start function performs the queue operation.

Signed-off-by: ZaiYang Huang <huangzaiyang@xxxxxxxx>
---
drivers/devfreq/devfreq.c | 36 ++++++++++++++++++++++++------------
1 file changed, 24 insertions(+), 12 deletions(-)

diff --git a/drivers/devfreq/devfreq.c b/drivers/devfreq/devfreq.c
index b3a68d5833bd..8ae6f853a21e 100644
--- a/drivers/devfreq/devfreq.c
+++ b/drivers/devfreq/devfreq.c
@@ -483,18 +483,7 @@ void devfreq_monitor_start(struct devfreq *devfreq)
if (IS_SUPPORTED_FLAG(devfreq->governor->flags, IRQ_DRIVEN))
return;

- switch (devfreq->profile->timer) {
- case DEVFREQ_TIMER_DEFERRABLE:
- INIT_DEFERRABLE_WORK(&devfreq->work, devfreq_monitor);
- break;
- case DEVFREQ_TIMER_DELAYED:
- INIT_DELAYED_WORK(&devfreq->work, devfreq_monitor);
- break;
- default:
- return;
- }
-
- if (devfreq->profile->polling_ms)
+ if (devfreq->profile->polling_ms &&
+ !delayed_work_pending(&devfreq->work))
queue_delayed_work(devfreq_wq, &devfreq->work,
msecs_to_jiffies(devfreq->profile->polling_ms));
}
@@ -830,6 +819,17 @@ struct devfreq *devfreq_add_device(struct device *dev,
goto err_dev;
}

+ switch (devfreq->profile->timer) {
+ case DEVFREQ_TIMER_DEFERRABLE:
+ INIT_DEFERRABLE_WORK(&devfreq->work, devfreq_monitor);
+ break;
+ case DEVFREQ_TIMER_DELAYED:
+ INIT_DELAYED_WORK(&devfreq->work, devfreq_monitor);
+ break;
+ default:
+ dev_err(dev, "%s: Target devfreq(%s)'s profile timer has no settings \n", devfreq->governor_name,
+ __func__);
+ }

[..]

if (!devfreq->profile->max_state || !devfreq->profile->freq_table) {
mutex_unlock(&devfreq->lock);
err = set_freq_table(devfreq); @@ -1860,6 +1860,18 @@
static ssize_t timer_store(struct device *dev, struct device_attribute *attr,
df->profile->timer = timer;
mutex_unlock(&df->lock);

+ switch (df->profile->timer) {
+ case DEVFREQ_TIMER_DEFERRABLE:
+ INIT_DEFERRABLE_WORK(&df->work, devfreq_monitor);
+ break;
+ case DEVFREQ_TIMER_DELAYED:
+ INIT_DELAYED_WORK(&df->work, devfreq_monitor);
+ break;
+ default:
+ dev_err(dev, "%s: Target devfreq(%s)'s profile timer has no settings \n", df->governor_name,
+ __func__);
+ }
+
Here, this can cause issue right, as it is modifying the delayed work data even before stopping the current running instances..

Should the above thing be done after DEVFREQ_GOV_STOP ?
But again it will boil down to the same thing as it is currently now .
ret = df->governor->event_handler(df, DEVFREQ_GOV_STOP, NULL);
if (ret) {
dev_warn(dev, "%s: Governor %s not stopped(%d)\n",
-->agree, seems better to put these codes after after DEVFREQ_GOV_STOP, as follow?
@@ -1856,10 +1856,6 @@ static ssize_t timer_store(struct device *dev, struct device_attribute *attr,
goto out;
}
- mutex_lock(&df->lock);
- df->profile->timer = timer;
- mutex_unlock(&df->lock);
-
ret = df->governor->event_handler(df, DEVFREQ_GOV_STOP, NULL);
if (ret) {
dev_warn(dev, "%s: Governor %s not stopped(%d)\n", @@ -1867,6 +1863,21 @@ static ssize_t timer_store(struct device *dev, struct device_attribute *attr,
goto out;
}
+ mutex_lock(&df->lock);
+ df->profile->timer = timer;
+ switch (df->profile->timer) {
+ case DEVFREQ_TIMER_DEFERRABLE:
+ INIT_DEFERRABLE_WORK(&df->work, devfreq_monitor);
+ break;
+ case DEVFREQ_TIMER_DELAYED:
+ INIT_DELAYED_WORK(&df->work, devfreq_monitor);
+ break;
+ default:
+ dev_err(dev, "%s: Target devfreq(%s)'s profile timer has no settings \n", df->governor_name,
+ __func__);
+ mutex_unlock(&df->lock);
+ goto out;
+ }
+ mutex_unlock(&df->lock);
+
ret = df->governor->event_handler(df, DEVFREQ_GOV_START, NULL);
if (ret)
dev_warn(dev, "%s: Governor %s not started(%d)\n",