Re: timer list corruption in devfreq

From: Mukesh Ojha
Date: Tue Nov 14 2023 - 07:31:32 EST




On 11/10/2023 12:38 AM, Tejun Heo wrote:
> Hello,
>
> On Wed, Nov 08, 2023 at 09:39:57PM +0530, Mukesh Ojha wrote:
>> We are facing an issue on the 6.1 kernel while using the devfreq
>> framework, and it looks like devfreq_monitor_stop()/
>> devfreq_monitor_start() is vulnerable if the governor is changed
>> frequently from user space in a loop:
>>
>> echo simple_ondemand > /sys/class/devfreq/1d84000.ufshc/governor
>> echo performance > /sys/class/devfreq/1d84000.ufshc/governor
>>
>> Here we are using a UFS device, but it could be any device.
>>
>> The issue is that the same timer instance is being queued from two
>> places, one from devfreq_monitor() and one from
>> devfreq_monitor_start(), because cancel_delayed_work_sync() in
>> devfreq_monitor_stop() was not able to delete the delayed work timer
>> completely, so the devfreq_monitor() work rearmed the same timer.
>>
>> There also looks to be an issue in the timer framework, which was
>> initially discussed in [1] and later fixed in [2]. I am not sure
>> whether the issue is in cancel_delayed_work_sync(), where del_timer()
>> inside try_to_grab_pending() needs to be replaced with
>> timer_delete[_sync](), or whether devfreq_monitor_stop() needs to use
>> these APIs and then delete the work.
>
> So, having shutdown can be more convenient in some cases and that'd be a
> useful addition to workqueue both for immediate and delayed work items. That
> said, that's usually not essential in fixing these issues - e.g. can't you
> just synchronize devfreq_monitor_start() and stop()?

Thanks for the feedback.

This issue can be fixed by synchronizing devfreq_monitor_[start/stop]().

Posted here:
https://lore.kernel.org/all/1699957648-31299-1-git-send-email-quic_mojha@xxxxxxxxxxx/

However, it forces the client to have a check in the delayed work
callback so that it does not queue the delayed work timer again. It is
also possible that del_timer() in the sequence below [1] returns
'false', yet we do not want another instance of the timer to be queued
after a call to cancel_delayed_work_sync(). That can be achieved with a
timer_shutdown() version of __cancel_work_timer(), or maybe by
introducing a separate __cancel_work_timer_shutdown().

[1]
__cancel_work_timer() => try_to_grab_pending() => del_timer()
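
To make the callback check concrete, here is a minimal userspace
sketch. It is only a model and not the posted patch: the struct and the
model_*() names are hypothetical, a pthread mutex stands in for a
per-device lock, and clearing 'pending' stands in for
cancel_delayed_work_sync(). It replays one ordering, single-threaded,
where the callback runs late after stop has been requested.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for the per-device monitor state. */
struct monitor_model {
	pthread_mutex_t lock;	/* plays the role of a per-device lock */
	bool stop_polling;	/* set by stop, cleared by start */
	int pending;		/* queued instances of the delayed work */
};

/* Queue the work; a second pending instance is the bug being modelled. */
static void model_queue(struct monitor_model *m)
{
	assert(m->pending == 0);
	m->pending++;
}

/* Work callback: re-arm only if no stop has been requested. */
static void model_monitor(struct monitor_model *m)
{
	pthread_mutex_lock(&m->lock);
	if (!m->stop_polling)
		model_queue(m);
	pthread_mutex_unlock(&m->lock);
}

static void model_stop(struct monitor_model *m)
{
	pthread_mutex_lock(&m->lock);
	m->stop_polling = true;
	pthread_mutex_unlock(&m->lock);
	m->pending = 0;	/* assumption: models cancel_delayed_work_sync() */
}

static void model_start(struct monitor_model *m)
{
	pthread_mutex_lock(&m->lock);
	m->stop_polling = false;
	if (m->pending == 0)
		model_queue(m);
	pthread_mutex_unlock(&m->lock);
}

/* Returns 1 if nothing is pending after stop and exactly one after restart. */
int model_replay(void)
{
	struct monitor_model m = {
		.lock = PTHREAD_MUTEX_INITIALIZER,
		.stop_polling = false,
		.pending = 0,
	};
	int after_stop;

	model_start(&m);	/* initial queueing */
	model_stop(&m);		/* flag set, pending work cancelled */
	model_monitor(&m);	/* late callback sees the flag, does not re-arm */
	after_stop = m.pending;
	model_start(&m);	/* queues a single fresh instance */
	return after_stop == 0 && m.pending == 1;
}
```

The point of the sketch is that the flag check lives in the client's
callback. A shutdown-style cancel would move that guarantee into the
workqueue/timer code, so the callback would not need the check.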

Let me know if there is anything wrong with my understanding.

-Mukesh




> Thanks.