Re: [BUG] dev_pm_opp refcount issue on Arm Juno r0

From: Viresh Kumar
Date: Thu Jan 03 2019 - 02:05:56 EST


On 20-12-18, 15:27, Valentin Schneider wrote:
> Hi,
>
> While running some hotplug torture test [1] on my Juno r0 I came across
> the following splat:
>
> [ 716.561862] ------------[ cut here ]------------
> [ 716.566451] refcount_t: underflow; use-after-free.
> [ 716.571240] WARNING: CPU: 2 PID: 18 at lib/refcount.c:280 refcount_dec_not_one+0x9c/0xc0
> [ 716.579246] Modules linked in:
> [ 716.582269] CPU: 2 PID: 18 Comm: cpuhp/2 Not tainted 4.20.0-rc7 #39
> [ 716.588469] Hardware name: ARM Juno development board (r0) (DT)
> [ 716.594326] pstate: 40000005 (nZcv daif -PAN -UAO)
> [ 716.599065] pc : refcount_dec_not_one+0x9c/0xc0
> [ 716.603546] lr : refcount_dec_not_one+0x9c/0xc0
> [ 716.608024] sp : ffff00000a063c70
> [ 716.611299] x29: ffff00000a063c70 x28: 0000000000000000
> [ 716.616555] x27: 0000000000000000 x26: 0000000000000002
> [ 716.621810] x25: ffff000009169000 x24: ffff000008f8e1b0
> [ 716.627065] x23: ffff000008ce0920 x22: 00000000ffffffff
> [ 716.632319] x21: ffff000009169000 x20: ffff8009762a2664
> [ 716.637574] x19: ffff000009294a90 x18: 0000000000000400
> [ 716.642828] x17: 0000000000000000 x16: 0000000000000000
> [ 716.648082] x15: 0000000000000000 x14: 0000000000000400
> [ 716.653336] x13: 000000000000023f x12: 0000000000043705
> [ 716.658590] x11: 0000000000000108 x10: 0000000000000960
> [ 716.663844] x9 : ffff00000a063970 x8 : ffff800976943ec0
> [ 716.669098] x7 : 0000000000000000 x6 : ffff80097ff720b8
> [ 716.674353] x5 : ffff80097ff720b8 x4 : 0000000000000000
> [ 716.679607] x3 : ffff80097ff78e68 x2 : ffff80097ff720b8
> [ 716.684861] x1 : 6374e2a7925c1100 x0 : 0000000000000000
> [ 716.690115] Call trace:
> [ 716.692532] refcount_dec_not_one+0x9c/0xc0
> [ 716.696669] refcount_dec_and_mutex_lock+0x18/0x70
> [ 716.701409] _put_opp_list_kref+0x28/0x50
> [ 716.705373] _dev_pm_opp_find_and_remove_table+0x24/0x88
> [ 716.710628] _dev_pm_opp_cpumask_remove_table+0x50/0xa0
> [ 716.715796] dev_pm_opp_cpumask_remove_table+0x10/0x18
> [ 716.720879] scpi_cpufreq_exit+0x40/0x50
> [ 716.724758] cpufreq_offline+0x108/0x1e0
> [ 716.728637] cpuhp_cpufreq_offline+0xc/0x18

This probably happened due to some of the recent OPP core changes, and I missed
updating this platform (I updated mvebu though). The problem is completely
different from what your logs show :)

Please try the below patch.

@Sudeep: Please help review it as well.

--
viresh

-------------------------8<-------------------------