Adreno devfreq lockdep splat with 6.3-rc2

From: Johan Hovold
Date: Wed Mar 15 2023 - 05:18:23 EST


Hi Rob,

Since 6.3-rc2 (or possibly -rc1), I'm seeing the devfreq-related
lockdep splat below.

I noticed that you posted a fix for something similar here:

https://lore.kernel.org/r/20230312204150.1353517-9-robdclark@xxxxxxxxx

but that particular patch makes no difference.

From skimming the calltraces below and the qos/devfreq-related changes
in 6.3-rc1, it seems like this could be related to:

fadcc3ab1302 ("drm/msm/gpu: Bypass PM QoS constraint for idle clamp")
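
For reference, judging by the dependency chain lockdep prints, this
reduces to a classic AB-BA inversion: one path takes &devfreq->lock and
(transitively, via dev_pm_qos_mtx) the notifier rwsem, while the other
takes &devfreq->lock from qos_min_notifier_call() with the rwsem
already held. Below is a minimal user-space sketch of that pattern;
the pthread mutexes stand in for the kernel locks, and the names and
the two-lock simplification are mine, not the actual code paths
(build with "cc -pthread"):

/*
 * Thread A mimics the devfreq side: devfreq_lock, then the QoS/
 * notifier side. Thread B mimics the submit path: the QoS/notifier
 * side, then devfreq_lock (as in qos_min_notifier_call()).
 */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t devfreq_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t qos_notifier_lock = PTHREAD_MUTEX_INITIALIZER;

static void *devfreq_side(void *arg)	/* devfreq->lock -> notifiers */
{
	(void)arg;
	pthread_mutex_lock(&devfreq_lock);
	usleep(1000);			/* widen the race window */
	pthread_mutex_lock(&qos_notifier_lock);
	pthread_mutex_unlock(&qos_notifier_lock);
	pthread_mutex_unlock(&devfreq_lock);
	return NULL;
}

static void *qos_side(void *arg)	/* notifiers -> devfreq->lock */
{
	(void)arg;
	pthread_mutex_lock(&qos_notifier_lock);
	usleep(1000);
	pthread_mutex_lock(&devfreq_lock);	/* qos_min_notifier_call() */
	pthread_mutex_unlock(&devfreq_lock);
	pthread_mutex_unlock(&qos_notifier_lock);
	return NULL;
}

int main(void)
{
	pthread_t a, b;

	pthread_create(&a, NULL, devfreq_side, NULL);
	pthread_create(&b, NULL, qos_side, NULL);
	pthread_join(a, NULL);	/* typically never returns: AB-BA deadlock */
	pthread_join(b, NULL);
	puts("no deadlock this run");
	return 0;
}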

Johan


[ 35.389822] ======================================================
[ 35.389824] WARNING: possible circular locking dependency detected
[ 35.389826] 6.3.0-rc2 #208 Not tainted
[ 35.389828] ------------------------------------------------------
[ 35.389829] ring0/348 is trying to acquire lock:
[ 35.389830] ffff43424cfa2078 (&devfreq->lock){+.+.}-{3:3}, at: qos_min_notifier_call+0x28/0x88
[ 35.389845]
but task is already holding lock:
[ 35.389846] ffff4342432b78e8 (&(c->notifiers)->rwsem){++++}-{3:3}, at: blocking_notifier_call_chain+0x34/0xa0
[ 35.389855]
which lock already depends on the new lock.

[ 35.389856]
the existing dependency chain (in reverse order) is:
[ 35.389857]
-> #4 (&(c->notifiers)->rwsem){++++}-{3:3}:
[ 35.389861] lock_acquire+0x68/0x84
[ 35.389865] down_write+0x58/0xfc
[ 35.389869] blocking_notifier_chain_register+0x30/0x8c
[ 35.389872] freq_qos_add_notifier+0x68/0x7c
[ 35.389876] dev_pm_qos_add_notifier+0xe8/0x114
[ 35.389881] devfreq_add_device.part.0+0x360/0x5a4
[ 35.389884] devm_devfreq_add_device+0x74/0xe0
[ 35.389886] msm_devfreq_init+0xa0/0x154 [msm]
[ 35.389915] msm_gpu_init+0x320/0x5b0 [msm]
[ 35.389933] adreno_gpu_init+0x164/0x2d8 [msm]
[ 35.389951] a6xx_gpu_init+0x270/0x608 [msm]
[ 35.389968] adreno_bind+0x184/0x284 [msm]
[ 35.389983] component_bind_all+0x124/0x288
[ 35.389989] msm_drm_bind+0x1d8/0x6a8 [msm]
[ 35.390004] try_to_bring_up_aggregate_device+0x1ec/0x2f4
[ 35.390007] __component_add+0xa8/0x194
[ 35.390010] component_add+0x14/0x20
[ 35.390012] dp_display_probe+0x2b4/0x474 [msm]
[ 35.390029] platform_probe+0x68/0xd8
[ 35.390031] really_probe+0x184/0x3c8
[ 35.390034] __driver_probe_device+0x7c/0x188
[ 35.390036] driver_probe_device+0x3c/0x110
[ 35.390039] __device_attach_driver+0xbc/0x158
[ 35.390041] bus_for_each_drv+0x84/0xe0
[ 35.390044] __device_attach+0xa8/0x1d4
[ 35.390046] device_initial_probe+0x14/0x20
[ 35.390049] bus_probe_device+0xac/0xb0
[ 35.390051] deferred_probe_work_func+0xa0/0xf4
[ 35.390053] process_one_work+0x288/0x6c4
[ 35.390056] worker_thread+0x74/0x450
[ 35.390058] kthread+0x118/0x11c
[ 35.390060] ret_from_fork+0x10/0x20
[ 35.390063]
-> #3 (dev_pm_qos_mtx){+.+.}-{3:3}:
[ 35.390066] lock_acquire+0x68/0x84
[ 35.390068] __mutex_lock+0x98/0x428
[ 35.390072] mutex_lock_nested+0x2c/0x38
[ 35.390074] dev_pm_qos_remove_notifier+0x34/0x140
[ 35.390077] genpd_remove_device+0x3c/0x174
[ 35.390081] genpd_dev_pm_detach+0x78/0x1b4
[ 35.390083] dev_pm_domain_detach+0x24/0x34
[ 35.390085] a6xx_gmu_remove+0x64/0xd0 [msm]
[ 35.390101] a6xx_destroy+0xa8/0x138 [msm]
[ 35.390116] adreno_unbind+0x40/0x64 [msm]
[ 35.390131] component_unbind+0x38/0x6c
[ 35.390134] component_unbind_all+0xc8/0xd4
[ 35.390136] msm_drm_uninit.isra.0+0x168/0x1dc [msm]
[ 35.390152] msm_drm_bind+0x2f4/0x6a8 [msm]
[ 35.390167] try_to_bring_up_aggregate_device+0x1ec/0x2f4
[ 35.390170] __component_add+0xa8/0x194
[ 35.390172] component_add+0x14/0x20
[ 35.390175] dp_display_probe+0x2b4/0x474 [msm]
[ 35.390190] platform_probe+0x68/0xd8
[ 35.390192] really_probe+0x184/0x3c8
[ 35.390194] __driver_probe_device+0x7c/0x188
[ 35.390197] driver_probe_device+0x3c/0x110
[ 35.390199] __device_attach_driver+0xbc/0x158
[ 35.390201] bus_for_each_drv+0x84/0xe0
[ 35.390203] __device_attach+0xa8/0x1d4
[ 35.390206] device_initial_probe+0x14/0x20
[ 35.390208] bus_probe_device+0xac/0xb0
[ 35.390210] deferred_probe_work_func+0xa0/0xf4
[ 35.390212] process_one_work+0x288/0x6c4
[ 35.390214] worker_thread+0x74/0x450
[ 35.390216] kthread+0x118/0x11c
[ 35.390217] ret_from_fork+0x10/0x20
[ 35.390219]
-> #2 (&gmu->lock){+.+.}-{3:3}:
[ 35.390222] lock_acquire+0x68/0x84
[ 35.390224] __mutex_lock+0x98/0x428
[ 35.390226] mutex_lock_nested+0x2c/0x38
[ 35.390229] a6xx_gpu_set_freq+0x30/0x5c [msm]
[ 35.390245] msm_devfreq_target+0xb4/0x218 [msm]
[ 35.390260] devfreq_set_target+0x84/0x2f4
[ 35.390262] devfreq_update_target+0xc4/0xec
[ 35.390263] devfreq_monitor+0x38/0x1f0
[ 35.390265] process_one_work+0x288/0x6c4
[ 35.390267] worker_thread+0x74/0x450
[ 35.390269] kthread+0x118/0x11c
[ 35.390270] ret_from_fork+0x10/0x20
[ 35.390272]
-> #1 (&df->lock){+.+.}-{3:3}:
[ 35.390275] lock_acquire+0x68/0x84
[ 35.390276] __mutex_lock+0x98/0x428
[ 35.390279] mutex_lock_nested+0x2c/0x38
[ 35.390281] msm_devfreq_get_dev_status+0x48/0x134 [msm]
[ 35.390296] devfreq_simple_ondemand_func+0x3c/0x144
[ 35.390298] devfreq_update_target+0x4c/0xec
[ 35.390300] devfreq_monitor+0x38/0x1f0
[ 35.390302] process_one_work+0x288/0x6c4
[ 35.390304] worker_thread+0x74/0x450
[ 35.390305] kthread+0x118/0x11c
[ 35.390307] ret_from_fork+0x10/0x20
[ 35.390308]
-> #0 (&devfreq->lock){+.+.}-{3:3}:
[ 35.390311] __lock_acquire+0x1394/0x21fc
[ 35.390313] lock_acquire.part.0+0xc4/0x1fc
[ 35.390314] lock_acquire+0x68/0x84
[ 35.390316] __mutex_lock+0x98/0x428
[ 35.390319] mutex_lock_nested+0x2c/0x38
[ 35.390321] qos_min_notifier_call+0x28/0x88
[ 35.390323] blocking_notifier_call_chain+0x6c/0xa0
[ 35.390325] pm_qos_update_target+0xdc/0x24c
[ 35.390327] freq_qos_apply+0x68/0x74
[ 35.390329] apply_constraint+0x100/0x148
[ 35.390331] __dev_pm_qos_update_request+0xb8/0x278
[ 35.390333] dev_pm_qos_update_request+0x3c/0x64
[ 35.390335] msm_devfreq_active+0xf8/0x194 [msm]
[ 35.390350] msm_gpu_submit+0x18c/0x1a8 [msm]
[ 35.390365] msm_job_run+0xbc/0x140 [msm]
[ 35.390380] drm_sched_main+0x1a0/0x528 [gpu_sched]
[ 35.390387] kthread+0x118/0x11c
[ 35.390388] ret_from_fork+0x10/0x20
[ 35.390390]
other info that might help us debug this:

[ 35.390391] Chain exists of:
&devfreq->lock --> dev_pm_qos_mtx --> &(c->notifiers)->rwsem

[ 35.390395] Possible unsafe locking scenario:

[ 35.390396]        CPU0                    CPU1
[ 35.390397]        ----                    ----
[ 35.390397]   lock(&(c->notifiers)->rwsem);
[ 35.390399]                                lock(dev_pm_qos_mtx);
[ 35.390401]                                lock(&(c->notifiers)->rwsem);
[ 35.390403]   lock(&devfreq->lock);
[ 35.390405]
*** DEADLOCK ***

[ 35.390406] 4 locks held by ring0/348:
[ 35.390407] #0: ffff43424cfa1170 (&gpu->lock){+.+.}-{3:3}, at: msm_job_run+0xb0/0x140 [msm]
[ 35.390426] #1: ffff43424cfa1208 (&gpu->active_lock){+.+.}-{3:3}, at: msm_gpu_submit+0xdc/0x1a8 [msm]
[ 35.390443] #2: ffffdbf2a5472718 (dev_pm_qos_mtx){+.+.}-{3:3}, at: dev_pm_qos_update_request+0x30/0x64
[ 35.390448] #3: ffff4342432b78e8 (&(c->notifiers)->rwsem){++++}-{3:3}, at: blocking_notifier_call_chain+0x34/0xa0
[ 35.390452]
stack backtrace:
[ 35.390454] CPU: 4 PID: 348 Comm: ring0 Not tainted 6.3.0-rc2 #208
[ 35.390456] Hardware name: LENOVO 21BYZ9SRUS/21BYZ9SRUS, BIOS N3HET32D (1.04 ) 03/16/2020
[ 35.390458] Call trace:
[ 35.390460] dump_backtrace+0xa4/0x128
[ 35.390463] show_stack+0x18/0x24
[ 35.390465] dump_stack_lvl+0x60/0xac
[ 35.390469] dump_stack+0x18/0x24
[ 35.390470] print_circular_bug+0x24c/0x2f8
[ 35.390472] check_noncircular+0x134/0x148
[ 35.390473] __lock_acquire+0x1394/0x21fc
[ 35.390475] lock_acquire.part.0+0xc4/0x1fc
[ 35.390477] lock_acquire+0x68/0x84
[ 35.390478] __mutex_lock+0x98/0x428
[ 35.390481] mutex_lock_nested+0x2c/0x38
[ 35.390483] qos_min_notifier_call+0x28/0x88
[ 35.390485] blocking_notifier_call_chain+0x6c/0xa0
[ 35.390487] pm_qos_update_target+0xdc/0x24c
[ 35.390489] freq_qos_apply+0x68/0x74
[ 35.390491] apply_constraint+0x100/0x148
[ 35.390493] __dev_pm_qos_update_request+0xb8/0x278
[ 35.390495] dev_pm_qos_update_request+0x3c/0x64
[ 35.390497] msm_devfreq_active+0xf8/0x194 [msm]
[ 35.390512] msm_gpu_submit+0x18c/0x1a8 [msm]
[ 35.390527] msm_job_run+0xbc/0x140 [msm]
[ 35.390542] drm_sched_main+0x1a0/0x528 [gpu_sched]
[ 35.390547] kthread+0x118/0x11c
[ 35.390548] ret_from_fork+0x10/0x20