Re: [PATCH 2/3] thermal/drivers/tegra: Remove get_trend function

From: Guenter Roeck
Date: Wed Jun 29 2022 - 08:56:22 EST


On 6/29/22 02:35, Dmitry Osipenko wrote:
On 6/28/22 21:43, Guenter Roeck wrote:
On Tue, Jun 28, 2022 at 08:10:30AM -0700, Guenter Roeck wrote:
On Tue, Jun 28, 2022 at 02:44:31PM +0300, Dmitry Osipenko wrote:
On 6/28/22 11:41, Daniel Lezcano wrote:

Thierry, Dmitry,

are fine with this patch?

Seems should be good. I couldn't test it using recent the linux-next
because of a lockup in LM90 driver. There were quite a lot of changes in
LM90 recently, adding Guenter.


Weird, I tested those changes to death with real hardware, and I don't
see a code path where the mutex can be left in blocked state unless the
underlying i2c driver locks up for some reason. What is the platform,
and can you point me to the devicetree file ? Also, is there anything
else lm90 or i2c related in the kernel log ?


Follow-up question: I see that various Tegra systems use lm90 compatible
chips, and the interrupt line is in general wired up. Can you check if
you get lots of interrupts on that interrupt line ? Also, can you check
what happens if you read hwmon attributes directly ?

The number of interrupt fires is okay. It's a Nexus 7 Tegra30 tablet
device that I'm using for the testing.

Today I enabled the lockdep and it immediately showed where the problem is:


Yes, that is an obvious problem. Thanks a lot for testing!

Guenter

======================================================
WARNING: possible circular locking dependency detected
5.19.0-rc4-next-20220628-00002-g94e5dbbe1c58-dirty #24 Not tainted
------------------------------------------------------
irq/91-lm90/130 is trying to acquire lock:
c27ba380 (&tz->lock){+.+.}-{3:3}, at: thermal_zone_device_update+0x2c/0x64

but task is already holding lock:
c27b42c8 (&data->update_lock){+.+.}-{3:3}, at: lm90_irq_thread+0x2c/0x68

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #1 (&data->update_lock){+.+.}-{3:3}:
__mutex_lock+0x9c/0x984
mutex_lock_nested+0x2c/0x34
lm90_read+0x44/0x3e8
hwmon_thermal_get_temp+0x58/0x8c
of_thermal_get_temp+0x38/0x44
thermal_zone_get_temp+0x5c/0x7c
thermal_zone_device_update.part.0+0x48/0x5fc
thermal_zone_device_set_mode+0xa0/0xe4
thermal_zone_device_enable+0x1c/0x20
thermal_zone_of_sensor_register+0x18c/0x19c
devm_thermal_zone_of_sensor_register+0x68/0xa4
__hwmon_device_register+0x704/0x91c
hwmon_device_register_with_info+0x6c/0x80
devm_hwmon_device_register_with_info+0x78/0xb4
lm90_probe+0x618/0x8c0
i2c_device_probe+0x170/0x2e0
really_probe+0xd8/0x300
__driver_probe_device+0x94/0xf4
driver_probe_device+0x40/0x118
__device_attach_driver+0xc8/0x10c
bus_for_each_drv+0x90/0xdc
__device_attach+0xbc/0x1d4
device_initial_probe+0x1c/0x20
bus_probe_device+0x98/0xa0
deferred_probe_work_func+0x8c/0xbc
process_one_work+0x2b8/0x774
worker_thread+0x17c/0x56c
kthread+0x108/0x13c
ret_from_fork+0x14/0x28
0x0

-> #0 (&tz->lock){+.+.}-{3:3}:
__lock_acquire+0x173c/0x3198
lock_acquire+0x128/0x3f0
__mutex_lock+0x9c/0x984
mutex_lock_nested+0x2c/0x34
thermal_zone_device_update+0x2c/0x64
hwmon_notify_event+0x128/0x138
lm90_update_alarms_locked+0x35c/0x3b8
lm90_irq_thread+0x38/0x68
irq_thread_fn+0x2c/0x8c
irq_thread+0x190/0x29c > kthread+0x108/0x13c
ret_from_fork+0x14/0x28
0x0

other info that might help us debug this:

Possible unsafe locking scenario:

CPU0 CPU1
---- ----
lock(&data->update_lock);
lock(&tz->lock);
lock(&data->update_lock);
lock(&tz->lock);

*** DEADLOCK ***

1 lock held by irq/91-lm90/130:
#0: c27b42c8 (&data->update_lock){+.+.}-{3:3}, at:
lm90_irq_thread+0x2c/0x68

stack backtrace:
CPU: 1 PID: 130 Comm: irq/91-lm90 Not tainted
5.19.0-rc4-next-20220628-00002-g94e5dbbe1c58-dirty #24
Hardware name: NVIDIA Tegra SoC (Flattened Device Tree)
Backtrace:
dump_backtrace from show_stack+0x20/0x24
r7:c33d1b60 r6:00000080 r5:60000093 r4:c168c6a4
show_stack from dump_stack_lvl+0x68/0x98
dump_stack_lvl from dump_stack+0x18/0x1c
r7:c33d1b60 r6:c20328cc r5:c203c700 r4:c20328cc
dump_stack from print_circular_bug+0x2ec/0x33c
print_circular_bug from check_noncircular+0x104/0x168
r10:c1a14cc8 r9:c33d1240 r8:00000001 r7:00000000 r6:dfc3dcc0 r5:c33d1b60
r4:c33d1b80
check_noncircular from __lock_acquire+0x173c/0x3198
r7:c33d1b80 r6:c202bc98 r5:c33d1b60 r4:c21d92ac
__lock_acquire from lock_acquire+0x128/0x3f0
r10:60000013 r9:00000000 r8:00000000 r7:00000000 r6:dfc3dd40 r5:c19ac688
r4:c19ac688
lock_acquire from __mutex_lock+0x9c/0x984
r10:c27ba380 r9:00000000 r8:c21d92ac r7:c33d1240 r6:00000000 r5:00000000
r4:c27ba348
__mutex_lock from mutex_lock_nested+0x2c/0x34
r10:c27b4000 r9:00000000 r8:dfc3de87 r7:00000000 r6:c27ba348 r5:00000000
r4:c27ba000
mutex_lock_nested from thermal_zone_device_update+0x2c/0x64
thermal_zone_device_update from hwmon_notify_event+0x128/0x138
r7:00000000 r6:00000000 r5:c2d23ea4 r4:c33fd040
hwmon_notify_event from lm90_update_alarms_locked+0x35c/0x3b8
r8:c27b4378 r7:c2d23c08 r6:00000020 r5:00000000 r4:c27b4240
lm90_update_alarms_locked from lm90_irq_thread+0x38/0x68
r9:c01c2814 r8:00000001 r7:c33d2240 r6:c27b4290 r5:c27b4240 r4:c33fc200
lm90_irq_thread from irq_thread_fn+0x2c/0x8c
r7:c33d2240 r6:c27b4000 r5:c33d1240 r4:c33fc200
irq_thread_fn from irq_thread+0x190/0x29c
r7:c33d2240 r6:c33fc224 r5:c33d1240 r4:00000000
irq_thread from kthread+0x108/0x13c
r10:00000000 r9:df9ddbf4 r8:c31d2200 r7:c33fc200 r6:c01c2710 r5:c33d1240
r4:c33fc240
kthread from ret_from_fork+0x14/0x28