Re: [PATCH] iio: buffer: Silence lock nesting splat

From: Lars-Peter Clausen
Date: Sat Aug 20 2022 - 07:08:43 EST


On 8/20/22 13:06, Jonathan Cameron wrote:
On Tue, 16 Aug 2022 10:08:28 +0200
Vincent Whitchurch <vincent.whitchurch@xxxxxxxx> wrote:

If an IIO driver uses callbacks from another IIO driver and calls
iio_channel_start_all_cb() from one of its buffer setup ops, then
lockdep complains due to the lock nesting, as in the below example with
lmp91000. Since the locks are being taken on different IIO devices,
there is no actual deadlock, so add lock nesting annotation to silence
the spurious warning.

============================================
WARNING: possible recursive locking detected
6.0.0-rc1+ #10 Not tainted
--------------------------------------------
python3/23 is trying to acquire lock:
0000000064c944c0 (&indio_dev->mlock){+.+.}-{3:3}, at: iio_update_buffers+0x62/0x180

but task is already holding lock:
00000000636b64c0 (&indio_dev->mlock){+.+.}-{3:3}, at: enable_store+0x4d/0x100

other info that might help us debug this:
Possible unsafe locking scenario:

CPU0
----
lock(&indio_dev->mlock);
lock(&indio_dev->mlock);

*** DEADLOCK ***

May be due to missing lock nesting notation

5 locks held by python3/23:
#0: 00000000636b5420 (sb_writers#5){.+.+}-{0:0}, at: ksys_write+0x67/0x100
#1: 0000000064c19280 (&of->mutex){+.+.}-{3:3}, at: kernfs_fop_write_iter+0x13a/0x270
#2: 0000000064c3d9e0 (kn->active#14){.+.+}-{0:0}, at: kernfs_fop_write_iter+0x149/0x270
#3: 00000000636b64c0 (&indio_dev->mlock){+.+.}-{3:3}, at: enable_store+0x4d/0x100
#4: 0000000064c945c8 (&iio_dev_opaque->info_exist_lock){+.+.}-{3:3}, at: iio_update_buffers+0x4f/0x180

stack backtrace:
CPU: 0 PID: 23 Comm: python3 Not tainted 6.0.0-rc1+ #10
Call Trace:
dump_stack+0x1a/0x1c
__lock_acquire.cold+0x407/0x42d
lock_acquire+0x1ed/0x310
__mutex_lock+0x72/0xde0
mutex_lock_nested+0x1d/0x20
iio_update_buffers+0x62/0x180
iio_channel_start_all_cb+0x1c/0x20 [industrialio_buffer_cb]
lmp91000_buffer_postenable+0x1b/0x20 [lmp91000]
__iio_update_buffers+0x50b/0xd80
enable_store+0x81/0x100
dev_attr_store+0xf/0x20
sysfs_kf_write+0x4c/0x70
kernfs_fop_write_iter+0x179/0x270
new_sync_write+0x99/0x120
vfs_write+0x2c1/0x470
ksys_write+0x67/0x100
sys_write+0x10/0x20

Signed-off-by: Vincent Whitchurch <vincent.whitchurch@xxxxxxxx>

I'm wondering if this is sufficient.
At first glance there are a whole bunch of other possible cases of this.
Any consumer driver that calls iio_device_claim_direct_mode() would be a
problem - though I'm not sure any do?

I'm not sure I properly understand lockdep notations, but I thought the
point was we needed to define a hierarchy? To do that here we need
an IIO driver that is a consumer to somehow let the IIO core know that
and mark all calls to the locks appropriately. This gets trickier
as we allow 3+ levels of IIO drivers calling into each other.

We should also think about how to prevent recursion if there are 3
IIO drivers involved.

There are two different approaches for this kind of nested locking. One is to use mutex_lock_nested(). This works if there is a strict hierarchy. The I2C framework, for example, has a function to determine the position of an I2C mux in the hierarchy and uses that for locking. See https://elixir.bootlin.com/linux/latest/source/drivers/i2c/i2c-core-base.c#L1151.

I'm not sure this directly translates to IIO since the consumers/producers don't have to be in a strict hierarchy. And if it is a complex graph it can be difficult to figure out the right level for mutex_lock_nested().
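As a rough illustration of the first approach, here is a small userspace C model. It is not kernel code and not how lockdep is implemented; it only assumes a checker that rejects an acquisition when both the class and the subclass match a lock that is already held, which is the effect mutex_lock_nested() relies on. All names (sub_lock, sub_lock_nested, sub_release_all) are made up for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SUB_MAX_HELD 8

/* Toy lock: every instance of one lock type shares the same class_id. */
struct sub_lock {
	int class_id;
};

struct sub_held {
	int class_id;
	unsigned int subclass;
};

static struct sub_held sub_held_locks[SUB_MAX_HELD];
static size_t sub_nr_held;

/*
 * Returns true if the acquisition passes the toy check, false if it would
 * be reported as "possible recursive locking" (same class, same subclass).
 */
static bool sub_lock_nested(const struct sub_lock *lock, unsigned int subclass)
{
	for (size_t i = 0; i < sub_nr_held; i++) {
		if (sub_held_locks[i].class_id == lock->class_id &&
		    sub_held_locks[i].subclass == subclass)
			return false;
	}

	assert(sub_nr_held < SUB_MAX_HELD);
	sub_held_locks[sub_nr_held].class_id = lock->class_id;
	sub_held_locks[sub_nr_held].subclass = subclass;
	sub_nr_held++;
	return true;
}

/* Forget all held locks (stands in for unlocking everything). */
static void sub_release_all(void)
{
	sub_nr_held = 0;
}
```

In this model, taking a consumer's lock and then a producer's lock with subclass 0 both times is flagged, while taking the producer's lock with subclass 1 passes. The hard part for IIO is the one noted above: each device has to know which subclass (depth) to use.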

The other method is to give each mutex instance its own lock class. lockdep does its checking based on the lock class, and by default the same mutex in different instances of a structure is considered the same class, which keeps the resource requirements for the checker lower.

Regmap for example does this. See https://elixir.bootlin.com/linux/latest/source/drivers/base/regmap/regmap.c#L795.
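A similarly simplified userspace model of the second approach, again not kernel code: it assumes lock class identity is just the address of a key object, and compares a key shared by all devices with a key embedded in each device. The names (key_lock, key_dev, key_dev_init, key_lock_acquire) are hypothetical and only exist for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define KEY_MAX_HELD 8

/* Only the address of a key matters; it identifies the lock class. */
struct key_class {
	char unused;
};

struct key_lock {
	const struct key_class *key;
};

struct key_dev {
	struct key_lock mlock;
	struct key_class own_key;	/* used only for per-device classes */
};

/* Default case: one key shared by every device's mlock. */
static struct key_class key_shared;

static const struct key_class *key_held[KEY_MAX_HELD];
static size_t key_nr_held;

static void key_dev_init(struct key_dev *dev, bool per_device_class)
{
	dev->mlock.key = per_device_class ? &dev->own_key : &key_shared;
}

/* Returns false if a lock of the same class is already held. */
static bool key_lock_acquire(const struct key_lock *lock)
{
	for (size_t i = 0; i < key_nr_held; i++) {
		if (key_held[i] == lock->key)
			return false;
	}

	assert(key_nr_held < KEY_MAX_HELD);
	key_held[key_nr_held++] = lock->key;
	return true;
}

/* Forget all held locks (stands in for unlocking everything). */
static void key_release_all(void)
{
	key_nr_held = 0;
}
```

With the shared key, locking a second device while holding the first is flagged; with per-device keys it is not, at the cost of one class per device for the checker to track.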

This could be a solution for IIO, with the downside of additional work for the checker. But as long as there are only a few IIO devices per system that should be OK. We could also set the per-device lock class only if in-kernel consumers are enabled.