Re: iio: iio-trig-hrtimer bug on suspend/resume when used with bmi160 and bmi323

From: Jonathan Cameron
Date: Sat Jan 13 2024 - 12:46:45 EST


On Wed, 10 Jan 2024 23:35:01 +0100
Denis Benato <benato.denis96@xxxxxxxxx> wrote:

> Hello,
>
> With this mail I am submitting a bug report that is probably related to
> iio-trig-hrtimer but there is also the possibility for it to be
> specific to bmi160 and bmi323.
>
> The described problem has been reproduced on my handheld PC (Asus
> RC71L) and in another handheld PC with two different gyroscope
> drivers: bmi323 (backported by me on v6.7, on RC71L) and bmi160.
>
> My target hardware (the RC71L that led to this discovery) has a bmi323
> chip that does not have any interrupt pins reaching the CPU, yet I
> need to fetch periodically data from said device, therefore I used
> iio-trig-hrtimer: created a trigger, set the device and trigger
> sampling frequencies, bound the trigger to the device and enabled
> buffer: data is being read and available over /dev/iio:device0.
>
> While in this state if I suspend my handheld I receive (from dmesg)
> the warning reported below and at resume data is not coming out of
> the iio device and the hrtimer appears to not be working. If I create
> a new trigger and bind the new trigger to said iio device and
> re-enable buffer data does come out of /dev/iio:device0 once more,
> until the next sleep.
>
> Since this is important to me I have taken the time to look at both
> drivers and iio-trig-hrtimer and I have identified three possible
> reasons:
>
> 1) iio-trig-hrtimer won't work after suspend regardless of how it is
> used (this is what I believe is the cause)
me too.

> 2) iio-trig-hrtimer is stopped by the -ESHUTDOWN returned by the
> function printing "Transfer while suspended", however that stack
> trace does not include function calls related to iio-trig-hrtimer and
> this seems less plausible
> 3) bmi160 and bmi323 appear to be similar and may share a common
> bug with suspend (this is also why I have maintainers of those
> drivers in the recipient list)
>
> Thanks for your time, patience and understanding,

Hi Denis,

I suspect this is a legacy of the fact that the platform I used to test the
hrtimer and similar triggers on never had working suspend and resume (we
ripped support for that board out of the kernel a while back now...)
Hence those paths were never tested by me, and others may not have cared
about this particular case.

Anyhow, so I think what is going on is fairly simple.

There is no way for a driver to indicate to a trigger provided by a separate
module / hardware device that it should stop triggering data capture.
The driver in question doesn't block data capture when suspended, which
would be easy enough to add but doesn't feel like the right solution.

So my initial thought is that we should add suspend and resume callbacks to
iio_trigger_ops and call them manually from iio device drivers in their
suspend and resume callbacks. These would simply pause whatever the
trigger source was so that no attempts are made to trigger the use of
the device when it is suspended.

It gets a little messy though as triggers can be shared between
multiple devices so we'd need to reference count suspend and resume
for the trigger to make sure we only resume once all consumers of
the trigger have said they are ready to cope with triggers again.
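
To illustrate the reference counting, here is a rough userspace model,
not kernel code. All the names are made up for this mail and nothing like
them exists in the IIO core today. A real version would also need locking
around the count and would call into the trigger's own ops to actually
stop and restart the source.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a trigger shared by several consumers. */
struct model_trigger {
	unsigned int suspend_count;	/* consumers currently suspended */
	bool running;			/* is the trigger source firing? */
};

/* A consumer is suspending: pause the source on the first request. */
static void model_trigger_suspend(struct model_trigger *trig)
{
	if (trig->suspend_count++ == 0)
		trig->running = false;
}

/* A consumer has resumed: restart only once every consumer is back. */
static void model_trigger_resume(struct model_trigger *trig)
{
	assert(trig->suspend_count > 0);	/* unbalanced resume is a bug */
	if (--trig->suspend_count == 0)
		trig->running = true;
}
```

With two consumers, the source stays paused after the first resume and
only restarts after the second.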

As mentioned, the alternative would be to block the triggers at ingress
to the bmi323 and bmi160 drivers. There may be a helpful pm flag that could
be used but if not, then setting an is_suspended flag under the data->mutex
(to avoid races) and reading it in the trigger_handler to decide whether
to talk to the device should work.
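
Roughly what I mean by the flag approach, again as a userspace model
rather than driver code. A pthread mutex stands in for data->mutex, and
the struct, the function names and the bus_reads counter are all
hypothetical.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for the driver's private data. */
struct model_data {
	pthread_mutex_t mutex;		/* stands in for data->mutex */
	bool is_suspended;
	unsigned int bus_reads;		/* counts simulated bus transfers */
};

/* Called from the driver's suspend path. */
static void model_suspend(struct model_data *data)
{
	pthread_mutex_lock(&data->mutex);
	data->is_suspended = true;
	pthread_mutex_unlock(&data->mutex);
}

/* Called from the driver's resume path. */
static void model_resume(struct model_data *data)
{
	pthread_mutex_lock(&data->mutex);
	data->is_suspended = false;
	pthread_mutex_unlock(&data->mutex);
}

/* Trigger handler: touch the bus only if not suspended. */
static bool model_trigger_handler(struct model_data *data)
{
	bool did_read = false;

	pthread_mutex_lock(&data->mutex);
	if (!data->is_suspended) {
		data->bus_reads++;	/* the real handler reads the device here */
		did_read = true;
	}
	pthread_mutex_unlock(&data->mutex);

	return did_read;
}
```

Because the flag is checked under the same lock that suspend takes, a
handler either finishes its transfer before suspend proceeds or sees the
flag and skips the read. In a real driver the skip path would presumably
still need to call iio_trigger_notify_done() so the trigger is not left
waiting.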

I'd kind of like the generic solution of actually letting the trigger
know, but not sure how much work it would turn out to be. Either way there
are a lot of drivers to fix this problem in as in theory most triggers can
be used with most drivers that support buffered data capture.
There may also be fun corners where a hardware trigger from one IIO
device A is being used by another B and the suspend timing is such that B
finishing with the trigger results in A taking an action (in the try_reenable
callback) that could result in bus traffic.
That one is going to be much more fiddly to handle than the simpler case
you have run into.

Thanks for the detailed bug report btw. To get it fixed a few
questions:
1) Are you happy to test proposed fixes?
2) Do you want to have a go at fixing it yourself? (I'd suggest trying
the fix in the bmi323 driver first rather than looking at the other
solution)
If we eventually implement the more general version, then having additional
protection against this problem in the bmi323 driver would do no harm.

Clearly I'd like the answers to be yes to both questions, but up to you!

Jonathan