Re: 5.17-rc regression: rmi4 clients cannot deal with asynchronous suspend? (was: X1 Carbon touchpad not resumed)

From: Rajat Jain
Date: Mon Feb 07 2022 - 16:09:44 EST


+Rafael (for any inputs on asynchronous suspend / resume)
+Dmitry Torokhov (since no other maintainer of rmi4 in MAINTAINERS file)
+loic.poulain@xxxxxxxxxx (who fixed RMI device hierarchy recently)
+ Some Synaptics folks (from recent commits - Vincent Huang, Andrew
Duggan, Cheiny)

On Mon, Feb 7, 2022 at 12:23 PM Wolfram Sang <wsa@xxxxxxxxxx> wrote:
>
> Hello Hugh,
>
> > Bisection led to 172d931910e1db800f4e71e8ed92281b6f8c6ee2
> > ("i2c: enable async suspend/resume on i2c client devices")
> > and reverting that fixes it for me.
>
> Thank you for the report plus bisection and sorry for the regression!

+1, thanks for the bisection, and apologies for the inconvenience.

The problem here seems to be that, for some reason, several devices (all
attached to the rmi4 adapter) fail to resume, but only when
asynchronous suspend is enabled (by 172d931910e1):

[ 79.221064] rmi4_smbus 6-002c: failed to get SMBus version number!
[ 79.265074] rmi4_physical rmi4-00: rmi_driver_reset_handler: Failed to read current IRQ mask.
[ 79.308330] rmi4_f01 rmi4-00.fn01: Failed to restore normal operation: -6.
[ 79.308335] rmi4_f01 rmi4-00.fn01: Resume failed with code -6.
[ 79.308339] rmi4_physical rmi4-00: Failed to suspend functions: -6
[ 79.308342] rmi4_smbus 6-002c: Failed to resume device: -6
[ 79.351967] rmi4_physical rmi4-00: Failed to read irqs, code=-6

A resume failure that only shows up during asynchronous resume
typically means that the device depends on some other device resuming
first, but this dependency is NOT expressed as a parent-child
relationship (which is wrong and needs to be fixed, perhaps using
device_link_add()). The kernel may therefore resume these devices
before the device they depend on has resumed.
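To make the reasoning above concrete, here is a toy userspace model in C. It is not kernel code and not a proposed fix. It assumes that during async resume the PM core only orders a device after its parent and after suppliers it is explicitly linked to, so any other dependency is invisible to it. The struct `toy_dev`, the helper `waits_for()` and the device named "dep" are all hypothetical; in particular, which device the rmi4 client really depends on is still unknown.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy model (not kernel code): during async resume, assume the PM core only
 * makes a device wait for its parent and for suppliers it is linked to.
 * Any other dependency is invisible, so the two devices may resume in
 * either order.
 */
struct toy_dev {
	const char *name;
	int parent;   /* index of parent device, or -1 for a root */
	int supplier; /* index of a device-link supplier, or -1 for none */
};

/*
 * True if resuming devs[dev] is ordered after devs[target], i.e. target is
 * reachable from dev via parent/supplier edges. Assumes the graph is acyclic.
 */
static bool waits_for(const struct toy_dev *devs, int dev, int target)
{
	assert(dev >= 0);
	int p = devs[dev].parent;
	int s = devs[dev].supplier;

	if (p == target || s == target)
		return true;
	return (p >= 0 && waits_for(devs, p, target)) ||
	       (s >= 0 && waits_for(devs, s, target));
}

/*
 * Hypothetical topology loosely based on the log above: the SMBus client
 * 6-002c, its rmi4-00 child, and some other device "dep" (hypothetical)
 * that the touchpad actually needs, attached elsewhere in the tree.
 */
enum { ADAPTER, CLIENT, RMI4, OTHER_BUS, DEP, NDEVS };

static struct toy_dev devs[NDEVS] = {
	[ADAPTER]   = { "i2c adapter",           -1,        -1 },
	[CLIENT]    = { "rmi4_smbus 6-002c",     ADAPTER,   -1 },
	[RMI4]      = { "rmi4_physical rmi4-00", CLIENT,    -1 },
	[OTHER_BUS] = { "other bus",             -1,        -1 },
	[DEP]       = { "dep (hypothetical)",    OTHER_BUS, -1 },
};
```

With no link, `waits_for(devs, CLIENT, DEP)` is false, so the client and "dep" may resume in either order. Setting `devs[CLIENT].supplier = DEP`, the model's stand-in for a `device_link_add()` call with the client as consumer, makes it true for the client and, through the parent edge, for rmi4-00 as well.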

TBH, I'm not sure how to fix this. The only hint I see is that all of
these devices seem to be attached to the rmi4 device, so perhaps the
problem lies there? I see that 6e4860410b828f recently fixed the device
hierarchy for rmi4, and judging by its commit message it seems it
should have fixed this very issue?

>
> I will wait a few days if people come up with a fix. If not, I will
> revert the offending commit.

While I'll be sad, since this means i2c clients can no longer resume in
parallel, which increases resume latency by a *LOT* (hundreds of ms on
many Linux systems), I understand that this needs to be done unless
someone comes up with a fix.

I wanted to confirm: the following patches will stay in, right?

d320ec7acc83 i2c: enable async suspend/resume for i2c adapters
7c5b3c158b38 i2c: designware: Enable async suspend / resume of designware devices

Thanks & Best Regards,

Rajat


>
> All the best,
>
> Wolfram
>