Re: [PATCH] HID: logitech-hidpp: Handle timeout differently from busy

From: Benjamin Tissoires
Date: Mon Jun 05 2023 - 13:05:36 EST




On Jun 05 2023, Mark Lord wrote:
>
> On 2023-06-05 10:27 AM, Benjamin Tissoires wrote:
> >
> > On Jun 05 2023, Linux regression tracking (Thorsten Leemhuis) wrote:
> >>
> >> On 03.06.23 14:41, Jiri Kosina wrote:
> >>> On Wed, 31 May 2023, Jiri Kosina wrote:
> >>>
> >>>>> If an attempt at contacting a receiver or a device fails because the
> >>>>> receiver or device never responds, don't restart the communication, only
> >>>>> restart it if the receiver or device answers that it's busy, as originally
> >>>>> intended.
> >>>>>
> >>>>> This was the behaviour on communication timeout before commit 586e8fede795
> >>>>> ("HID: logitech-hidpp: Retry commands when device is busy").
> >>>>>
> >>>>> This fixes some overly long waits in a critical path on boot, when
> >>>>> checking whether the device is connected by getting its HID++ version.
> >>>>>
> >>>>> Signed-off-by: Bastien Nocera <hadess@xxxxxxxxxx>
> >>>>> Suggested-by: Mark Lord <mlord@xxxxxxxxx>
> >>>>> Fixes: 586e8fede795 ("HID: logitech-hidpp: Retry commands when device is busy")
> >>>>> Link: https://bugzilla.kernel.org/show_bug.cgi?id=217412
> >>> [...]
> >>>>
> >>>> I have applied this even before getting confirmation from the reporters in
> >>>> bugzilla, as it's the right thing to do anyway.
> >>>
> >>> Unfortunately it doesn't seem to cure the reported issue (while reverting
> >>> 586e8fede79 does):
> >>
> >> BTW, remind me again: was fixing this by reverting 586e8fede79 for now a
> >> option? I guess it's not, but if I'm wrong I wonder if that might at
> >> this point be the best way forward.
> >
> > Could be. I don't think we thought at simply reverting it because it is
> > required for some new supoprted devices because they might differ
> > slightly from what we currently supported.
> >
> > That being said, Bastien will be unavailable for at least a week AFAIU,
> > so maybe we should revert that patch.
> >
> >>
> >>> https://bugzilla.kernel.org/show_bug.cgi?id=217523#c2
>
> Pardon me, but I don't understand what that "retry" loop
> is even attempting to do inside function hidpp_send_message_sync().
>
> It appears to unconditionally loop until:
> (1) the __hidpp_send_report() fails,
> or (2) it gets a HIDPP_ERROR,
> or (3) it gets a HIDPP20_ERROR other than HIDPP20_ERROR_BUSY,
> or (4) until it has looped 3 times, which appears to be the normal exit.
>
> It doesn't seem to have any provision to exit the loop earlier on "success"
> (whatever that is).
>
> And so when it finally does exit after the 3 iterations,
> it then returns the last value of "ret",
> which will be zero from the __hidpp_send_report() call,
> or sometimes the most recent non-BUSY HIDPP20_ERROR seen.
>
> Obviously I'm missing something, as otherwise this code would never have
> passed review and made it into the Linux kernel in the first place. Right?

Ouch, very much ouch :(

So that one is on me, sorry, I completely missed that and trusted the
local tests. We are human and can make mistakes. And TBH a lot of people
have been staring at this code for a while now without noticing that.

Would you mind giving a shot at the following (untested) patch:

---