Re: [PATCH] HID: logitech-hidpp: Handle timeout differently from busy

From: Mark Lord
Date: Mon Jun 05 2023 - 13:47:36 EST


On 2023-06-05 01:04 PM, Benjamin Tissoires wrote:
>
>
> On Jun 05 2023, Mark Lord wrote:
>>
>> On 2023-06-05 10:27 AM, Benjamin Tissoires wrote:
>>>
>>> On Jun 05 2023, Linux regression tracking (Thorsten Leemhuis) wrote:
>>>>
>>>> On 03.06.23 14:41, Jiri Kosina wrote:
>>>>> On Wed, 31 May 2023, Jiri Kosina wrote:
>>>>>
>>>>>>> If an attempt at contacting a receiver or a device fails because the
>>>>>>> receiver or device never responds, don't restart the communication, only
>>>>>>> restart it if the receiver or device answers that it's busy, as originally
>>>>>>> intended.
>>>>>>>
>>>>>>> This was the behaviour on communication timeout before commit 586e8fede795
>>>>>>> ("HID: logitech-hidpp: Retry commands when device is busy").
>>>>>>>
>>>>>>> This fixes some overly long waits in a critical path on boot, when
>>>>>>> checking whether the device is connected by getting its HID++ version.
>>>>>>>
>>>>>>> Signed-off-by: Bastien Nocera <hadess@xxxxxxxxxx>
>>>>>>> Suggested-by: Mark Lord <mlord@xxxxxxxxx>
>>>>>>> Fixes: 586e8fede795 ("HID: logitech-hidpp: Retry commands when device is busy")
>>>>>>> Link: https://bugzilla.kernel.org/show_bug.cgi?id=217412
>>>>> [...]
>>>>>>
>>>>>> I have applied this even before getting confirmation from the reporters in
>>>>>> bugzilla, as it's the right thing to do anyway.
>>>>>
>>>>> Unfortunately it doesn't seem to cure the reported issue (while reverting
>>>>> 586e8fede79 does):
>>>>
>>>> BTW, remind me again: was fixing this by reverting 586e8fede79 for now a
>>>> option? I guess it's not, but if I'm wrong I wonder if that might at
>>>> this point be the best way forward.
>>>
>>> Could be. I don't think we thought at simply reverting it because it is
>>> required for some new supoprted devices because they might differ
>>> slightly from what we currently supported.
>>>
>>> That being said, Bastien will be unavailable for at least a week AFAIU,
>>> so maybe we should revert that patch.
>>>
>>>>
>>>>> https://bugzilla.kernel.org/show_bug.cgi?id=217523#c2
>>
>> Pardon me, but I don't understand what that "retry" loop
>> is even attempting to do inside function hidpp_send_message_sync().
>>
>> It appears to unconditionally loop until:
>> (1) the __hidpp_send_report() fails,
>> or (2) it gets a HIDPP_ERROR,
>> or (3) it gets a HIDPP20_ERROR other than HIDPP20_ERROR_BUSY,
>> or (4) until it has looped 3 times, which appears to be the normal exit.
>>
>> It doesn't seem to have any provision to exit the loop earlier on "success"
>> (whatever that is).
>>
>> And so when it finally does exit after the 3 iterations,
>> it then returns the last value of "ret",
>> which will be zero from the __hidpp_send_report() call,
>> or sometimes the most recent non-BUSY HIDPP20_ERROR seen.
>>
>> Obviously I'm missing something, as otherwise this code would never have
>> passed review and made it into the Linux kernel in the first place. Right?
>
> Ouch, very much ouch :(
>
> So that one is on me, sorry, I completely missed that and trusted the
> local tests. We are human and can make mistakes. And TBH a lot of people
> have been staring at this code for a while now without noticing that.
>
> Would you mind giving a shot at the following (untested) patch:
>
> ---
>>From 7d03a6b765571d7c518c85e7e74f6f9d498fa354 Mon Sep 17 00:00:00 2001
> From: Benjamin Tissoires <benjamin.tissoires@xxxxxxxxxx>
> Date: Mon, 5 Jun 2023 18:56:36 +0200
> Subject: [PATCH] HID: hidpp: terminate retry loop on success
>
> It seems we forgot the normal case to terminate the retry loop,
> making us asking 3 times each command, which is probably a little bit
> too much.
>
> And remove the ugly "goto exit" that can be replaced by a simpler "break"
>
> Fixes: 586e8fede795 ("HID: logitech-hidpp: Retry commands when device is busy")
> Suggested-by: Mark Lord <mlord@xxxxxxxxx>
> Signed-off-by: Benjamin Tissoires <benjamin.tissoires@xxxxxxxxxx>
> ---
> drivers/hid/hid-logitech-hidpp.c | 13 ++++++-------
> 1 file changed, 6 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/hid/hid-logitech-hidpp.c b/drivers/hid/hid-logitech-hidpp.c
> index 2246044b1639..5e1a412fd28f 100644
> --- a/drivers/hid/hid-logitech-hidpp.c
> +++ b/drivers/hid/hid-logitech-hidpp.c
> @@ -286,7 +286,7 @@ static int hidpp_send_message_sync(struct hidpp_device *hidpp,
> struct hidpp_report *message,
> struct hidpp_report *response)
> {
> - int ret;
> + int ret = -1;
> int max_retries = 3;
>
> mutex_lock(&hidpp->send_mutex);
> @@ -300,13 +300,13 @@ static int hidpp_send_message_sync(struct hidpp_device *hidpp,
> */
> *response = *message;
>
> - for (; max_retries != 0; max_retries--) {
> + for (; max_retries != 0 && ret; max_retries--) {
> ret = __hidpp_send_report(hidpp->hid_dev, message);
>
> if (ret) {
> dbg_hid("__hidpp_send_report returned err: %d\n", ret);
> memset(response, 0, sizeof(struct hidpp_report));
> - goto exit;
> + break;
> }
>
> if (!wait_event_timeout(hidpp->wait, hidpp->answer_available,
> @@ -314,14 +314,14 @@ static int hidpp_send_message_sync(struct hidpp_device *hidpp,
> dbg_hid("%s:timeout waiting for response\n", __func__);
> memset(response, 0, sizeof(struct hidpp_report));
> ret = -ETIMEDOUT;
> - goto exit;
> + break;
> }
>
> if (response->report_id == REPORT_ID_HIDPP_SHORT &&
> response->rap.sub_id == HIDPP_ERROR) {
> ret = response->rap.params[1];
> dbg_hid("%s:got hidpp error %02X\n", __func__, ret);
> - goto exit;
> + break;
> }
>
> if ((response->report_id == REPORT_ID_HIDPP_LONG ||
> @@ -330,13 +330,12 @@ static int hidpp_send_message_sync(struct hidpp_device *hidpp,
> ret = response->fap.params[1];
> if (ret != HIDPP20_ERROR_BUSY) {
> dbg_hid("%s:got hidpp 2.0 error %02X\n", __func__, ret);
> - goto exit;
> + break;
> }
> dbg_hid("%s:got busy hidpp 2.0 error %02X, retrying\n", __func__, ret);
> }
> }
>
> -exit:
> mutex_unlock(&hidpp->send_mutex);
> return ret;
>
>

Tested-by: Mark Lord <mlord@xxxxxxxxx>

That works fine for me on top of 6.3.6,
and I don't even see the ETIMEDOUT happening there either (added a printk() for it).

I am unable to test on 6.4.0-rc5, because that kernel doesn't work with my USB3 docking station.

Cheers
--
Mark Lord