Re: [PATCH] USB:bugfix a controller halt error

From: liulongfang
Date: Wed Jul 26 2023 - 02:59:05 EST


On 2023/7/21 22:57, Alan Stern wrote:
> On Fri, Jul 21, 2023 at 06:00:15PM +0800, liulongfang wrote:
>> On systems that use ECC memory. The ECC error of the memory will
>> cause the USB controller to halt. It causes the usb_control_msg()
>> operation to fail.
>
> How often does this happen in real life? (Besides, it seems to me that
> if your system is getting a bunch of ECC memory errors then you've got
> much worse problems than a simple USB failure!)
>

This problem occurs on platforms that use ECC memory.
In our test scenario, it is 100% reproducible.

> And why do you worry about ECC memory failures in particular? Can't
> _any_ kind of failure cause the usb_control_msg() operation to fail?
>
>> At this point, the returned buffer data is an abnormal value, and
>> continuing to use it will lead to incorrect results.
>
> The driver already contains code to check for abnormal values. The
> check is not perfect, but it should prevent things from going too
> badly wrong.
>

If the failure is caused by an ECC memory error, these parameter
checks are not valid either.

>> Therefore, it is necessary to judge the return value and exit.
>>
>> Signed-off-by: liulongfang <liulongfang@xxxxxxxxxx>
>
> There is a flaw in your reasoning.
>
> The operation carried out here is deliberately unsafe (for full-speed
> devices). It is made before we know the actual maxpacket size for ep0,
> and as a result it might return an error code even when it works okay.
> This shouldn't happen, but a lot of USB hardware is unreliable.
>
> Therefore we must not ignore the result merely because r < 0. If we do
> that, the kernel might stop working with some devices.
>
It may be that ECC errors are handled differently from one OS platform to
another. On our test platform, after usb_control_msg() fails, reading the
contents of buf directly leads to a kernel crash:

[ T14] Call trace:
[ T14] hub_port_init+0x280/0x9f0
[ T14] hub_port_connect+0x1d4/0xa40
[ T14] hub_port_connect_change+0xb8/0x2b0
[ T14] port_event+0x430/0x5d0
[ T14] hub_event+0x138/0x4a0
[ T14] process_one_work+0x1c8/0x39c
[ T14] worker_thread+0x150/0x3d0
[ T14] kthread+0xfc/0x130
[ T14] ret_from_fork+0x10/0x18
[ T14] Code: 528000c2 b9007fea 94002c9a b9407fea (39401f41)
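
For reference, here is a minimal user-space sketch of the check under
discussion: the buffer contents are validated even when the transfer
returns an error, so a device that answers correctly but reports a
failure is still accepted. The names desc_head, DT_DEVICE and
check_initial_descriptor() are hypothetical stand-ins, not the
identifiers used in hub.c, and the fallback return value is simplified.
The sketch cannot model the ECC case itself, where the read of buf is
what faults.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define DT_DEVICE 0x01 /* device descriptor type value from the USB spec */

/* Hypothetical stand-in for the first 8 bytes of a device descriptor. */
struct desc_head {
	uint8_t  bLength;
	uint8_t  bDescriptorType;
	uint16_t bcdUSB;
	uint8_t  bDeviceClass;
	uint8_t  bDeviceSubClass;
	uint8_t  bDeviceProtocol;
	uint8_t  bMaxPacketSize0;
};

/*
 * r is the result of the transfer, buf the buffer it was meant to fill
 * (assumed zeroed beforehand). Returns 0 when the contents look like a
 * device descriptor, even if r < 0. Otherwise returns the original
 * error, or -EPROTO when the transfer reported no error.
 */
int check_initial_descriptor(int r, const struct desc_head *buf)
{
	assert(buf != NULL);

	switch (buf->bMaxPacketSize0) {
	case 8: case 16: case 32: case 64: case 255:
		if (buf->bDescriptorType == DT_DEVICE)
			return 0;
		/* fall through */
	default:
		return r < 0 ? r : -EPROTO;
	}
}
```

With this logic a failed transfer that left the buffer zeroed is still
rejected, while an early "if (r < 0) goto fail;" would also reject the
case where the descriptor arrived intact.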

Thanks,
Longfang.
> Alan Stern
>
>> ---
>> drivers/usb/core/hub.c | 10 ++++++++++
>> 1 file changed, 10 insertions(+)
>>
>> diff --git a/drivers/usb/core/hub.c b/drivers/usb/core/hub.c
>> index a739403a9e45..6a43198be263 100644
>> --- a/drivers/usb/core/hub.c
>> +++ b/drivers/usb/core/hub.c
>> @@ -4891,6 +4891,16 @@ hub_port_init(struct usb_hub *hub, struct usb_device *udev, int port1,
>> USB_DT_DEVICE << 8, 0,
>> buf, GET_DESCRIPTOR_BUFSIZE,
>> initial_descriptor_timeout);
>> + /* On systems that use ECC memory, ECC errors can
>> + * cause the USB controller to halt.
>> + * It causes this operation to fail. At this time,
>> + * the buf data is an abnormal value and needs to be exited.
>> + */
>> + if (r < 0) {
>> + kfree(buf);
>> + goto fail;
>> + }
>> +
>> switch (buf->bMaxPacketSize0) {
>> case 8: case 16: case 32: case 64: case 255:
>> if (buf->bDescriptorType ==
>> --
>> 2.24.0
>>
>
> .
>