Re: [PATCH] USB:bugfix a controller halt error

From: Oliver Neukum
Date: Thu Jul 27 2023 - 11:31:52 EST

Next message: Johannes Weiner: "Re: [PATCH] mm: page_alloc: consume available CMA space first"
Previous message: Logan Gunthorpe: "Re: [PATCH] dmaengine: plx_dma: Fix potential deadlock on &plxdev->ring_lock"
In reply to: Alan Stern: "Re: [PATCH] USB:bugfix a controller halt error"
Next in thread: Alan Stern: "Re: [PATCH] USB:bugfix a controller halt error"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On 27.07.23 16:42, Alan Stern wrote:

> On Thu, Jul 27, 2023 at 03:03:57PM +0800, liulongfang wrote:
>> On 2023/7/26 22:20, Alan Stern wrote:
>>> It seems to me that something along these lines must be necessary in
>>> any case. Unless the bad memory is cleared somehow, it would never be
>>> usable again. The kernel might deallocate it, then reallocate for
>>> another purpose, and then crash when the new user tries to access it.
>>>
>>> In fact, this scenario could still happen even with your patch, which
>>> means the patch doesn't really fix the problem.

I suppose in theory you could have something like a bad blocks list
just for RAM, but that would really hurt. You'd have to do something
about every DMA operation in every driver in theory.

Error handling would basically be an intentional memory leak.
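
To make the "intentional memory leak" concrete, here is a rough userspace
sketch, not kernel code. malloc()/free() stand in for the real allocators,
and release_dma_buffer(), struct quarantined and the "poisoned" flag are
hypothetical names invented for illustration. A buffer reported as bad is
parked on a list and never returned to the allocator, so it cannot be
handed out again.

```c
#include <stdlib.h>

/* Hypothetical bookkeeping for buffers that must never be reused. */
struct quarantined {
    void *buf;
    struct quarantined *next;
};

static struct quarantined *quarantine_head;
static size_t quarantine_count;

/*
 * Release a buffer after a transfer. A healthy buffer is freed as usual.
 * A poisoned one is deliberately leaked: it is recorded on the quarantine
 * list and never passed back to free().
 */
static void release_dma_buffer(void *buf, int poisoned)
{
    struct quarantined *q;

    if (!poisoned) {
        free(buf);
        return;
    }

    q = malloc(sizeof(*q));
    if (q) {
        q->buf = buf;
        q->next = quarantine_head;
        quarantine_head = q;
        quarantine_count++;
    }
    /* If the node allocation fails, the buffer is still leaked, not freed. */
}
```

Every driver that does DMA would need a release path like this, which is
why it would really hurt.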

>> This patch is only used to prevent data in the buffer from being accessed.
>> As long as the data is not accessed, the kernel does not crash.
>
> I still don't understand. You haven't provided nearly enough
> information. You should start by answering the questions that Oliver
> asked. Then answer this question:
>
> The code you are concerned about is this:
>
>     r = usb_control_msg(udev, usb_rcvaddr0pipe(),
>             USB_REQ_GET_DESCRIPTOR, USB_DIR_IN,
>             USB_DT_DEVICE << 8, 0,
>             buf, GET_DESCRIPTOR_BUFSIZE,
>             initial_descriptor_timeout);
>     switch (buf->bMaxPacketSize0) {
>
> You're worried that if an ECC memory error occurs during the
> usb_control_msg transfer, the kernel will crash when the "switch"
> statement tries to read the value of buf->bMaxPacketSize0. That's a
> reasonable thing to worry about.

Albeit unlikely. If the hardware and implementation are reasonable
you'd return a specific error code from the HCD and clean up the
RAM in your ecc driver.

The fix for USB would then conceptually be something like

retryio:
    r = usb_control_msg()
    if (r == -EMEMORYCORRUPTION)
        goto retryio;
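
A slightly fuller sketch of the same idea, with a bound on the number of
attempts so that a permanently bad page cannot loop forever. This is
userspace C for illustration only. EMEMORYCORRUPTION is not a real errno
(its value here is arbitrary), MAX_ECC_ATTEMPTS is an assumed limit, and
xfer_fn is a hypothetical stand-in for the usb_control_msg() call.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical error code: no such errno exists, the value is arbitrary. */
#define EMEMORYCORRUPTION 1024

/* Assumed upper bound on total attempts, not taken from any real driver. */
#define MAX_ECC_ATTEMPTS 3

/* Stand-in for the transfer: returns bytes transferred or a negative error. */
typedef int (*xfer_fn)(void *buf, size_t len, void *ctx);

/* Repeat the transfer while it reports memory corruption, up to the bound. */
static int xfer_with_retry(xfer_fn xfer, void *buf, size_t len, void *ctx)
{
    int tries = 0;
    int r;

    do {
        r = xfer(buf, len, ctx);
    } while (r == -EMEMORYCORRUPTION && ++tries < MAX_ECC_ATTEMPTS);

    return r;
}

/* Stub transfer: fails a configurable number of times, then succeeds. */
struct flaky {
    int failures_left;
    int calls;
};

static int flaky_xfer(void *buf, size_t len, void *ctx)
{
    struct flaky *f = ctx;

    (void)buf;
    f->calls++;
    if (f->failures_left > 0) {
        f->failures_left--;
        return -EMEMORYCORRUPTION;
    }
    return (int)len;
}
```

With the stub set to fail twice, the third attempt succeeds. If it keeps
failing, the caller gets -EMEMORYCORRUPTION back after three attempts and
can fail the operation instead of reading the buffer.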

Regards
Oliver
