Re: [PATCH] USB:bugfix a controller halt error

From: Oliver Neukum
Date: Thu Jul 27 2023 - 05:05:38 EST


On 27.07.23 09:00, liulongfang wrote:
> On 2023/7/26 19:16, Oliver Neukum wrote:
>
>> 1. temporary - that is you have detected memory corruption but the RAM cell is not broken
>> 2. unrecoverable - that is we have lost data
>> 3. locateable - that is you know it hit the buffer of this operation and only it
>>
>> Am I correct so far?
>
> You are right about the testing process.
> But this problem can exist in the real environment, just the probability of
> occurrence is very low.

Understood. Bit flips are random.

But this leaves two open questions:

1. How is the error reported?

2. How are we supposed to handle it?

Firstly, if we already know that there is an ECC failure
on the host we can use a specific error code and can check
for that.

Secondly, does this mean that the affected memory location
must not be touched until the machine is power cycled
or does it simply mean that the buffer is invalid?
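
To make the two points concrete, here is a rough userspace sketch in C.
It is not kernel code: every name in it (xfer_status, XFER_ERR_HOST_ECC,
handle_completion and so on) is hypothetical, and it assumes the second
reading, namely that only the buffer contents are invalid and the memory
itself may be reused.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical transfer status codes; NOT the kernel's URB status values. */
enum xfer_status {
    XFER_OK = 0,
    XFER_ERR_GENERIC = -1,
    XFER_ERR_HOST_ECC = -2, /* assumed: host saw an ECC failure in this buffer */
};

/* What the completion handler decides to do with the transfer. */
enum xfer_action {
    ACTION_DELIVER,  /* data is good, hand it to the caller */
    ACTION_RESUBMIT, /* contents invalid but memory reusable: retry */
    ACTION_FAIL,     /* unknown error: report it upward */
};

/* Hypothetical stand-in for a transfer request and its buffer. */
struct xfer {
    enum xfer_status status;
    unsigned char buf[64];
    size_t len;
};

/*
 * Check for the dedicated ECC status first. Under the assumption stated
 * above, the corrupted data is discarded and the transfer is retried.
 */
static enum xfer_action handle_completion(struct xfer *x)
{
    assert(x != NULL);

    switch (x->status) {
    case XFER_OK:
        return ACTION_DELIVER;
    case XFER_ERR_HOST_ECC:
        memset(x->buf, 0, sizeof(x->buf)); /* drop the corrupted payload */
        x->len = 0;
        return ACTION_RESUBMIT;
    default:
        return ACTION_FAIL;
    }
}
```

If the first reading is the right one, and the location must not be
touched until a power cycle, then the memset() and the resubmit are both
wrong and the buffer would have to be retired instead.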

> Our test tool only simulates external interference destroying this part
> of the data in the buffer in ECC memory. Even without this testing tool,
> this problem may also occur on real production hardware.

Understood. But what is the correct remedy if the problem strikes
for real?

Regards
Oliver