Re: [PATCH] serial: sc16is7xx: address RX timeout interrupt errata

From: Daniel Mack
Date: Tue Nov 14 2023 - 11:32:41 EST


Hi Hugo,

On 11/14/23 16:20, Hugo Villeneuve wrote:
> On Tue, 14 Nov 2023 08:49:04 +0100
> Daniel Mack <daniel@xxxxxxxxxx> wrote:
>> This devices has a silicon bug that makes it report a timeout interrupt
>> but no data in FIFO.
>>
>> The datasheet states the following in the errata section 18.1.4:
>>
>> "If the host reads the receive FIFO at the at the same time as a
>> time-out interrupt condition happens, the host might read 0xCC
>> (time-out) in the Interrupt Indication Register (IIR), but bit 0
>> of the Line Status Register (LSR) is not set (means there is not
>> data in the receive FIFO)."
>>
>> When this happens, the loop in sc16is7xx_irq() will run forever,
>> which effectively blocks the i2c bus and breaks the functionality
>> of the UART.
>>
>> From the information above, it is assumed that when the bug is
>> triggered, the FIFO does in fact have payload in its buffer, but the
>> fill level reporting is off-by-one. Hence this patch fixes the issue
>> by reading one byte from the FIFO when that condition is detected.
>
> From what I understand from the errata, when the problem occurs, it
> affects bit 0 of the LSR register. I see no mention that it
> also affects the RX FIFO level register (SC16IS7XX_RXLVL_REG)?

True, the errata doesn't explicitly mention that, but tests have shown
that the RXLVL register is equally affected.

> LSR[0] would be checked only if we were using polled mode of
> operation, but we always use the interrupt mode (IRQ), and therefore I
> would say that this errata doesn't apply to this driver, and the
> patch is not necessary...

Well, it is. We have seen this bug in the wild and extensively
stress-tested the patch on dozens of boards for many days. Without this
patch, kernels on affected systems would consume a lot of CPU cycles in
the interrupt threads and effectively render the I2C bus unusable due to
the busy polling.

With this patch applied, we were no longer able to reproduce the issue.


Thanks,
Daniel