Re: [PATCH] serial8250 on tegra hsuart: recover from spurious interrupts due to tegra2 silicon bug

From: Richard Leitner
Date: Mon Nov 21 2022 - 05:35:34 EST


Hi,

On Fri, Jul 13, 2018 at 11:32:42AM +0000, David R. Piegdon wrote:
> Hi,
> a while back I sent a few mails regarding spurious interrupts in the
> UARTA (hsuart) block of the Tegra2 SoC, when using the 8250 driver for
> it instead of the hsuart driver. After going down a pretty deep
> debugging/testing hole, I think I found a patch that fixes the issue. So
> far testing in a reboot-cycle suggests that the error frequency dropped
> from >3% of all reboots to at least <0.05% of all reboots. Tests
> continue to run over the weekend.
>
> The patch below already is a second iteration; the first did not reset
> the MCR or contain the lines below '// clear interrupts'. This resulted
> in no more spurious interrupts, but in a few % of spurious interrupts
> that were recovered the UART block did not receive any characters any
> more. So further resetting was required to fully reacquire operational
> state of the UART block.
>
> I'd love any comments/suggestions on this!

I'd like to follow up on this ancient patch as we are using it
successfully for a few years with different kernel versions on a
tegra20 SOM (tamonten) now and I'm currently cleaning up our tree.

David, have you done any work in regarding this issue since 2018?

What would be needed to get this solution mainline?

The recipient of this mail are from the initial thread [1] and
a current get_maintainers.pl run.

regards;rl

[1] https://patchwork.ozlabs.org/project/linux-tegra/patch/4676ea34-69ce-5422-1ded-94218b89f7d9@xxxxxxxx/

>
> Cheers,
>
> David
>
> diff --git a/drivers/tty/serial/8250/8250_core.c b/drivers/tty/serial/8250/8250_core.c
> index e8819aa20415..1d76eebefd4e 100644
> --- a/drivers/tty/serial/8250/8250_core.c
> +++ b/drivers/tty/serial/8250/8250_core.c
> @@ -140,6 +140,38 @@ static irqreturn_t serial8250_interrupt(int irq, void *dev_id)
> "serial8250: too much work for irq%d\n", irq);
> break;
> }
> +
> +#ifdef CONFIG_ARCH_TEGRA_2x_SOC
> + if (!handled && (port->type == PORT_TEGRA)) {
> + /*
> + * Fix Tegra 2 CPU silicon bug where sometimes
> + * "TX holding register empty" interrupts result in a
> + * bad (metastable?) state in Tegras HSUART IP core.
> + * Only way to recover seems to be to reset all
> + * interrupts as well as the TX queue and the MCR.
> + * But we don't want to loose any outgoing characters,
> + * so only do it if the RX and TX queues are empty.
> + */
> + unsigned char lsr = port->serial_in(port, UART_LSR);
> + const unsigned char fifo_empty_mask =
> + (UART_LSR_TEMT | UART_LSR_THRE);
> + if (((lsr & (UART_LSR_DR | fifo_empty_mask)) ==
> + fifo_empty_mask)) {
> + port->serial_out(port, UART_IER, 0);
> + port->serial_out(port, UART_MCR, 0);
> + serial8250_clear_and_reinit_fifos(up);
> + port->serial_out(port, UART_MCR, up->mcr);
> + port->serial_out(port, UART_IER, up->ier);
> + // clear interrupts
> + serial_port_in(port, UART_LSR);
> + serial_port_in(port, UART_RX);
> + serial_port_in(port, UART_IIR);
> + serial_port_in(port, UART_MSR);
> + up->lsr_saved_flags = 0;
> + up->msr_saved_flags = 0;
> + }
> + }
> +#endif
> } while (l != end);
>
> spin_unlock(&i->lock);