Re: [PATCH v2] serial: 8250_dw: Fix common clocks usage race condition

From: Andy Shevchenko
Date: Tue Mar 24 2020 - 06:12:57 EST


On Mon, Mar 23, 2020 at 04:50:17PM +0300, Sergey Semin wrote:
> On Mon, Mar 23, 2020 at 11:01:09AM +0100, Maxime Ripard wrote:
> > On Mon, Mar 23, 2020 at 05:46:09AM +0300, Sergey.Semin@xxxxxxxxxxxxxxxxxxxx wrote:
> > > From: Serge Semin <Sergey.Semin@xxxxxxxxxxxxxxxxxxxx>
> > >
> > > There are races possible in the dw8250_set_termios() callback method
> > > and while the device is in PM suspend state. A race condition may
> > > happen if the baudrate clock source device is shared with some other
> > > device (in our machine it's another DW UART port). In this case if that
> > > device changes the clock rate while serial console is using it the
> > > DW 8250 UART port might not only end up with an invalid uartclk value
> > > saved, but may also experience a distorted output data since baud-clock
> > > could have been changed. In order to fix this lets enable an exclusive
> > > reference clock rate access in case if "baudclk" device is specified.
> > >
> > > So if some other device also acquires the rate exclusivity during the
> > > time of a DW UART 8250 port being opened, then DW UART 8250 driver
> > > won't be able to alter the baud-clock. It shall just use the available
> > > clock rate. Similarly another device also won't manage to change the
> > > rate at that time. If nothing else have the exclusive rate access
> > > acquired except DW UART 8250 driver, then the driver will be able to
> > > alter the rate as much as it needs to in accordance with the currently
> > > implemented logic.

> > clk_rate_exclusive_get is pretty intrusive, and due to the usual
> > topology of clock trees, this will lock down 3-4 parent clocks to
> > their current rate as well. In the Allwinner SoCs case for example,
> > this will lock down the same PLL than the one used by the CPU,
> > preventing cpufreq from running.
>
> Speaking about weak design of a SoC' clock tree. Our problems are nothing
> with respect to the Allwinner SoC, in which case of changing the
> CPU-frequency may cause the UART glitches subsequently causing data
> transfer artefacts.) Moreover as I can see the same issue may raise for
> I2C, QSPI, PWM devices there.
>
> Anyway your concern does make sense.
>
> > However, the 8250 has a pretty wide range of dividers and can adapt to
> > any reasonable parent clock rate, so we don't really need to lock the
> > rate either, we can simply react to a parent clock rate change using
> > the clock notifiers, just like the SiFive UART is doing.
> >
> > I tried to do that, but given that I don't really have an extensive
> > knowledge of the 8250, I couldn't find a way to stop the TX of chars
> > while we change the clock rate. I'm not sure if this is a big deal or
> > not, the SiFive UART doesn't seem to care.
>
> Yes, your solution is also possible, but even in case of stopping Tx/Rx it
> doesn't lack drawbacks. First of all AFAIK there is no easy way to just
> pause the transfers. We'd have to first wait for the current transfers
> to be completed, then somehow lock the port usage (both Tx and Rx
> traffic), permit the reference clock rate change, accordingly adjust the
> UART clock divider, and finally unlock the port. While if we don't mind
> to occasionally have UART data glitches, we can just adjust the UART ref
> divider synchronously with ref clock rate change as you and SiFive UART
> driver suggest.
>
> So we are now at a zugzwang - a fork to three not that good solutions:
> 1) lock the whole clock branch and provide a glitchless interfaces. But
> by doing so we may (in case of Allwinner SoCs we will) lockup some very
> important functionality like CPU-frequency change while the UART port is
> started up. In this case we won't have the data glitches.
> 2) just adjust the UART clock divider in case of reference clock rate
> change (use the SiFive UART driver approach). In this case we may have the
> data corruption.
> 3) somehow implement the algo: wait for the transfers to be completed,
> lock UART interface (it's possible for Tx, but for Rx in case of no handshake
> enabled it's simply impossible), permit the ref clock rate change,
> adjust the UART divider, then unlock the UART interface. In this case the data
> glitches still may happen (if no modem control is available or
> handshakes are disabled).
>
> As for the cases of Baikal-T1 UARTs the first solutions is the most suitable.
> We don't lock anything valuable, since a base PLL output isn't directly
> connected to any device and it's rate once setup isn't changed during the
> system running. On the other hand I don't mind to implement the second
> solution, even though it's prone to data glitches. Regarding the solution
> 3) I won't even try. It's too complicated, I don't have time and
> test-infrastructure for this.
>
> So Andy what do you think?

>From Intel HW perspective the first two are okay, but since Maxime is against
first, you have the only option from your list. Perhaps somebody may give
option 4) here...

--
With Best Regards,
Andy Shevchenko