Re: [RFC PATCH] serial: core: Use pm_runtime_get_sync() in uart_start()

From: Xuewen Yan
Date: Sun Nov 26 2023 - 20:54:20 EST


Hi Tony

On Sat, Nov 25, 2023 at 3:48 PM Tony Lindgren <tony@xxxxxxxxxxx> wrote:
>
> * Xuewen Yan <xuewen.yan@xxxxxxxxxx> [231124 12:25]:
> > The commit 84a9582fd203("serial: core: Start managing serial controllers to enable runtime PM")
> > use the pm_runtime_get() after uart_port_lock() which would close the irq and disable preement.
> > At this time, pm_runtime_get may cause the following two problems:
> >
> > (1) deadlock in try_to_wake_up:
> >
> > uart_write()
> > uart_port_lock() <<< get lock
> > __uart_start
> > __pm_runtime_resume
> > rpm_resume
> > queue_work_on
> > try_to_wake_up
> > _printk
> > uart_console_write
> > ...
> > uart_port_lock() <<< wait forever
> >
> > (2) scheduling while atomic:
> > uart_write()
> > uart_port_lock() <<< get lock
> > __uart_start
> > __pm_runtime_resume
> > rpm_resume
> > chedule() << sleep
>
> Are these spinlock vs raw spinlock nesting warnings from lockdep? If so
> can you please post the full warnings somewhere? Or if some extra steps
> are needed to reproduce please describe that too.

Indeed, we use pr_info in scheduler in our own kernel tree, and this
deadlock happended.
I would try to use printk_deferred and re-test.
And I also notice the warning was reported by syzbot:

https://lore.kernel.org/all/0000000000006f01f00608a16cea@xxxxxxxxxx/
https://lore.kernel.org/all/000000000000e7765006072e9591@xxxxxxxxxx/

These are also because the pm_runtime_put().

Thanks!

>
> Chances are very high that your serial port is already runtime active at
> this point unless you manually idle it so that's why I'm wondering as
> all that likely is happening here is a check on the runtime PM usage count.
>
> > So let us use pm_runtime_get_sync() to prevent these.
>
> We need to fix this some other way as we can't use pm_runtime_get_sync()
> here. The sync call variants require setting pm_runtime_irq_safe() for the
> device. And this is what we really want to avoid as it takes a permanent
> usage count on the parent device.
>
> What we want to do here is to get runtime PM to wake-up the device if idle
> and try tx again after runtime PM resume as needed.
>
> Just guessing at this point.. To me it sounds like the fix might be to
> use a raw spinlock for uart_port_lock() and uart_port_unlock()?
>
> Regards,
>
> Tony
>
>
> > Fixes: 84a9582fd203 ("serial: core: Start managing serial controllers to enable runtime PM")
> > Signed-off-by: Xuewen Yan <xuewen.yan@xxxxxxxxxx>
> > ---
> > drivers/tty/serial/serial_core.c | 4 ++--
> > 1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/tty/serial/serial_core.c b/drivers/tty/serial/serial_core.c
> > index f1348a509552..902f7ed35f4d 100644
> > --- a/drivers/tty/serial/serial_core.c
> > +++ b/drivers/tty/serial/serial_core.c
> > @@ -145,7 +145,7 @@ static void __uart_start(struct uart_state *state)
> > port_dev = port->port_dev;
> >
> > /* Increment the runtime PM usage count for the active check below */
> > - err = pm_runtime_get(&port_dev->dev);
> > + err = pm_runtime_get_sync(&port_dev->dev);
> > if (err < 0 && err != -EINPROGRESS) {
> > pm_runtime_put_noidle(&port_dev->dev);
> > return;
> > @@ -159,7 +159,7 @@ static void __uart_start(struct uart_state *state)
> > if (!pm_runtime_enabled(port->dev) || pm_runtime_active(port->dev))
> > port->ops->start_tx(port);
> > pm_runtime_mark_last_busy(&port_dev->dev);
> > - pm_runtime_put_autosuspend(&port_dev->dev);
> > + pm_runtime_put_sync_autosuspend(&port_dev->dev);
> > }
> >
> > static void uart_start(struct tty_struct *tty)
> > --
> > 2.25.1
> >