Re: [RFC PATCH RT] PM: runtime: avoid retry loops on RT

From: Rafael J. Wysocki
Date: Thu Oct 21 2021 - 06:37:20 EST


On Wed, Oct 6, 2021 at 8:18 PM John Keeping <john@xxxxxxxxxxxx> wrote:
>
> On Wed, 6 Oct 2021 19:05:50 +0200
> "Rafael J. Wysocki" <rafael@xxxxxxxxxx> wrote:
>
> > On Tue, Oct 5, 2021 at 7:17 PM John Keeping <john@xxxxxxxxxxxx> wrote:
> > >
> > > On Tue, 5 Oct 2021 18:38:27 +0200
> > > "Rafael J. Wysocki" <rafael@xxxxxxxxxx> wrote:
> > >
> > > > On Tue, Oct 5, 2021 at 6:14 PM John Keeping <john@xxxxxxxxxxxx> wrote:
> > > > >
> > > > > With PREEMPT_RT spin_unlock() is identical to spin_unlock_irq() so there
> > > > > is no reason to have a special case using the former. Furthermore,
> > > > > spin_unlock() enables preemption meaning that a task in RESUMING or
> > > > > SUSPENDING state may be preempted by a higher priority task running
> > > > > pm_runtime_get_sync() leading to a livelock.
> > > > >
> > > > > Use the non-irq_safe path for all waiting so that the waiting task will
> > > > > block.
> > > > >
> > > > > Note that this changes only the waiting behaviour of irq_safe, other
> > > > > uses are left unchanged so that the parent device always remains active
> > > > > in the same way as !RT.
> > > > >
> > > > > Signed-off-by: John Keeping <john@xxxxxxxxxxxx>
> > > >
> > > > So basically, the idea is that the irq_safe flag should have no effect
> > > > when CONFIG_PREEMPT_RT is set, right?
> > > >
> > > > Wouldn't it be cleaner to make it not present at all in that case?
> > >
> > > Yes, just replacing pm_runtime_irq_safe() with an empty function would
> > > also fix it, but I'm not sure if that will have unexpected effects from
> > > the parent device suspending/resuming, especially in terms of latency
> > > for handling interrupts.
> >
> > Well, the code as is doesn't work with CONFIG_PREEMPT_RT set anyway in general.
> >
> > Also this is not just pm_runtime_irq_safe(), but every access to this
> > flag (and there's more of them than just the ones changed below).
> >
> > What about putting the flag under #ifdef CONFIG_PREEMPT_RT and
> > providing read/write accessor helpers for it that will be empty in
> > RT-enabled kernels?
>
> That's the other approach I considered, but there are really two things
> that irq_safe means here:
>
> 1) Call the suspend/resume hooks with interrupts disabled
>
> 2) Keep the parent device running and make other changes that allow (1)
> on non-RT systems (for example in amba_pm_runtime_suspend() leave the
> clock prepared when irq_safe is set, but unprepare it otherwise)
>
> In the approach of leaving the flag unset on PREEMPT_RT we solve the
> primary problem which is that (1) is irrelevant on RT, but that would
> also affect (2) and I'm not sure whether that's desirable or not.
>
> It's quite possible the answer is that the other changes don't matter
> and we should take the simpler approach of just removing irq_safe under
> CONFIG_PREEMPT_RT. I'm becoming convinced that this is the right
> answer, but I'm not confident that I fully understand the wider
> ramifications.

The initial motivation for adding irq_safe was to allow interrupt
handlers of some devices to use PM-runtime, but in RT kernels that's
possible regardless IIUC, so I don't see a reason for having irq_safe
at all in that case.