Re: [RFC][PATCH 1/3] PM / sleep: Flags to speed up suspend-resume of runtime-suspended devices

From: Alan Stern
Date: Mon May 05 2014 - 11:46:54 EST

Next message: Oleg Nesterov: "Re: [PATCH v2] uprobes: fix scratch register selection for rip-relative fixups"
Previous message: Sasha Levin: "Re: tracing: NULL ptr deref in ring_buffer_wait"
In reply to: Rafael J. Wysocki: "Re: [RFC][PATCH 1/3] PM / sleep: Flags to speed up suspend-resume of runtime-suspended devices"
Next in thread: Rafael J. Wysocki: "Re: [RFC][PATCH 1/3] PM / sleep: Flags to speed up suspend-resume of runtime-suspended devices"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

I'll trim a lot of material and respond to the points that are
important comments or criticisms.

On Mon, 5 May 2014, Rafael J. Wysocki wrote:

> > The description below is much better than the earlier one, but I still feel
> > this deserves to be split in two: one patch for each new flag.
>
> Well, I guess I can introduce power.leave_runtime_suspended for leaf devices
> first, but that would be somewhat artificial, because in that case some code
> added by the first patch would be removed by the second one. :-)

The "leaf devices" thing is a key point; see below.

> There is a theoretical case where the child is runtime-suspended, but
> not actually zero-power, and it doesn't have leave_runtime_suspended set,
> but its driver doesn't implement ->suspend() at all and instead it waits
> until ->suspend_late() or even ->suspend_noirq() and then attempts to do
> something extra to the device. Then, if the parent is a bridge and is
> required to be functional for accessing the child, we can't leave it
> runtime-suspended too.
>
> I'm not sure how realistic that is, to be honest, but it does look like
> a valid thing to do to my eyes, so in my opinion we may need to get the
> child driver's permission to leave the parent in runtime suspend for
> that reason too.

The only time this would be a problem is if the driver changes the
device's settings from the runtime-suspend values to the system-suspend
values, without doing a runtime resume first. For example, the driver
might disable wakeup while the device remains at low power.

I'm not sure how realistic this is, either -- although it's not hard to
imagine a PCI driver doing this sort of thing. Still, if you think we
should worry about it then I agree, the parent_needed flag ought to be
present from the start.

> I guess it is fair to simply say that "we need to get the child driver's
> permission to leave the parent in runtime suspend".

Okay. But does it follow that we need permission from the child's
descendants as well? I don't see any reason why. After all, if a
grandchild needs the child to be at full power, then the parent will
automatically end up at full power too. Which means neither
leave_runtime_suspended nor parent_needed has to be propagated up the
tree.

Hmmm, I just thought of something else. What about non-parent-child
relationships? Device B might depend on device A, even though A isn't
an ancestor of B. I guess in this case, A's leave_runtime_suspended
flag should not be set.

> > As a corollary, if we don't have the child's permission to leave the
> > parent suspended during system resume then we have to invoke all of the
> > parent's resume callbacks, which means we also have to invoke all the
> > suspend callbacks. However, we still might be able to leave the parent
> > in runtime suspend during the suspend stages. The decision whether or
> > not to do so should be up to the subsystem or driver, not the PM core;
> > the subsystem's callback routines can check the device's runtime status
> > and then do what they want.
>
> Yes, but they can do that anyway, can't they? :-)

Yes, they can. The point of this part of the patch is adding the
leave_runtime_suspended flag (1) makes the subsystem's decision a
little easier and (2) informs the subsystem when it can safely avoid
perfoming a runtime resume in its ->suspend() callback.

> > You are using leave_runtime_suspended to mean two different things:
> > remain runtime-suspended during the system suspend stages (i.e., no
> > reprogramming is needed so don't go to full power), and remain
> > runtime-suspended during both the system suspend and system resume
> > stages. Only the first meaning matters if all you want to accomplish
> > is to avoid unnecessary runtime resumes during system suspend.
>
> Well, this is not the case, becase you can't call ->resume_noirq() *after*
> ->runtime_suspend() for a number of drivers, as they simply may not expect
> that to happen (that covers all of the PCI drivers and the ACPI PM domain at
> least).

For some non-PCI, non-ACPI PM domain drivers, it _is_ okay to call
->resume_noirq() after ->runtime_suspend().

But forget about that; let's concentrate on PCI. When a PCI driver
sets leave_runtime_suspended, it is telling the PCI core that it
doesn't mind having its ->resume_noirq() callback invoked after
->runtime_suspend(). If a PCI driver doesn't set
leave_runtime_suspended, the PCI core will continue to handle it
exactly the same as now: do a runtime resume before invoking the
driver's ->suspend() callback.

> So you can't say "well, I'll skip your ->suspend_late and ->suspend_noirq,
> but then I'll resume you traditionally" for those drivers, but this isn't
> about remaining runtime-suspended during system resume too, but about
> preserving the expected ordering of callbacks for them.

For drivers that don't set leave_runtime_suspended, the ordering of
callbacks will be unchanged. That's why you introduced this flag
originally, right? So that the subsystem would know which drivers
don't mind doing things differently.

> So yes, the goal is to "remain runtime-suspended during the system suspend stages",
> but that *leads* *to* "do not execute system resume callbacks up to and including
> ->resume()" either at least for an important subset of drivers.

I disagree, for the reasons given above.

Now, this raises a second issue: How should we handle devices that can
remain runtime-suspended through both the system suspend and system
resume stages? Maybe we better discuss that in a separate email
thread.

> > > Note: Drivers (or bus types etc.) can reasonably expect that the
> > > next PM callback executed after ->runtime_suspend() will be
> > > ->runtime_resume() rather than ->resume_noirq() or ->resume_early().
> > > This change is designed with that expectation in mind.
> >
> > Except, of course, that in the current kernel this isn't true.
>
> Well, what about PCI devices? Their drivers surely can have such an expectation,
> because all of the PCI devices *are* resumed today in either pci_pm_prepare() or
> pci_pm_suspend(), before executing the driver's ->suspend() callback.

Yes, of course. But the USB subsystem, for example, doesn't expect it.

> > And there probably are a few cases where it can't ever be true.
>
> It is easy to say things like that without giving any examples.

Sorry, what I meant was that _if_ a device is going to remain in
runtime suspend throughout the system suspend and system resume stages,
then there probably are cases where the device will have to come back
to full power during resume_early -- which means ->runtime_resume()
can't be used, because runtime PM is disabled during resume_early.
Isn't this true of PCI bridges?

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Next message: Oleg Nesterov: "Re: [PATCH v2] uprobes: fix scratch register selection for rip-relative fixups"
Previous message: Sasha Levin: "Re: tracing: NULL ptr deref in ring_buffer_wait"
In reply to: Rafael J. Wysocki: "Re: [RFC][PATCH 1/3] PM / sleep: Flags to speed up suspend-resume of runtime-suspended devices"
Next in thread: Rafael J. Wysocki: "Re: [RFC][PATCH 1/3] PM / sleep: Flags to speed up suspend-resume of runtime-suspended devices"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]