Re: [RFC PATCH v2 1/2] PM: domains: Skip disabling unused domains if provider has sync_state

From: Saravana Kannan
Date: Mon Feb 06 2023 - 16:35:45 EST


On Mon, Feb 6, 2023 at 1:10 PM Doug Anderson <dianders@xxxxxxxxxxxx> wrote:
>
> Hi,
>
> On Mon, Feb 6, 2023 at 11:33 AM Saravana Kannan <saravanak@xxxxxxxxxx> wrote:
> >
> > On Mon, Feb 6, 2023 at 9:48 AM Abel Vesa <abel.vesa@xxxxxxxxxx> wrote:
> > >
> > >
> > > CC'ed Saravana
> >
> > Thanks. Please do cc me for stuff like this from the start. I skimmed
> > the series and I think it's doing one of my TODO items. So, thanks for
> > the patch!
> >
> > I'll take a closer look within a few days -- trying to get through
> > some existing fw_devlink stuff.
> >
> > But long story short, it is the right thing to keep a supplier on
> > indefinitely if there's a consumer device (that's not disabled in DT)
> > that never gets probed. It's a pretty common scenario -- for example,
> > say a display backlight. The default case should be functional
> > correctness. And then we can add stuff that allows changing this
> > behavior with command line args or something else that can be done
> > from userspace.
> >
> > +1 to what Doug said elsewhere in this thread too. I'm trying to
> > consolidate the "when do we give up" decision at the driver core level
> > independent of what framework is being used.
>
> I'm not really sure I agree with the above, at least not without lots
> of discussion in the community. It really goes against what the kernel
> has been doing for years and years in the regulator and clock
> frameworks. Those frameworks both eventually give up and power down
> resources that no active drivers are using. Either changing the
> regulator/clock frameworks or saying that other frameworks should work
> in an opposite way seems like a recipe for confusion.
>
> Now, certainly I won't say that the way that the regulator and clock
> frameworks function is perfect nor will I say that they don't cause
> any problems. However, going the opposite way where resources are kept
> at full power indefinitely will _also_ cause problems.
>
> Specifically, let's look at the case you mentioned of a display
> backlight. I think you're saying that if there is no backlight driver
> enabled in the kernel that you'd expect the backlight to just be on at
> full brightness.

No, I'm not saying that.

> Would you expect this even if the firmware didn't
> leave the backlight on?

sync_state() never turns on anything that wasn't already on at boot.
So in your example, if the firmware didn't turn on the backlight, then
it'll remain off.

> In any case, why do you say it's more correct?

Because if you turn off the display, the device is unusable. In other
circumstances, it can crash a device because the firmware powered it
on left it in a "good enough" state, but we'd go turn it off and crash
the system.

> I suppose you'd say that the screen is at least usable like this.
> ...except that you've broken a different feature: suspend/resume.

If the display is off and the laptop is unusable, then we have bigger
problems than suspend/resume?

> Without being able to turn the backlight off at suspend time the
> device would drain tons of power. It could also overheat when you
> stuffed it in your backpack and damage the battery or start a fire.
> Even if you argue that in the case of the display backlight you're
> better off, what about a keyboard backlight? It's pretty easy to use a
> laptop without the keyboard backlight and if you didn't have a driver
> for it you'd be in better shape leaving it off instead of leaving it
> on 100% of the time, even when the device is suspended.

I think you are again assuming sync_state() will cause stuff to be
turned on if the firmware didn't leave it on before booting the
kernel. This is not the case.

But let's assume you had the same understanding, then I'd argue that
between the default kernel configuration crashing some systems vs
having power impact on others, I'd prefer the former. The firmware
shouldn't have left the keyboard backlight on if it cared about
suspend/resume.

> Overall: if a kernel isn't configured for a given driver we shouldn't
> be expecting the hardware controlled by that driver to work. The best
> we can hope for is that it's at least in a low power state.
>
> In general I think that having a well-defined way to know it's time to
> give up and power off anything for which a driver didn't probe needs
> to be an important part of any designs here.

Btw, the current compromise for deferred probes/optional suppliers is
"keep extending the timeout by 10 seconds as long as modules are being
loaded".

As I said in my earlier email, this is just what I think it should be
like and there's still stuff to figure out before I send out a patch
like that. For example, we could have a sysfs file to write to to
release sync_state() for a device. Then you'd just echo to that file
in your example and go about your day.

-Saravana