Re: [RFC PATCH v2 1/2] PM: domains: Skip disabling unused domains if provider has sync_state

From: Doug Anderson
Date: Tue Feb 07 2023 - 18:45:58 EST


Hi,

On Mon, Feb 6, 2023 at 1:35 PM Saravana Kannan <saravanak@xxxxxxxxxx> wrote:
>
> On Mon, Feb 6, 2023 at 1:10 PM Doug Anderson <dianders@xxxxxxxxxxxx> wrote:
> >
> > Hi,
> >
> > On Mon, Feb 6, 2023 at 11:33 AM Saravana Kannan <saravanak@xxxxxxxxxx> wrote:
> > >
> > > On Mon, Feb 6, 2023 at 9:48 AM Abel Vesa <abel.vesa@xxxxxxxxxx> wrote:
> > > >
> > > >
> > > > CC'ed Saravana
> > >
> > > Thanks. Please do cc me for stuff like this from the start. I skimmed
> > > the series and I think it's doing one of my TODO items. So, thanks for
> > > the patch!
> > >
> > > I'll take a closer look within a few days -- trying to get through
> > > some existing fw_devlink stuff.
> > >
> > > But long story short, it is the right thing to keep a supplier on
> > > indefinitely if there's a consumer device (that's not disabled in DT)
> > > that never gets probed. It's a pretty common scenario -- for example,
> > > say a display backlight. The default case should be functional
> > > correctness. And then we can add stuff that allows changing this
> > > behavior with command line args or something else that can be done
> > > from userspace.
> > >
> > > +1 to what Doug said elsewhere in this thread too. I'm trying to
> > > consolidate the "when do we give up" decision at the driver core level
> > > independent of what framework is being used.
> >
> > I'm not really sure I agree with the above, at least not without lots
> > of discussion in the community. It really goes against what the kernel
> > has been doing for years and years in the regulator and clock
> > frameworks. Those frameworks both eventually give up and power down
> > resources that no active drivers are using. Either changing the
> > regulator/clock frameworks or saying that other frameworks should work
> > in an opposite way seems like a recipe for confusion.
> >
> > Now, certainly I won't say that the way that the regulator and clock
> > frameworks function is perfect nor will I say that they don't cause
> > any problems. However, going the opposite way where resources are kept
> > at full power indefinitely will _also_ cause problems.
> >
> > Specifically, let's look at the case you mentioned of a display
> > backlight. I think you're saying that if there is no backlight driver
> > enabled in the kernel that you'd expect the backlight to just be on at
> > full brightness.
>
> No, I'm not saying that.
>
> > Would you expect this even if the firmware didn't
> > leave the backlight on?
>
> sync_state() never turns on anything that wasn't already on at boot.
> So in your example, if the firmware didn't turn on the backlight, then
> it'll remain off.

As per offline discussion, part of the problems are that today this
_isn't_ true for a few Qualcomm things (like interconnect). The
interconnect frameway specifically maxes things out for early boot.


> > In any case, why do you say it's more correct?
>
> Because if you turn off the display, the device is unusable. In other
> circumstances, it can crash a device because the firmware powered it
> on left it in a "good enough" state, but we'd go turn it off and crash
> the system.
>
> > I suppose you'd say that the screen is at least usable like this.
> > ...except that you've broken a different feature: suspend/resume.
>
> If the display is off and the laptop is unusable, then we have bigger
> problems than suspend/resume?

I suspect that here we'll have to agree to disagree. IMO it's a
non-goal to expect hardware to work for which there is no driver. So
making the backlight work without a backlight driver isn't really
something we should strive for.


> > Without being able to turn the backlight off at suspend time the
> > device would drain tons of power. It could also overheat when you
> > stuffed it in your backpack and damage the battery or start a fire.
> > Even if you argue that in the case of the display backlight you're
> > better off, what about a keyboard backlight? It's pretty easy to use a
> > laptop without the keyboard backlight and if you didn't have a driver
> > for it you'd be in better shape leaving it off instead of leaving it
> > on 100% of the time, even when the device is suspended.
>
> I think you are again assuming sync_state() will cause stuff to be
> turned on if the firmware didn't leave it on before booting the
> kernel. This is not the case.
>
> But let's assume you had the same understanding, then I'd argue that
> between the default kernel configuration crashing some systems vs
> having power impact on others, I'd prefer the former. The firmware
> shouldn't have left the keyboard backlight on if it cared about
> suspend/resume.

The keylight is a bit of a contrived example, of course. ...but not
that contrived. It's entirely possible that the keyboard backlight is
controlled by a GPIO and that the default state of that GPIO at bootup
enables the backlight regulator. That would mean that the firmware
"left" the keyboard backlight on. The firmware's job is not to init
all hardware. It's to init whatever hardware was needed to boot the
kernel and then get out of the way and boot the kernel. Ideally the
kernel should assume as little about the firmware as possible except
in cases where the firmware actually needs to hand something off to
the kernel (serial console, boot splash, etc).


> > Overall: if a kernel isn't configured for a given driver we shouldn't
> > be expecting the hardware controlled by that driver to work. The best
> > we can hope for is that it's at least in a low power state.
> >
> > In general I think that having a well-defined way to know it's time to
> > give up and power off anything for which a driver didn't probe needs
> > to be an important part of any designs here.
>
> Btw, the current compromise for deferred probes/optional suppliers is
> "keep extending the timeout by 10 seconds as long as modules are being
> loaded".
>
> As I said in my earlier email, this is just what I think it should be
> like and there's still stuff to figure out before I send out a patch
> like that. For example, we could have a sysfs file to write to to
> release sync_state() for a device. Then you'd just echo to that file
> in your example and go about your day.

We don't need to get into a centi-thread here, but I'll at least say
that it's my opinion that we need some way to get the same type of
behavior that the existing regulator / clock frameworks have. That is:
if there are resources that no driver has enabled that there should be
some way to get them to shut off eventually.

-Doug