Re: [PATCH v1 0/3] introduce priority-based shutdown support

From: Mark Brown
Date: Sun Nov 26 2023 - 05:14:51 EST


On Sat, Nov 25, 2023 at 07:58:12PM +0000, Greg Kroah-Hartman wrote:
> On Sat, Nov 25, 2023 at 03:43:02PM +0000, Mark Brown wrote:
> > On Sat, Nov 25, 2023 at 02:35:41PM +0000, Greg Kroah-Hartman wrote:

> > > That would be great, but I don't see that here, do you? All I see is
> > > the shutdown sequence changing because someone wants it to go "faster"
> > > with the threat of hardware breaking if we don't meet that "faster"
> > > number, yet no knowledge or guarantee that this number can ever be known
> > > or happen.

> > The idea was to have somewhere to send notifications when the hardware
> > starts reporting things like power supplies starting to fail. We do
> > have those from hardware, we just don't do anything terribly useful
> > with them yet.

> Ok, but that's not what I recall this patchset doing, or did I missing
> something? All I saw was a "reorder the shutdown sequence" set of
> changes. Or at least that's all I remember at this point in time,
> sorry, it's been a few days, but at least that lines up with what the
> Subject line says above :)

That's not in the series, a bunch of it is merged in some form (eg, see
hw_protection_shutdown()) and more of it would need to be built on top
if this were merged.

> > > Agreed, but I don't think this patch is going to actually work properly
> > > over time as there is no time values involved :)

> > This seems to be more into the area of mitigation than firm solution, I
> > suspect users will be pleased if they can make a noticable dent in the
> > number of failures they're seeing.

> Mitigation is good, but this patch series is just a hack by doing "throw
> this device type at the front of the shutdown list because we have
> hardware that crashes a lot" :)

Sounds like a mitigation to me.

> > It feels like if we're concerned about mitigating physical damage during
> > the process of power failure that's a very limited set of devices - the
> > storage case where we're in the middle of writing to flash or whatever
> > is the most obvious case.

> Then why isn't userspace handling this? This is a policy decision that
> it needs to take to properly know what hardware needs to be shut down,
> and what needs to happen in order to do that (i.e. flush, unmount,
> etc.?) And userspace today should be able to say, "power down this
> device now!" for any device in the system based on the sysfs device
> tree, or at the very least, force it to a specific power state. So why
> not handle this policy there?

Given the tight timelines it does seem reasonable to have some of this
in the kernel - the specific decisions about how to handle these events
can always be controlled from userspace (eg, with a sysfs file like we
do for autosuspend delay times which seem to be in a similar ballpark).

Attachment: signature.asc
Description: PGP signature