Re: power-off delay/hang due to commit 6d25be57 (mainline)

From: Rafael J. Wysocki
Date: Sat Jan 02 2021 - 06:10:14 EST


On Thursday, December 31, 2020 9:46:11 PM CET Rafael J. Wysocki wrote:
> On Wednesday, December 2, 2020 8:13:38 PM CET Rafael J. Wysocki wrote:
> > On Wed, Dec 2, 2020 at 7:31 PM Rafael J. Wysocki <rafael@xxxxxxxxxx> wrote:
> > >
> > > On Wed, Dec 2, 2020 at 7:03 PM Sebastian Andrzej Siewior
> > > <bigeasy@xxxxxxxxxxxxx> wrote:
> > > >
> > > > On 2020-10-26 18:20:59 [+0100], To Rafael J. Wysocki wrote:
> > > > > > > > > Done as Bug 208877.
> > > > > > > Rafael, do you have any suggestions?
> > > > > >
> > > > > > I've lost track of this sorry.
> > > > > >
> > > > > > I have ideas, let me get back to this next week.
> > > > >
> > > > > :)
> > > >
> > > > Rafael, any update? If you outline an idea or so then I may be able to
> > > > form a patch out of it. Otherwise I have no idea how to fix this - other
> > > > than telling the driver to not poll in smaller intervals than
> > > > 30secs.
> > >
> > > The idea, roughly speaking, is to limit the number of outstanding work
> > > items in the queue (basically, if there's a notification occurring
> > > before the previous one can be handled, there is no need to queue up
> > > another work item for it).
> >
> > That's easier said than done, though, because of the way the work item
> > queue-up is hooked up into the ACPICA code.
>
> So scratch this and it wouldn't work in general anyway AFAICS.
>
> ATM, I'm tempted to do something like the patch below (with the rationale
> that it shouldn't be necessary to read the temperature right after updating
> the trip points if polling is in use, because the next update through polling
> will cause it to be read anyway and it will trigger trip point actions as
> needed).

There is one more way to address this, probably better: instead of checking the
temperature right away in acpi_thermal_notify(), queue that on acpi_thermal_pm_queue
and so only if another thermal check is not pending.

This way there will be at most one temperature check coming from
acpi_thermal_notify() queued up at any time which should prevent the
build-up of work items from taking place.

So something like this:

---
drivers/acpi/thermal.c | 14 ++++++++++----
1 file changed, 10 insertions(+), 4 deletions(-)

Index: linux-pm/drivers/acpi/thermal.c
===================================================================
--- linux-pm.orig/drivers/acpi/thermal.c
+++ linux-pm/drivers/acpi/thermal.c
@@ -900,6 +900,12 @@ static void acpi_thermal_unregister_ther
Driver Interface
-------------------------------------------------------------------------- */

+static void acpi_queue_thermal_check(struct acpi_thermal *tz)
+{
+ if (!work_pending(&tz->thermal_check_work))
+ queue_work(acpi_thermal_pm_queue, &tz->thermal_check_work);
+}
+
static void acpi_thermal_notify(struct acpi_device *device, u32 event)
{
struct acpi_thermal *tz = acpi_driver_data(device);
@@ -910,17 +916,17 @@ static void acpi_thermal_notify(struct a

switch (event) {
case ACPI_THERMAL_NOTIFY_TEMPERATURE:
- acpi_thermal_check(tz);
+ acpi_queue_thermal_check(tz);
break;
case ACPI_THERMAL_NOTIFY_THRESHOLDS:
acpi_thermal_trips_update(tz, ACPI_TRIPS_REFRESH_THRESHOLDS);
- acpi_thermal_check(tz);
+ acpi_queue_thermal_check(tz);
acpi_bus_generate_netlink_event(device->pnp.device_class,
dev_name(&device->dev), event, 0);
break;
case ACPI_THERMAL_NOTIFY_DEVICES:
acpi_thermal_trips_update(tz, ACPI_TRIPS_REFRESH_DEVICES);
- acpi_thermal_check(tz);
+ acpi_queue_thermal_check(tz);
acpi_bus_generate_netlink_event(device->pnp.device_class,
dev_name(&device->dev), event, 0);
break;
@@ -1117,7 +1123,7 @@ static int acpi_thermal_resume(struct de
tz->state.active |= tz->trips.active[i].flags.enabled;
}

- queue_work(acpi_thermal_pm_queue, &tz->thermal_check_work);
+ acpi_queue_thermal_check(tz);

return AE_OK;
}