Re: [REGRESSION 4.17] Spurious wakeup / reboot with power button

From: Takashi Iwai
Date: Fri Jul 06 2018 - 02:55:39 EST


On Fri, 06 Jul 2018 07:18:36 +0200,
Thomas H4nig wrote:
>
>
>
> Am 05.07.2018 um 18:56 schrieb Takashi Iwai:
> > On Thu, 05 Jul 2018 18:02:11 +0200,
> > Rafael J. Wysocki wrote:
> >>
> >> [The Lv's address is not valid any more, so drop it from the CC]
> >>
> >> On Thursday, July 5, 2018 5:10:20 PM CEST Rafael J. Wysocki wrote:
> >>> On Thu, Jul 5, 2018 at 5:09 PM, Takashi Iwai <tiwai@xxxxxxx> wrote:
> >>>> On Thu, 05 Jul 2018 16:00:14 +0200,
> >>>> Thomas H4nig wrote:
> >>>>>
> >>>>> Am 05.07.2018 um 14:12 schrieb Takashi Iwai:
> >>>>>> On Thu, 05 Jul 2018 12:41:03 +0200,
> >>>>>> Rafael J. Wysocki wrote:
> >>>>>>>
> >>>>>>> On Thursday, July 5, 2018 11:50:11 AM CEST Takashi Iwai wrote:
> >>>>>>>> On Thu, 05 Jul 2018 11:34:59 +0200,
> >>>>>>>> Rafael J. Wysocki wrote:
> >>>>>>>>>
> >>>>>>>>> Hi,
> >>>>>>>>>
> >>>>>>>>> On Thu, Jul 5, 2018 at 9:05 AM, Takashi Iwai <tiwai@xxxxxxx> wrote:
> >>>>>>>>>> Hi,
> >>>>>>>>>>
> >>>>>>>>>> we've got a regression report since 4.17 about the behavior of
> >>>>>>>>>> power-off with the power button. When a machine is powered off with
> >>>>>>>>>> the power button on desktop, it reboots after a few seconds instead of
> >>>>>>>>>> power down.
> >>>>>>>>>>
> >>>>>>>>>> The manual power down via "systemctl poweroff" works fine, so it's
> >>>>>>>>>> possibly some spurious wakeup by the power button action, and some
> >>>>>>>>>> ACPI-related change is suspected.
> >>>>>>>>>> The regression still remains in 4.18-rc3.
> >>>>>>>>>
> >>>>>>>>> There are only a few ACPI commits directly related to power management
> >>>>>>>>> between 4.16 and 4.17 and none of them looks particularly suspicious.
> >>>>>>>>
> >>>>>>>> OK, interesting.
> >>>>>>>>
> >>>>>>>>> It looks like the power button state may not be cleared sufficiently
> >>>>>>>>> after it's been pressed which is now visible for some reason.
> >>>>>>>>
> >>>>>>>> Hmm, where can such a state remain? Since it happens after the
> >>>>>>>> machine turned off, some (ACPI) wakeup bits?
> >>>>>>>
> >>>>>>> Basically, yes.
> >>>>>>>
> >>>>>>> It looks like a GPE may remain active which then triggers wakeup after
> >>>>>>> shutdown.
> >>>>>>>
> >>>>>>> On a hunch, I'm wondering if reverting commit
> >>>>>>>
> >>>>>>> 18996f2db918 ACPICA: Events: Stop unconditionally clearing ACPI IRQs during suspend/resume
> >>>>>>>
> >>>>>>> (may not revert clearly, though) makes any difference.
> >>>>>>
> >>>>>> OK, I'm building a 4.17.x test kernel with that revert, in OBS
> >>>>>> home:tiwai:bsc1099930 repo.
> >>>>>>
> >>>>>> Thomas, could you try later the kernel in
> >>>>>> http://download.opensuse.org/repositories/home:/tiwai:/bsc1099930/standard/
> >>>>>> ? It'll take an hour or so until the build finishes.
> >>>>>
> >>>>> With your new built kernel
> >>>>> 4.17.4-1.g6f23755-default
> >>>>>
> >>>>> the power button works again, so the revert solved the problem
> >>>>
> >>>> Thanks, that clarifies the cause.
> >>>> Adding Erik and Lv to Cc.
> >>>>
> >>>> I guess it's the side-effect by removing
> >>>> acpi_ev_walk_gpe_list(acpi_hw_clear_gpe_block, NULL);
> >>>> in acpi_hw_disable_all_gpes().
> >>>>
> >>>> This function is called from acpi_power_off_prepare(), and the machine
> >>>> goes to power off without clearing the GPEs, hence it's woken up later
> >>>> unexpectedly.
> >>>
> >>> That's correct.
> >>>
> >>> We need to fix up that commit. I'll try to prepare something.
> >>>
> >>
> >> Below is a patch to test that theory and maybe fix things if it is correct.
> >>
> >> What it does is to clear all GPEs after disabling them in
> >> acpi_power_off_prepare() which should address the issue if our theory
> >> about the underlying reason is correct.
> >>
> >> Please test.
> >
> > OK, building a new test kernel package in OBS home:tiwai:bsc1099930-2
> > repo. It'll appear at
> > http://download.opensuse.org/repositories/home:/tiwai:/bsc1099930-2/standard/
> >
> > Thomas, please give it a try later.
> >
> >
> > thanks,
> >
> > Takashi
>
> I am sorry, but with your test kernel 4.17.4-1.g76c6238-default the
> notebook again gets not properly powered off but restarts

Interesting. The package version shows that the tested kernel must be
the right one. (Though, it'd be good to double-check -- it's often
confusing if you have multiple same kernel versions on the system.)

If Rafael's patch doesn't work, we'd need to identify which change in
the commit 18996f2db918 has the effect.

Thomas, could you try to build & modify the kernel in your side?
Package build on OBS and test takes too long.

There are three places that remove the GPE-clearing in the patch.

One is in acpi_ev_enable_gpe():

--- a/drivers/acpi/acpica/evgpe.c
+++ b/drivers/acpi/acpica/evgpe.c
@@ -115,13 +115,6 @@ acpi_status acpi_ev_enable_gpe(struct acpi_gpe_event_info *gpe_event_info)

ACPI_FUNCTION_TRACE(ev_enable_gpe);

- /* Clear the GPE (of stale events) */
-
- status = acpi_hw_clear_gpe(gpe_event_info);
- if (ACPI_FAILURE(status)) {
- return_ACPI_STATUS(status);
- }
-
/* Enable the requested GPE */

status = acpi_hw_low_set_gpe(gpe_event_info, ACPI_GPE_ENABLE);


... the second one is in acpi_hw_disable_all_gpes():

--- a/drivers/acpi/acpica/hwgpe.c
+++ b/drivers/acpi/acpica/hwgpe.c
@@ -497,7 +497,6 @@ acpi_status acpi_hw_disable_all_gpes(void)
ACPI_FUNCTION_TRACE(hw_disable_all_gpes);

status = acpi_ev_walk_gpe_list(acpi_hw_disable_gpe_block, NULL);
- status = acpi_ev_walk_gpe_list(acpi_hw_clear_gpe_block, NULL);
return_ACPI_STATUS(status);
}


... and the last one is in acpi_hw_legacy_sleep():

--- a/drivers/acpi/acpica/hwsleep.c
+++ b/drivers/acpi/acpica/hwsleep.c
@@ -85,15 +85,8 @@ acpi_status acpi_hw_legacy_sleep(u8 sleep_state)
return_ACPI_STATUS(status);
}

- /* Clear all fixed and general purpose status bits */
-
- status = acpi_hw_clear_acpi_status();
- if (ACPI_FAILURE(status)) {
- return_ACPI_STATUS(status);
- }
-
/*
* 1) Disable/Clear all GPEs


The rest are only comments changes.

Try to revert each hunk manually one-by-one and figure out which one
actually causes the regression. Maybe it'd be safer to test on the
4.17.x kernel, but you can check on 4.18-rc, too.


thanks,

Takashi