Re: power-off delay/hang due to commit 6d25be57 (mainline)

From: Stephen Berman
Date: Thu Jun 11 2020 - 11:39:56 EST


On Thu, 11 Jun 2020 00:49:26 +0200 Stephen Berman <stephen.berman@xxxxxxx> wrote:

> On Wed, 10 Jun 2020 12:25:14 +0200 Sebastian Andrzej Siewior
> <bigeasy@xxxxxxxxxxxxx> wrote:
> [...]
>>> By the other patch do you mean the following? (This email was also put
>>> into my spam by GMX and AFAICT has not yet shown up on the vger list.)
>>
>> Yes. But now that I played a little with it, I made dis:
> [...]
>> So you should have output like:
>> |[ 14.131696] Flushing events_freezable_power_
>> the flush that hangs
>>
>> |[ 17.203246] Flushing events_freezable_power_ over, 0
>> It has been noticed (0 at the end)
>>
>> |[ 17.203849] Showing busy workqueues and worker pools:
>> |[ 17.204400] workqueue events_freezable_power_: flags=0x84
>> |[ 17.205009] pwq 12: cpus=6 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
>> |[ 17.205774] in-flight: 2147:disk_events_workfn
>> |[ 17.206268] workqueue kblockd: flags=0x18
>> |[ 17.206682] pwq 13: cpus=6 node=0 flags=0x0 nice=-20 active=1/256 refcnt=2
>> |[ 17.207428] pending: blk_mq_requeue_work
>> |[ 17.207958] pool 12: cpus=6 node=0 flags=0x0 nice=0 hung=2s workers=3 idle:
>> | 535 42
>>
>> A dump of all pending worqueue items.
>>
>> |[ 20.275239] Flushing events_freezable_power_ over, 0
>> |[ 20.275812] Showing busy workqueues and worker pools:
>> |[ 20.276385] workqueue events_freezable_power_: flags=0x84
>> |[ 20.276972] pwq 12: cpus=6 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
>> |[ 20.277712] in-flight: 2147:disk_events_workfn
>> |[ 20.278245] workqueue kblockd: flags=0x18
>> |[ 20.278685] pwq 13: cpus=6 node=0 flags=0x0 nice=-20 active=1/256 refcnt=2
>> |[ 20.279471] pending: blk_mq_requeue_work
>> |[ 20.279989] pool 12: cpus=6 node=0 flags=0x0 nice=0 hung=5s workers=3 idle:
>> | 535 42
>>
>> Another atempt. As you see `blk_mq_requeue_work' is references in both
>> outputs but it makes progress (it is scheduled multiple times). The item
>> `disk_events_workfn' stucks.
>
> I rebuilt kernel 5.6.4 with that patch, but before calling `shutdown -h
> now' I guess I should have set printk_delay, because this time there was
> a lot of output and it went by too fast. I'll try to delay it the next
> time.

I've done that now. I've sent you screenshots offlist. Here's a brief
summary: The initial shutdown log output is essentially the same as the
transcription I posted upthread, except that it is interspersed with
messages from your patch like "Flush hid_reset+0x0/0x80". Then comes:

ACPI: Preparing to enter system sleep state S5
Flushing kacpid
Flushing kacpid_notify
Flushing kacpid_notify over, 0
Showing busy workqueues and worker pools:
workqueue events: flags=0x0
pwq 4: cpus=2 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
pending: drm_fb_helper_diry_work
workqueue kacpid_notify: flags=0x0
pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/1 refcnt=4185
in-flight: 1578:acpi_os_execute_deferred
delayed: acpi_os_execute_deferred, acpi_os_execute_deferred, ...

The "acpi_os_execute_deferred" messages were repeated many times in the
above line, then every 20-30 seconds again for several minutes. Then
suddenly a call trace appeared which was similar but not identical to
the one I posted upthread, and each line of the trace was followed by
the line ", acpi_os_execute_deferred". This went by quite quickly even
with the printk_delay I added, and I was unable to photograph the start
of it and couldn't get all of the subsequent output, but the screenshots
show some of it. After several minutes of this output I again did a
hard reboot.

Steve Berman