Re: [RFC 0/3] acpipcihp: fix kernel crash on 2nd resume

From: Woody Suwalski
Date: Tue Jul 25 2023 - 12:00:21 EST


Igor Mammedov wrote:
On Tue, 25 Jul 2023 09:51:53 -0400
Woody Suwalski <terraluna977@xxxxxxxxx> wrote:

Igor Mammedov wrote:
Changelog:
* split out debug patch into a separate one with extra printk added
* fixed inverted bus->self check (probably the reason why it didn't work before)


1/3 debug patch
2/3 offending patch
3/3 potential fix
I added more files to trace; add the following to the kernel command line:
dyndbg="file drivers/pci/access.c +p; file drivers/pci/hotplug/acpiphp_glue.c +p; file drivers/pci/bus.c +p; file drivers/pci/pci.c +p; file drivers/pci/setup-bus.c +p; file drivers/acpi/bus.c +p" ignore_loglevel
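
A minimal sketch of how the dyndbg argument above can be assembled from a list of source files, in case more files need to be traced later. The helper name build_dyndbg and the FILES list are hypothetical and only for illustration; the file paths and the "+p" flag are taken from the line above.

```python
# Hypothetical helper: build the dyndbg= kernel parameter from a list of
# source files. Each file gets its own "file <path> +p" query, which
# enables the pr_debug()/dev_dbg() call sites in that file.
FILES = [
    "drivers/pci/access.c",
    "drivers/pci/hotplug/acpiphp_glue.c",
    "drivers/pci/bus.c",
    "drivers/pci/pci.c",
    "drivers/pci/setup-bus.c",
    "drivers/acpi/bus.c",
]


def build_dyndbg(files):
    """Return the kernel command-line fragment for the given files."""
    # Queries are separated by "; " inside one quoted dyndbg= value.
    queries = "; ".join(f"file {path} +p" for path in files)
    # ignore_loglevel makes sure the debug messages reach the console/dmesg.
    return f'dyndbg="{queries}" ignore_loglevel'


if __name__ == "__main__":
    print(build_dyndbg(FILES))
```

Running the script prints the fragment to append to the kernel command line.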

should be applied on top of
e8afd0d9fccc PCI: pciehp: Cancel bringup sequence if card is not present

apply the patches one by one and run the testcase + capture dmesg after each patch
one should end up with 3 dmesg logs to analyse
1st - old behaviour - no crash
2nd - crash
3rd - no crash hopefully

Igor Mammedov (3):
acpiphp: extra debug hack
PCI: acpiphp: Reassign resources on bridge if necessary
acpipcihp: use __pci_bus_assign_resources() if bus doesn't have bridge

drivers/pci/hotplug/acpiphp_glue.c | 23 ++++++++++++++++++-----
1 file changed, 18 insertions(+), 5 deletions(-)
Actually, applying patch1 already triggers the crash (why???),
probably it's due to an extra debug line I've added.
I dropped the suspicious one; can you try again and see if it works?

hence I
have also added dmesg-6.5-0.txt, which shows a working condition based on
git e8afd0d9fccc level (acpiphp_glue in kernel 6.4)

Patch3 did not fix the issue; it seems that the culprit is somewhere
else, triggered by the "benign" patch1 :-(

Also a note about the trigger description in patch3: the dmesg trace on
the Inspiron laptop is collected after the first wake from suspend to RAM.
The subsequent attempt to sleep results in a frozen system.
Thanks for the clarification, I'll correct the commit message once the culprit
is found.

Good news. After removing the botched debug statement, which was masking the original issue, the testing went as you predicted, and with patch 3 the system suspends to RAM OK.

Here are the requested 3 dmesg outputs; #2 is from the bad run.

I can retest with a final version of the patch once you have it ready...

Thanks, Woody

Attachment: rfc1.tar.xz
Description: application/xz