Re: [PATCH] hack to debug acpiphp crash

From: Woody Suwalski
Date: Mon Jul 24 2023 - 21:52:46 EST


Igor Mammedov wrote:
Woody, thanks for testing.

Can you try the following patch? It tries to work around a NULL bus->self, if that
really is the culprit, and prints extra debug information.
Add the following to the kernel command line (make sure that CONFIG_DYNAMIC_DEBUG is enabled):

dyndbg="file drivers/pci/access.c +p; file drivers/pci/hotplug/acpiphp_glue.c +p; file drivers/pci/bus.c +p; file drivers/pci/pci.c +p; file drivers/pci/setup-bus.c +p" ignore_loglevel
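
For reference, the dyndbg= argument above is one "file <path> +p" query per source file, joined with "; ". A minimal shell sketch that assembles the same string from a file list is shown here; the variable names (files, dyndbg, cmdline) are illustrative only and are not part of the original message.

```shell
#!/bin/sh
# Source files whose pr_debug()/dev_dbg() output should be enabled.
# The list is copied from the command line quoted above.
files="drivers/pci/access.c
drivers/pci/hotplug/acpiphp_glue.c
drivers/pci/bus.c
drivers/pci/pci.c
drivers/pci/setup-bus.c"

# Build one "file <path> +p" query per file, separated by "; ".
dyndbg=""
for f in $files; do
    dyndbg="${dyndbg:+$dyndbg; }file $f +p"
done

# Full fragment to append to the kernel command line.
cmdline="dyndbg=\"$dyndbg\" ignore_loglevel"
printf '%s\n' "$cmdline"
```

The printed line is what gets appended to the kernel command line in the bootloader configuration; adding or dropping a path in the list changes which files emit debug output.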

What I find odd in your logs is that enable_slot() is called even though native PCIe
hotplug should be used. Additional info might help to understand what's going on:
1: 'lspci' output
2: DSDT and all SSDT ACPI tables (you can use 'acpidump -b' to get them).

Signed-off-by: Igor Mammedov <imammedo@xxxxxxxxxx>
---
drivers/pci/hotplug/acpiphp_glue.c | 18 +++++++++++++++++-
1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/hotplug/acpiphp_glue.c b/drivers/pci/hotplug/acpiphp_glue.c
index 328d1e416014..9ce3fd9d72a9 100644
--- a/drivers/pci/hotplug/acpiphp_glue.c
+++ b/drivers/pci/hotplug/acpiphp_glue.c
@@ -485,7 +485,10 @@ static void enable_slot(struct acpiphp_slot *slot, bool bridge)
struct pci_bus *bus = slot->bus;
struct acpiphp_func *func;
+WARN(1, "enable_slot");
+pci_info(bus, "enable_slot bus\n");
if (bridge && bus->self && hotplug_is_native(bus->self)) {
+pr_err("enable_slot: bridge branch\n");
/*
* If native hotplug is used, it will take care of hotplug
* slot management and resource allocation for hotplug
@@ -498,8 +501,10 @@ static void enable_slot(struct acpiphp_slot *slot, bool bridge)
acpiphp_native_scan_bridge(dev);
}
} else {
+ LIST_HEAD(add_list);
int max, pass;
+pr_err("enable_slot: acpiphp_rescan_slot branch\n");
acpiphp_rescan_slot(slot);
max = acpiphp_max_busnr(bus);
for (pass = 0; pass < 2; pass++) {
@@ -508,13 +513,23 @@ static void enable_slot(struct acpiphp_slot *slot, bool bridge)
continue;
max = pci_scan_bridge(bus, dev, max, pass);
+pci_info(dev, "enable_slot: pci_scan_bridge: max: %d\n", max);
if (pass && dev->subordinate) {
check_hotplug_bridge(slot, dev);
pcibios_resource_survey_bus(dev->subordinate);
+ if (bus->self)
+ __pci_bus_size_bridges(dev->subordinate,
+ &add_list);
}
}
}
- pci_assign_unassigned_bridge_resources(bus->self);
+ if (bus->self) {
+pci_info(bus->self, "enable_slot: pci_assign_unassigned_bridge_resources:\n");
+ pci_assign_unassigned_bridge_resources(bus->self);
+ } else {
+pci_info(bus, "enable_slot: __pci_bus_assign_resources:\n");
+ __pci_bus_assign_resources(bus, &add_list, NULL);
+ }
}
acpiphp_sanitize_bus(bus);
@@ -541,6 +556,7 @@ static void enable_slot(struct acpiphp_slot *slot, bool bridge)
}
pci_dev_put(dev);
}
+pr_err("enable_slot: end\n");
}
/**

Unfortunately, the patch above does not seem to prevent the kernel crash.
Here is the requested diagnostic info: dmesg output before and after, a selection of lspci outputs, and the ACPI tables. Hope that helps :-)

Thanks, Woody


Attachment: pcidebug.tar.xz
Description: application/xz