Re: pci: kernel crash in bus_find_device

From: Francesco Ruggeri
Date: Tue May 20 2014 - 18:35:22 EST


Hi Guenter,
thank you for your reply. I will check out the changes that you pointed to.
The problem we are seeing is a race condition between for_each_pci_dev
(or similar) and device_unregister. I am not sure whether use of the new
lock should be extended to all code using for_each_pci_dev as well.

pci_scan is a kernel thread that I used for testing purposes, to
mimic the dynamics that we saw in our crashes in
edac_pci_clear_parity_errors:

for (;;) {
        pci_dev = NULL;
        while ((pci_dev = pci_get_device(PCI_ANY_ID,
                                         PCI_ANY_ID, pci_dev)) != NULL)
                ;
}

It keeps traversing klist_devices in pci_bus_type using
bus_find_device, constantly resuming its search for the next element
starting from the one it got in the previous round.
There are several loops of this kind in Linux. In the case of this
thread, no action is taken on the elements as they are "found".

The race condition occurs when bus_find_device resumes its search from
a device that has been unregistered. Because device_unregister resets
knode_bus in the device, bus_find_device cannot resume from where it
left off in the klist.
The call sequence is device_unregister -> device_del ->
bus_remove_device -> klist_del(&dev->p->knode_bus).
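
To illustrate the pattern, here is a minimal userspace sketch of a walker
that resumes from a node whose list linkage has been reset. It is not
kernel code: struct node, struct list, model_add_tail, model_del,
model_find_next and walk_with_removal are all hypothetical names that
only model the klist behaviour described above, and where the real
kernel crashes the model just reports a "stale" resume point.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the knode embedded in a device. */
struct node {
    struct node *next;  /* next node on the list, NULL at the tail */
    int on_list;        /* cleared on removal, like a reset knode */
    int id;
};

/* Hypothetical stand-in for the bus's list of devices. */
struct list {
    struct node *head;
};

static void model_add_tail(struct list *l, struct node *n)
{
    struct node **pp;

    assert(l != NULL && n != NULL);
    pp = &l->head;
    while (*pp != NULL)
        pp = &(*pp)->next;
    n->next = NULL;
    n->on_list = 1;
    *pp = n;
}

/* Models the removal step: unlink the node and reset its linkage. */
static void model_del(struct list *l, struct node *n)
{
    struct node **pp = &l->head;

    while (*pp != NULL && *pp != n)
        pp = &(*pp)->next;
    if (*pp == n)
        *pp = n->next;
    n->next = NULL;
    n->on_list = 0;
}

/*
 * Models a resumable search: start at the head when 'from' is NULL,
 * otherwise continue after 'from'. If 'from' is no longer on the list
 * its successor is unknown, so *stale is set and NULL is returned.
 */
static struct node *model_find_next(struct list *l, struct node *from,
                                    int *stale)
{
    *stale = 0;
    if (from == NULL)
        return l->head;
    if (!from->on_list) {
        *stale = 1;
        return NULL;
    }
    return from->next;
}

/*
 * Scenario: walk a three-node list and remove the current node right
 * after the second step, as a concurrent unregister might. Returns the
 * number of nodes visited; *stale reports whether the walk lost its place.
 */
static int walk_with_removal(int *stale)
{
    struct node a = { NULL, 0, 1 }, b = { NULL, 0, 2 }, c = { NULL, 0, 3 };
    struct list l = { NULL };
    struct node *cur = NULL;
    int visited = 0;

    model_add_tail(&l, &a);
    model_add_tail(&l, &b);
    model_add_tail(&l, &c);

    *stale = 0;
    while ((cur = model_find_next(&l, cur, stale)) != NULL) {
        visited++;
        if (cur->id == 2)
            model_del(&l, cur);  /* node removed while still held */
    }
    return visited;
}
```

In this sketch the walk visits only the first two nodes: once the second
node is removed, its linkage no longer leads to the third one, so the
search cannot continue from where it left off.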

Francesco