Re: MSI fix for buggy PCI/PCI-X hardware

From: Jeff Garzik
Date: Tue Sep 09 2003 - 17:56:23 EST


Nakajima, Jun wrote:
> How about the default behavior? I'm not a fan of disable_msi(), because
> we need to update the driver as we find problems, and we cannot predict
> which PCI/PCI-X devices in the world have such a problem, although we
> know some will. The workaround in drivers/pci/quirks.c is much better,
> compared to modifying the driver, but we still need to update the file
> (and rebuild the kernel) as we find problems.

Agreed.

That's the pain of buggy hardware. The solution is to not produce buggy hardware ;-) Failing that, it is unavoidable that the kernel must be updated to notice or work around buggy hardware. That's precisely why quirks and dmi_scan exist: to handle the special cases. Special cases are never easy or enjoyable to maintain ;-)
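
To make the "special cases" idea concrete, here is a rough user-space sketch of what a quirk-style lookup amounts to: assume every device can do MSI, and consult a small table of known-bad IDs. Everything in it (struct msi_quirk, msi_allowed(), ANY_ID, and the vendor/device IDs) is made up for illustration; it is not the kernel's actual quirk table or API.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical wildcard: matches every device ID from a vendor. */
#define ANY_ID 0xffff

/* Hypothetical quirk entry; not a real kernel structure. */
struct msi_quirk {
    uint16_t vendor;
    uint16_t device;
};

/* Made-up IDs, purely illustrative. Each new broken device adds one line. */
static const struct msi_quirk msi_blacklist[] = {
    { 0x1234, 0x0001 },   /* one specific broken device */
    { 0xabcd, ANY_ID },   /* every device from this vendor */
};

/*
 * The default answer is "yes": hardware is assumed to work.
 * Only devices listed in the table are treated as special cases.
 */
bool msi_allowed(uint16_t vendor, uint16_t device)
{
    size_t i;

    for (i = 0; i < sizeof(msi_blacklist) / sizeof(msi_blacklist[0]); i++) {
        const struct msi_quirk *q = &msi_blacklist[i];

        if (q->vendor == vendor &&
            (q->device == ANY_ID || q->device == device))
            return false;
    }
    return true;
}
```

The maintenance cost is one table entry per bad device, which is exactly the "update the file and rebuild" burden being discussed, but devices nobody has complained about need no entry at all.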


> In my opinion, we might want to use drivers/pci/quirks.c to blacklist PCI
> Express devices if any (hope not). For PCI/PCI-X devices, we might want
> to enable MSI once verified for it. To that end we can also use
> drivers/pci/quirks.c to whitelist them (or is that abuse?). That way we can
> avoid situations like "it hangs, it does not get interrupts", "disable
> ACPI, oh no, MSI".


Five points here:

1) If we did that with ACPI, you guys would have only received a _fraction_ of the feedback you received. IMO we want to turn on MSI (where supported), and see what breaks. It _should_ work, otherwise the hardware guys wouldn't have put MSI on their PCI device :)

You'll never get feedback and testing if it's turned off by default.

2) MSI is more efficient than standard (should I start calling them legacy?) x86 interrupts. And I think MSIs are just plain cool. So of course I will push to default MSI to on! ;-)

3) I think this view is colored by "right now". The current MSI errata may be worrying you, but... MSI is the future. If you choose to whitelist, then you're creating a maintenance nightmare for the future. You would have to qualify _every_ MSI device! Think how much it would suck if we had to do that with PCI devices today.

Furthermore, a whitelist unfairly punishes working MSI hardware and perhaps unfairly highlights a few key vendors at the start ;-) This is why I like blacklists.

Broken hardware is a special case, and not something we should invest a whole lot of time worrying about. _Assume_ the hardware is working, then deal with the cases where it isn't. _That_ is the Linus Torvalds model of an optimal system (IMO :))

4) I have a real-life example: tg3. The Broadcom 57xx chips are MSI-brain-damaged. So we unconditionally program the hardware in non-MSI mode. No special APIs needed at all.

5) Another option is to enable MSI only for devices which call request_msi(). This idea follows the current model of pci_enable_device(): PCI resources and interrupts are guaranteed to be assigned and set up only after a successful call to pci_enable_device(). Then, later on, the driver will call request_irq(), which will unmask the irq (if it's not already shared). Continuing this model, a driver's call to request_msi() would signal that MSI is to be enabled for that device... and ensure that the PCI core does not unconditionally enable MSI for any device outside of a request_msi() call.
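
A rough user-space model of point 5, for the sake of discussion. request_msi() is only a proposal in this mail and has no agreed signature, so the model_* functions, struct fake_pci_dev, and both probe routines below are hypothetical stand-ins, not kernel code. The fallback to the legacy interrupt line in good_driver_probe() is also my assumption about how a driver would react to failure.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a PCI device; not struct pci_dev. */
struct fake_pci_dev {
    bool enabled;       /* set by model_pci_enable_device() */
    bool msi_capable;   /* device advertises an MSI capability */
    bool msi_enabled;   /* only ever set by model_request_msi() */
};

/* Models pci_enable_device(): sets the device up, never touches MSI. */
int model_pci_enable_device(struct fake_pci_dev *dev)
{
    dev->enabled = true;
    return 0;
}

/*
 * Models the proposed request_msi(): the single place where MSI gets
 * turned on, and only for an enabled, MSI-capable device.
 */
int model_request_msi(struct fake_pci_dev *dev)
{
    if (!dev->enabled || !dev->msi_capable)
        return -1;
    dev->msi_enabled = true;
    return 0;
}

/* Driver for well-behaved hardware: opts in to MSI explicitly. */
int good_driver_probe(struct fake_pci_dev *dev)
{
    if (model_pci_enable_device(dev) != 0)
        return -1;
    if (model_request_msi(dev) != 0) {
        /* Assumption: carry on with the legacy interrupt line. */
    }
    return 0;
}

/* Driver for MSI-broken hardware: simply never asks for MSI. */
int broken_hw_driver_probe(struct fake_pci_dev *dev)
{
    return model_pci_enable_device(dev);
}
```

In this model a driver for broken hardware needs no blacklist entry and no disable call; it just never opts in, which is close in spirit to what the tg3 example in point 4 does at the hardware level.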

Jeff


