Error reporting API, any work in progress ?

From: Benjamin Herrenschmidt
Date: Tue Feb 01 2005 - 21:14:16 EST


Hi !

Is there any work in progress to get some unified error reporting (and
possibly recovery) API for PCI/X/e ?

On pSeries, we have this "EEH" mecanism that allows us, even with
old-style PCI, to get error notification, but also to recover by doing
slot reset and that sort of stuff.

PCI Express has error reporting at least, IBM's implementation will have
similar reset control for recovery.

So I think we should need some generic error reporting capability to be
added to the PCI layer, possibly by adding a function pointer to
pci_driver() that gets called back to report various events, in addition
to the proposed explicit error checking that can bracket IO calls like
it was discussed a while ago on lkml.

In addition, I was thinking about some generic routines the drivers
could call to indicate, upon such errors, that they are willing to
recover after a slot reset. That way, the platform can notify all
drivers of a bus segment that went off, reset the slot, and send
messages to those drivers telling them the slot is back.

If there is already some work in progress for that (PCI driver-level API
for notification), I'd be happy to help there. If not, then I'll propose
something that would fit the need of IBM EEH mecanism and PCI express.

Ben.


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/