Re: [RFC] readX_check() - Interface for PCI-X error recovery

From: Grant Grundler
Date: Tue Apr 06 2004 - 21:10:15 EST


On Tue, Apr 06, 2004 at 08:04:49PM +0900, Hidetoshi Seto wrote:
...
> - Resources newly required:
>
> on struct device:
> error flag
> list of recoverable physical address regions

I think willy already pointed out the struct pci_dev->resources[] should point
to MMIO adress ranges the PCI(-X) device "owns".

> pointer to host bridge of the device

ditto. I think willy meant walk pci_dev->bus->parent until bus->self
is NULL normally should work.

> on per_cpu:
> list of currently checking devices

Why wouldn't this be per PCI host bus controller instead of per CPU?

I'm thinking SMP - any CPU could access any MMIO region.
How those MMIO addresses get routed depends on how PCI Host Bus
controllers are programmed. Ie Bridges have to route MMIO transactions
to children (PCI devices).

> - Interfaces newly required:
>
> clear_pcix_errors(dev)
> Clear the error flag of the dev, and start to check the device.
> This also clears the status register of its host bridge.
>
> readX_check(dev,vaddr)
> Read a register of the device mapped to vaddr, and check errors
> if possible(This is depending on its architecture. In the case of
> ia64, we can generate a MCA from an error by simple operation to
> test the read data.)
> If any error happen on the recoverable region, set the error flag.
>
> read_pcix_errors(dev)
> Return whether here was an error or not referring the error flag in
> struct device and the status register of the host bridge, and stop
> checking of the device.
>
> register_pcix_region(dev,paddr,len)
> unregister_pcix_region(dev)
> Register/Unregister a physical address region that the driver can
> recover. This function would be in the init/cleanup of the driver.

Could register/unregister functionality be part of the clear/read errors?

>
>
> - Flow
>
> xx_probe(dev, ...)
> {
> ..
> register_pcix_region(dev,paddr,len);
> ..
> }
>
> xx_intr()
> {
> spin_lock_irqsave(); /* disable interrupt & preemption */
>
> clear_pcix_errors(dev); /* clear error flag */
> {
> ..
> x = readX_check(dev, vaddr);
> ..
> }
> error = read_pcix_errors(dev); /* read error flag */
> if (error)
> recovery_or_offline(dev);
>
> spin_lock_irqrestore(); /* enable interrupt & preemption */

Holding the spinlock might not be realistic for most drivers
during a recovery - ie re-init the HW. I understand this is a
generic example but it's a bit too simple...


...
> on other arch:
> if it is not possible on the arch, do like:
> #ifndef CONFIG_PCI_RECOVERY
> #define clear_pcix_errors(dev) do { } while (0)
> #define read_pcix_errors(dev) (0)
> #endif

And need similar for the register/unregister/readX_check() calls.

thanks,
grant
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/