Re: [PATCH 1/6] PCIERR : interfaces for synchronous I/O error detection on driver

From: Linas Vepstas
Date: Fri Mar 24 2006 - 18:40:33 EST


On Fri, Mar 24, 2006 at 04:47:25PM +0900, Hidetoshi Seto wrote:
>
> At 2.6.16-rc1, Linux kernel accepts and provides "PCI-bus error event
> callbacks" that enable RAS-aware drivers to notice errors asynchronously,
> and to join following kernel-initiated PCI-bus error recovery.
> This callbacks work well on PPC64 where it was designed to fit.
>
> However, some difficulty still remains to cover all possible error
> situations even if we use callbacks. It will not help keeping data
> integrity, passing no broken data to drivers and user lands, preventing
> applications from going crazy or sudden death.

This is not true. Although there are some subtle issues, (which
I invite you to describe), the goal of the current design is to
insure data integrity, and make sure that neither the driver nor
the userland gets corrupted data. There shouldn't be any "crazy
or sudden death" if the device drivers are any good.

Of course, this depends on the hardware implementation. If
your PCI bus sends corrupt data up to the driver ... all bets
are off. The design is predicated on the assumption that the
hardware sends either good data or no data, ad that the latter
is associated with a bus state indicating an error has ocurred.

> - It will be useful if arch chooses panic on bus errors not to pass
> any broken data to un-reliable drivers.

I assume you meant "if arch chooses NOT to panic on bus errors ..."

I'll review the rest of the patch via sepaate email

--linas
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/