Re: Strange network timeouts w/ 2.6.30.5

From: Krzysztof Halasa
Date: Thu Aug 20 2009 - 16:28:47 EST


David Miller <davem@xxxxxxxxxxxxx> writes:

> swiotlb emulates what hardware does, so if it can go wrong with
> swiotlb it can go wrong with hardware too.
>
> Figure out what the exact bug is.

I think I already have.

The exact bug is using streaming allocations for the descriptor.
It can't work consistently on all platforms, period. A streaming
allocation can only have one owner (either the CPU or the device) at a
time, and the e100 driver wants access (for examining desc status)
simultaneously with the hardware (which may alter desc status at any time).

On ARM with the previous patch applied it can work because the CPU cache
has the "dirty" bits (e100 driver only reads from the descriptors).
On x86 without swiotlb it can work because streaming allocations are
already coherent.
On x86 with swiotlb it can't really work reliably (and if it does, it does
by pure luck) because (I guess) swiotlb has no "dirty" flag and can't
know when it doesn't need to flush.
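
To make the swiotlb case concrete, here is a toy user-space model in C.
It is only a sketch under stated assumptions, not kernel code: the
struct layout, the status bit and every function name are hypothetical,
and the real swiotlb and e100 code differ in detail. It assumes, as
guessed above, that the bounce buffer has no "dirty" tracking, so a sync
towards the device copies the whole CPU-side buffer back.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Toy model only: hypothetical names, not the swiotlb or e100 API. */
enum { DESC_COMPLETE = 0x8000 };  /* hypothetical "device is done" bit */

struct desc {
    uint16_t status;   /* written by the device on completion */
    uint16_t command;  /* written by the driver */
};

struct bounce_map {
    struct desc driver_copy;  /* memory the CPU reads and writes */
    struct desc bounce_copy;  /* memory the device reads and writes */
};

/* Give ownership to the CPU: copy the device's view to the driver. */
static void sync_for_cpu(struct bounce_map *m)
{
    memcpy(&m->driver_copy, &m->bounce_copy, sizeof(m->driver_copy));
}

/* Give ownership back to the device: copy everything, unconditionally,
 * because this model has no "dirty" flag to say nothing was changed. */
static void sync_for_device(struct bounce_map *m)
{
    memcpy(&m->bounce_copy, &m->driver_copy, sizeof(m->bounce_copy));
}

/* The device completes the descriptor whenever it likes. */
static void device_complete(struct bounce_map *m)
{
    m->bounce_copy.status |= DESC_COMPLETE;
}

/* The device writes status while the CPU "owns" the mapping.
 * Returns 1 if the completion is still visible afterwards, else 0. */
int poll_with_race(void)
{
    struct bounce_map m;

    memset(&m, 0, sizeof(m));
    sync_for_cpu(&m);                        /* driver looks at status */
    assert(!(m.driver_copy.status & DESC_COMPLETE));
    device_complete(&m);                     /* device writes meanwhile */
    sync_for_device(&m);                     /* stale copy overwrites it */
    sync_for_cpu(&m);                        /* driver looks again */
    return (m.driver_copy.status & DESC_COMPLETE) != 0;
}

/* Same sequence, but the device writes only while it owns the mapping. */
int poll_without_race(void)
{
    struct bounce_map m;

    memset(&m, 0, sizeof(m));
    sync_for_cpu(&m);
    assert(!(m.driver_copy.status & DESC_COMPLETE));
    sync_for_device(&m);                     /* ownership handed back */
    device_complete(&m);                     /* device writes afterwards */
    sync_for_cpu(&m);
    return (m.driver_copy.status & DESC_COMPLETE) != 0;
}
```

In this model poll_with_race() returns 0, because the completion is
overwritten by the stale CPU-side copy, while poll_without_race()
returns 1. Which of the two orderings happens on real hardware depends
only on timing.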

There is no other fix than to convert the desc rings to coherent
allocs. I'm going to do precisely that in a few days, but we're stuck with
the existing code in 2.6.31 (and 2.6.30.x etc).
--
Krzysztof Halasa