Re: Strange network timeouts w/ 2.6.30.5

From: Krzysztof Halasa
Date: Thu Aug 20 2009 - 05:03:18 EST


> Since patching to 2.6.30.5 I'm experiencing periodic timeouts on my
> e100 which is used as my WAN interface on a server/router box. Nothing
> is reported in any logs and eventually the traffic resumes. It seems
> to happen at fairly regular intervals, although I've not timed them.
> The timeouts last for approx. 60-120 seconds and then traffic resumes
> normally with no hint of what happened.

x86-64, intel P965...

Can you provide "dmesg" output, please?

I wonder what additional side effect the patch caused. Streaming
allocs on such x86 should already be coherent, no?

Perhaps you have more than 2 GB RAM (or so) and swiotlb has to provide
buffering? I think of something like:

- the driver does "sync for CPU" and examines status
- the descriptor is tested to be still empty
- meanwhile e100 chip changes the status in the descriptor
- the driver does "sync for device" (it's what the patch added)
- at this point swiotlb doesn't know the descriptor is clean and writes
it out, thus dropping the change done by the e100 chip.

Does the above seem plausible? I admit I'm not a swiotlb expert, it's
a pure guess that it simply and blindly moves data in and out.
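
To make the suspected sequence concrete, here is a small user-space model
of it. This is only a sketch of my guess, not the e100 or swiotlb code:
struct rx_desc, struct bounce_map, DESC_COMPLETE and the helper names are
all made up for illustration, and the two "sync" helpers assume the bounce
buffer is copied blindly in each direction.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical completion bit, chosen for illustration only. */
#define DESC_COMPLETE 0x8000u

/* Hypothetical, simplified RX descriptor. */
struct rx_desc {
    uint16_t status;   /* 0 = empty, DESC_COMPLETE = filled by the chip */
    uint16_t size;
};

/* One streaming mapping that goes through a bounce buffer. */
struct bounce_map {
    struct rx_desc cpu_copy;     /* memory the driver reads and writes */
    struct rx_desc device_copy;  /* bounce buffer the chip DMAs into */
};

/* Models "sync for CPU": bounce buffer -> driver memory. */
static void sync_for_cpu(struct bounce_map *m)
{
    assert(m != NULL);
    memcpy(&m->cpu_copy, &m->device_copy, sizeof(m->cpu_copy));
}

/* Models "sync for device": driver memory -> bounce buffer, blindly. */
static void sync_for_device(struct bounce_map *m)
{
    assert(m != NULL);
    memcpy(&m->device_copy, &m->cpu_copy, sizeof(m->device_copy));
}

/*
 * Replays the steps listed above and returns the status left in the
 * buffer the chip owns. A return value of 0 means the completion
 * written by the chip was overwritten.
 */
static uint16_t replay_suspected_race(void)
{
    struct bounce_map m;
    int still_empty;

    memset(&m, 0, sizeof(m));

    sync_for_cpu(&m);                          /* driver: sync for CPU */
    still_empty = (m.cpu_copy.status == 0);    /* driver: still empty  */

    m.device_copy.status = DESC_COMPLETE;      /* chip completes now   */

    if (still_empty)
        sync_for_device(&m);                   /* stale copy goes back */

    return m.device_copy.status;
}
```

In this model replay_suspected_race() returns 0, i.e. the chip's update is
gone and the driver would keep seeing an empty descriptor. Whether the real
swiotlb behaves this way is exactly the part I'm not sure about.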

If that's the case, I don't really know how it could work without the
patch in question. Perhaps the timings were just right?

What can we do about it? Rewriting to use consistent allocs, of course.
Temporarily adding #ifdef CONFIG_ARM around the
pci_dma_sync_single_for_device()? Not sure if other archs were affected.

The root problem is that the driver shouldn't use streaming allocations
for its descriptors (they are written from both sides simultaneously).
Only skb->data can be streaming.
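
The same kind of user-space model shows why that split would avoid the
problem. Again a sketch under assumptions, not driver code: struct rx_slot,
DESC_COMPLETE, MODEL_PKT_LEN and the helpers are hypothetical. The
descriptor exists in one place only (standing in for a consistent
allocation), while the packet data still goes through a bounce copy
(standing in for a streaming mapping of skb->data).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical values, chosen for illustration only. */
#define DESC_COMPLETE 0x8000u
#define MODEL_PKT_LEN 4

struct rx_desc {
    uint16_t status;   /* 0 = empty, DESC_COMPLETE = filled by the chip */
    uint16_t size;
};

struct rx_slot {
    struct rx_desc desc;                 /* consistent: a single copy   */
    uint8_t cpu_data[MODEL_PKT_LEN];     /* streaming: driver's buffer  */
    uint8_t device_data[MODEL_PKT_LEN];  /* streaming: bounce buffer    */
};

/* Models "sync for CPU" on the packet data only. */
static void data_sync_for_cpu(struct rx_slot *s)
{
    assert(s != NULL);
    memcpy(s->cpu_data, s->device_data, MODEL_PKT_LEN);
}

/*
 * Same ordering as the suspected race, but the descriptor is never
 * copied back and forth. Returns 1 and fills out[] if the driver sees
 * the completion, 0 otherwise.
 */
static int replay_with_consistent_desc(uint8_t out[MODEL_PKT_LEN])
{
    struct rx_slot s;
    int seen_empty;

    memset(&s, 0, sizeof(s));

    seen_empty = (s.desc.status == 0);     /* driver: still empty */

    /* Chip writes the packet, then marks the descriptor complete. */
    memset(s.device_data, 0xAB, MODEL_PKT_LEN);
    s.desc.size = MODEL_PKT_LEN;
    s.desc.status = DESC_COMPLETE;

    /* No "sync for device" on the descriptor, so nothing is lost. */
    if (!seen_empty || !(s.desc.status & DESC_COMPLETE))
        return 0;

    data_sync_for_cpu(&s);                 /* only the data is synced */
    memcpy(out, s.cpu_data, MODEL_PKT_LEN);
    return 1;
}
```

Here the driver's next look at the descriptor finds the completion, and
only the packet payload needs a sync before the CPU reads it.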
--
Krzysztof Halasa