Re: [PATCH net v1 2/2] lan743x: boost performance: limit PCIe bandwidth requirement

From: Sven Van Asbroeck
Date: Wed Dec 16 2020 - 19:58:38 EST


Hi Andrew,

On Wed, Dec 9, 2020 at 9:10 AM Andrew Lunn <andrew@xxxxxxx> wrote:
>
> 9K is not a nice number, since for each allocation it probably has to
> find 4 contiguous pages. See what the performance difference is with
> 2K, 4K and 8K. If there is a big difference, you might want to special
> case when the MTU is set for jumbo packets, or check if the hardware
> can do scatter/gather.
>
> You also need to be careful with caches and speculation. As you have
> seen, bad things can happen. And it can be a lot more subtle. If some
> code is accessing the page before the buffer and gets towards the end
> of the page, the CPU might speculatively bring in the next page, i.e
> the start of the buffer. If that happens before the DMA operation, and
> you don't invalidate the cache correctly, you get hard to find
> corruption.

Thank you for the guidance. When I keep the 9K buffers, and sync
only the buffer space that is being used (mtu when mapping, received
packet size when unmapping), then there is no more corruption, and
performance improves. But setting the buffer size to the mtu size
still provides much better performance. I do not understand why
(yet).

It seems that caching and dma behaviour/performance on arm32
(armv7) is very different compared to x86.