Re: [PATCH net v1 2/2] lan743x: boost performance: limit PCIe bandwidth requirement

From: Sven Van Asbroeck
Date: Tue Dec 08 2020 - 22:50:32 EST


On Tue, Dec 8, 2020 at 6:36 PM Florian Fainelli <f.fainelli@xxxxxxxxx> wrote:
>
> dma_sync_single_for_{cpu,device} is what you would need in order to make
> a partial cache line invalidation. You would still need to unmap the
> same address+length pair that was used for the initial mapping otherwise
> the DMA-API debugging will rightfully complain.

I tried replacing
    dma_unmap_single(9K, DMA_FROM_DEVICE);
with
    dma_sync_single_for_cpu(received_size=1500 bytes, DMA_FROM_DEVICE);
    dma_unmap_single_attrs(9K, DMA_FROM_DEVICE, DMA_ATTR_SKIP_CPU_SYNC);

and that works! But the bandwidth is still pretty bad, because the CPU
now spends most of its time in
    dma_map_single(9K, DMA_FROM_DEVICE);
which in turn spends a lot of time in __dma_page_cpu_to_dev.
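
To put rough numbers on the difference between syncing only the received
1500 bytes and syncing the full 9K buffer, here is a small userspace sketch.
It assumes that cache maintenance cost scales with the number of cache lines
covered and that the buffer starts on a cache-line boundary. cache_lines()
is a hypothetical helper, not a kernel API, and the 64-byte line size used
below is an assumption, not a value taken from the driver or the hardware.

```c
#include <stddef.h>

/*
 * Hypothetical helper (not a kernel API): number of cache lines covered
 * by a buffer of len bytes, assuming the buffer starts on a cache-line
 * boundary. line_size is the cache line size in bytes.
 */
static size_t cache_lines(size_t len, size_t line_size)
{
	/* Round up so that a partially used last line is counted. */
	return (len + line_size - 1) / line_size;
}
```

With an assumed 64-byte line, cache_lines(1500, 64) is 24 and
cache_lines(9 * 1024, 64) is 144, so under these assumptions a full-buffer
operation covers six times as many lines as one limited to a 1500-byte frame.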

When I try to replace that with
    dma_map_single_attrs(9K, DMA_FROM_DEVICE, DMA_ATTR_SKIP_CPU_SYNC);
then I get lots of dropped packets, which seems to indicate data corruption.

Interestingly, when I do
    dma_map_single_attrs(9K, DMA_FROM_DEVICE, DMA_ATTR_SKIP_CPU_SYNC);
    dma_sync_single_for_{cpu|device}(9K, DMA_FROM_DEVICE);
then the dropped packets disappear, but things are still very slow.

What am I missing?