Re: [EXT] Re: [PATCH 1/1] net: fec: add initial XDP support

From: Jesper Dangaard Brouer
Date: Thu Sep 29 2022 - 14:56:23 EST




On 29/09/2022 17.52, Shenwei Wang wrote:

From: Jesper Dangaard Brouer <jbrouer@xxxxxxxxxx>

On 29/09/2022 15.26, Shenwei Wang wrote:

From: Andrew Lunn <andrew@xxxxxxx>
Sent: Thursday, September 29, 2022 8:23 AM
[...]

I actually did some comparison testing regarding the page pool for
normal traffic. So far I don't see a significant improvement in the
current implementation. The performance for large packets improves a
little, and the performance for small packets gets a little worse.

What hardware was this for? imx51? imx6? imx7? Vybrid? These all use the FEC.

I tested on the imx8qxp platform. It is ARM64.

On the mvneta driver/platform we saw a huge speedup replacing:

  page_pool_release_page(rxq->page_pool, page);

with

  skb_mark_for_recycle(skb);

As I mentioned: today page_pool has SKB recycle support (you might have
looked at drivers that didn't utilize this yet), thus you don't need to release the
page (page_pool_release_page) here. Instead you could simply mark the SKB
for recycling, unless the driver does some page refcnt tricks I didn't notice.
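
For illustration, a minimal sketch of what that change looks like in an
RX path (a hypothetical excerpt with assumed names like rxq->page_pool,
data_start and pkt_len; not the actual fec code):

  /* build the SKB around the page_pool-backed page */
  skb = build_skb(page_address(page), PAGE_SIZE);
  skb_reserve(skb, data_start);
  skb_put(skb, pkt_len);

  /* old: DMA-unmap the page and give up page_pool ownership */
  /* page_pool_release_page(rxq->page_pool, page); */

  /* new: keep the DMA mapping and let the page return to the pool
   * when the SKB is freed
   */
  skb_mark_for_recycle(skb);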

On the mvneta driver/platform the DMA unmap (in page_pool_release_page)
was very expensive. This imx8qxp platform might have a faster DMA unmap in
case it is cache-coherent.

I would be very interested in knowing if skb_mark_for_recycle() helps on this
platform, for normal network stack performance.


I did a quick comparison test for the following 3 scenarios:

Thanks for doing this! :-)

1. Original implementation

shenwei@5810:~$ iperf -c 10.81.16.245 -w 2m -i 1
------------------------------------------------------------
Client connecting to 10.81.16.245, TCP port 5001
TCP window size: 416 KByte (WARNING: requested 1.91 MByte)
------------------------------------------------------------
[ 1] local 10.81.17.20 port 49154 connected with 10.81.16.245 port 5001
[ ID] Interval Transfer Bandwidth
[ 1] 0.0000-1.0000 sec 104 MBytes 868 Mbits/sec
[ 1] 1.0000-2.0000 sec 105 MBytes 878 Mbits/sec
[ 1] 2.0000-3.0000 sec 105 MBytes 881 Mbits/sec
[ 1] 3.0000-4.0000 sec 105 MBytes 879 Mbits/sec
[ 1] 4.0000-5.0000 sec 105 MBytes 878 Mbits/sec
[ 1] 5.0000-6.0000 sec 105 MBytes 878 Mbits/sec
[ 1] 6.0000-7.0000 sec 104 MBytes 875 Mbits/sec
[ 1] 7.0000-8.0000 sec 104 MBytes 875 Mbits/sec
[ 1] 8.0000-9.0000 sec 104 MBytes 873 Mbits/sec
[ 1] 9.0000-10.0000 sec 104 MBytes 875 Mbits/sec
[ 1] 0.0000-10.0073 sec 1.02 GBytes 875 Mbits/sec

2. Page pool with page_pool_release_page

shenwei@5810:~$ iperf -c 10.81.16.245 -w 2m -i 1
------------------------------------------------------------
Client connecting to 10.81.16.245, TCP port 5001
TCP window size: 416 KByte (WARNING: requested 1.91 MByte)
------------------------------------------------------------
[ 1] local 10.81.17.20 port 35924 connected with 10.81.16.245 port 5001
[ ID] Interval Transfer Bandwidth
[ 1] 0.0000-1.0000 sec 101 MBytes 849 Mbits/sec
[ 1] 1.0000-2.0000 sec 102 MBytes 860 Mbits/sec
[ 1] 2.0000-3.0000 sec 102 MBytes 860 Mbits/sec
[ 1] 3.0000-4.0000 sec 102 MBytes 859 Mbits/sec
[ 1] 4.0000-5.0000 sec 103 MBytes 863 Mbits/sec
[ 1] 5.0000-6.0000 sec 103 MBytes 864 Mbits/sec
[ 1] 6.0000-7.0000 sec 103 MBytes 863 Mbits/sec
[ 1] 7.0000-8.0000 sec 103 MBytes 865 Mbits/sec
[ 1] 8.0000-9.0000 sec 103 MBytes 862 Mbits/sec
[ 1] 9.0000-10.0000 sec 102 MBytes 856 Mbits/sec
[ 1] 0.0000-10.0246 sec 1.00 GBytes 858 Mbits/sec


3. Page pool with skb_mark_for_recycle

shenwei@5810:~$ iperf -c 10.81.16.245 -w 2m -i 1
------------------------------------------------------------
Client connecting to 10.81.16.245, TCP port 5001
TCP window size: 416 KByte (WARNING: requested 1.91 MByte)
------------------------------------------------------------
[ 1] local 10.81.17.20 port 42724 connected with 10.81.16.245 port 5001
[ ID] Interval Transfer Bandwidth
[ 1] 0.0000-1.0000 sec 111 MBytes 931 Mbits/sec
[ 1] 1.0000-2.0000 sec 112 MBytes 935 Mbits/sec
[ 1] 2.0000-3.0000 sec 111 MBytes 934 Mbits/sec
[ 1] 3.0000-4.0000 sec 111 MBytes 934 Mbits/sec
[ 1] 4.0000-5.0000 sec 111 MBytes 934 Mbits/sec
[ 1] 5.0000-6.0000 sec 112 MBytes 935 Mbits/sec
[ 1] 6.0000-7.0000 sec 111 MBytes 934 Mbits/sec
[ 1] 7.0000-8.0000 sec 111 MBytes 933 Mbits/sec
[ 1] 8.0000-9.0000 sec 112 MBytes 935 Mbits/sec
[ 1] 9.0000-10.0000 sec 111 MBytes 933 Mbits/sec
[ 1] 0.0000-10.0069 sec 1.09 GBytes 934 Mbits/sec

This is a very significant performance improvement (page pool with
skb_mark_for_recycle). This is very close to the max goodput for a
1Gbit/s link.
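
For reference (rough numbers, assuming a 1500-byte MTU and TCP
timestamps): each 1448-byte TCP payload rides in a 1538-byte on-wire
frame (preamble + Ethernet/IP/TCP headers + FCS + inter-frame gap), so
the theoretical maximum goodput is about 1000 * 1448/1538 ~= 941 Mbit/s.
The 934 Mbit/s above is within roughly 1% of that.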


For small packet size (64 bytes), all three cases have almost the same result:


To me this indicates that the DMA map/unmap operations on this platform
are indeed more expensive for larger packets. That is exactly the cost
page_pool avoids by keeping the DMA mapping intact when recycling.

The driver still needs DMA-sync, although I notice you set the page_pool
feature flag PP_FLAG_DMA_SYNC_DEV. This is good, as page_pool will try to
reduce the sync size where possible, e.g. in this SKB case it will reduce
the DMA-sync to max_len=FEC_ENET_RX_FRSIZE, which should also help
performance.
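
For context, this is roughly what such a page_pool setup looks like (a
minimal sketch with assumed names; RX_RING_SIZE and the device pointer
are placeholders, not the actual fec patch):

  struct page_pool_params pp_params = {
          .flags     = PP_FLAG_DMA_MAP | PP_FLAG_DMA_SYNC_DEV,
          .order     = 0,
          .pool_size = RX_RING_SIZE,          /* placeholder ring size */
          .nid       = NUMA_NO_NODE,
          .dev       = &pdev->dev,            /* placeholder device */
          .dma_dir   = DMA_FROM_DEVICE,
          .offset    = XDP_PACKET_HEADROOM,
          .max_len   = FEC_ENET_RX_FRSIZE,    /* bounds the per-packet dma_sync */
  };

  rxq->page_pool = page_pool_create(&pp_params);
  if (IS_ERR(rxq->page_pool)) {
          err = PTR_ERR(rxq->page_pool);
          rxq->page_pool = NULL;
  }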


shenwei@5810:~$ iperf -c 10.81.16.245 -w 2m -i 1 -l 64
------------------------------------------------------------
Client connecting to 10.81.16.245, TCP port 5001
TCP window size: 416 KByte (WARNING: requested 1.91 MByte)
------------------------------------------------------------
[ 1] local 10.81.17.20 port 58204 connected with 10.81.16.245 port 5001
[ ID] Interval Transfer Bandwidth
[ 1] 0.0000-1.0000 sec 36.9 MBytes 309 Mbits/sec
[ 1] 1.0000-2.0000 sec 36.6 MBytes 307 Mbits/sec
[ 1] 2.0000-3.0000 sec 36.6 MBytes 307 Mbits/sec
[ 1] 3.0000-4.0000 sec 36.5 MBytes 307 Mbits/sec
[ 1] 4.0000-5.0000 sec 37.1 MBytes 311 Mbits/sec
[ 1] 5.0000-6.0000 sec 37.2 MBytes 312 Mbits/sec
[ 1] 6.0000-7.0000 sec 37.1 MBytes 311 Mbits/sec
[ 1] 7.0000-8.0000 sec 37.1 MBytes 311 Mbits/sec
[ 1] 8.0000-9.0000 sec 37.1 MBytes 312 Mbits/sec
[ 1] 9.0000-10.0000 sec 37.2 MBytes 312 Mbits/sec
[ 1] 0.0000-10.0097 sec 369 MBytes 310 Mbits/sec

Regards,
Shenwei


By small packets, do you mean those under the copybreak limit?

Please provide some benchmark numbers with your next patchset.

Yes, the packet size is 64 bytes and it is under the copybreak limit.
As the impact is not significant, I would prefer to remove the
copybreak logic.
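
For readers following along, the copybreak pattern under discussion is
roughly the following (a simplified sketch, not the exact fec code;
'copybreak' stands in for the driver's threshold): frames below the
threshold are copied into a freshly allocated SKB so the original RX
buffer can be reused immediately.

  static bool rx_copybreak(struct net_device *ndev, struct sk_buff **skb,
                           void *data, int pkt_len)
  {
          struct sk_buff *new_skb;

          if (pkt_len >= copybreak)
                  return false;   /* large frame: take the normal path */

          new_skb = netdev_alloc_skb(ndev, pkt_len);
          if (!new_skb)
                  return false;

          /* copy the small frame; the original RX buffer stays in the ring */
          skb_put_data(new_skb, data, pkt_len);
          *skb = new_skb;
          return true;
  }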

+1 to removing this logic if possible, due to maintenance cost.

--Jesper