Re: [PATCH V3 net-next] net: fec: add XDP_TX feature support

From: Jesper Dangaard Brouer
Date: Tue Aug 08 2023 - 13:47:17 EST




On 08/08/2023 07.02, Wei Fang wrote:
For XDP_REDIRECT, the performance show as follow.
root@imx8mpevk:~# ./xdp_redirect eth1 eth0 Redirecting from eth1
(ifindex 3; driver st_gmac) to eth0 (ifindex 2; driver fec)

This is not exactly the same as XDP_TX setup as here you choose to redirect
between eth1 (driver st_gmac) and to eth0 (driver fec).

I would like to see eth0 to eth0 XDP_REDIRECT, so we can compare to
XDP_TX performance.
Sorry for all the requests, but can you provide those numbers?


Oh, sorry, I thought what you wanted were XDP_REDIRECT results for different
NICs. Below is the result of XDP_REDIRECT on the same NIC.
root@imx8mpevk:~# ./xdp_redirect eth0 eth0
Redirecting from eth0 (ifindex 2; driver fec) to eth0 (ifindex 2; driver fec)
Summary 232,302 rx/s 0 err,drop/s 232,344 xmit/s
Summary 234,579 rx/s 0 err,drop/s 234,577 xmit/s
Summary 235,548 rx/s 0 err,drop/s 235,549 xmit/s
Summary 234,704 rx/s 0 err,drop/s 234,703 xmit/s
Summary 235,504 rx/s 0 err,drop/s 235,504 xmit/s
Summary 235,223 rx/s 0 err,drop/s 235,224 xmit/s
Summary 234,509 rx/s 0 err,drop/s 234,507 xmit/s
Summary 235,481 rx/s 0 err,drop/s 235,482 xmit/s
Summary 234,684 rx/s 0 err,drop/s 234,683 xmit/s
Summary 235,520 rx/s 0 err,drop/s 235,520 xmit/s
Summary 235,461 rx/s 0 err,drop/s 235,461 xmit/s
Summary 234,627 rx/s 0 err,drop/s 234,627 xmit/s
Summary 235,611 rx/s 0 err,drop/s 235,611 xmit/s
Packets received : 3,053,753
Average packets/s : 234,904
Packets transmitted : 3,053,792
Average transmit/s : 234,907

I'm puzzled that moving the MMIO write isn't change performance.

Can you please verify that the packet generator machine is sending more
frame than the system can handle?

(meaning the pktgen_sample03_burst_single_flow.sh script fast enough?)


Thanks very much!
You remind me, I always started the pktgen script first and then ran the xdp2
program in the previous tests. So I saw the transmit speed of the generator
was always greater than the speed of XDP_TX when I stopped the script. But
actually, the real-time transmit speed of the generator was degraded to as
equal to the speed of XDP_TX.


Good that we finally found the root-cause, that explains why it seems
our code changes didn't have any effect. The generator gets affected
and slowed down due to the traffic that is bounced back to it. (I tried
to hint this earlier with the Ethernet Flow-Control settings).

So I turned off the rx function of the generator in case of increasing the CPU
loading of the generator due to the returned traffic from xdp2.

How did you turned off the rx function of the generator?
(I a couple of tricks I use)

And I tested
the performance again. Below are the results.

Result 1: current method
root@imx8mpevk:~# ./xdp2 eth0
proto 17: 326539 pkt/s
proto 17: 326464 pkt/s
proto 17: 326528 pkt/s
proto 17: 326465 pkt/s
proto 17: 326550 pkt/s

Result 2: sync_dma_len method
root@imx8mpevk:~# ./xdp2 eth0
proto 17: 353918 pkt/s
proto 17: 352923 pkt/s
proto 17: 353900 pkt/s
proto 17: 352672 pkt/s
proto 17: 353912 pkt/s


This looks more promising:
((353912/326550)-1)*100 = 8.37% faster.

Or gaining/saving approx 236 nanosec per packet ((1/326550-1/353912)*10^9).

Note: the speed of the generator is about 935397pps.

Compared result 1 with result 2. The "sync_dma_len" method actually improves
the performance of XDP_TX, so the conclusion from the previous tests is *incorrect*.
I'm so sorry for that. :(


I'm happy that we finally found the root-cause.
Thanks for doing all the requested tests I asked for.

In addition, I also tried the "dma_sync_len" + not use xdp_convert_buff_to_frame()
method, the performance has been further improved. Below is the result.

Result 3: sync_dma_len + not use xdp_convert_buff_to_frame() method
root@imx8mpevk:~# ./xdp2 eth0
proto 17: 369261 pkt/s
proto 17: 369267 pkt/s
proto 17: 369206 pkt/s
proto 17: 369214 pkt/s
proto 17: 369126 pkt/s

Therefore, I'm intend to use the "dma_sync_len"+ not use xdp_convert_buff_to_frame()
method in the V5 patch. Thank you again, Jesper and Jakub. You really helped me a lot. :)


I suggest, that V5 patch still use xdp_convert_buff_to_frame(), and then
you send followup patch (or as 2/2 patch) that remove the use of
xdp_convert_buff_to_frame() for XDP_TX. This way it is easier to keep
track of the changes and improvements.

I would be very interested in knowing if the MMIO test change after this
correction to the testlab/generator.

--Jesper