Re: [PATCH net-next] stmmac: align RX buffers

From: Matteo Croce
Date: Mon Jun 14 2021 - 19:21:17 EST


On Mon, 14 Jun 2021 12:51:11 -0700 (PDT)
David Miller <davem@xxxxxxxxxxxxx> wrote:

>
> But this means the ethernet header will be misaligned and this will
> kill performance on some cpus as misaligned accesses are resolved
> with a trap handler.
>
> Even on cpus that don't trap, the access will be slower.
>
> Thanks.

Isn't it the IP header that should be aligned to avoid expensive traps?
From include/linux/skbuff.h:

 * Since an ethernet header is 14 bytes network drivers often end up with
 * the IP header at an unaligned offset. The IP header can be aligned by
 * shifting the start of the packet by 2 bytes. Drivers should do this
 * with:
 *
 * skb_reserve(skb, NET_IP_ALIGN);

But the real problem here is not the header alignment; the problem is
that the RX buffer is copied into an skb, and the two buffers have
different alignments.
If I add these two printk()s, I get this for every packet:

--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -5460,6 +5460,8 @@ static int stmmac_rx(struct stmmac_priv *priv, int limit, u32 queue)
+ printk("skb->data alignment: %lu\n", (uintptr_t)skb->data & 7);
+ printk("xdp.data alignment: %lu\n", (uintptr_t)xdp.data & 7);
  skb_copy_to_linear_data(skb, xdp.data, buf1_len);

[ 1060.967768] skb->data alignment: 2
[ 1060.971174] xdp.data alignment: 0
[ 1061.967589] skb->data alignment: 2
[ 1061.970994] xdp.data alignment: 0

And many architectures use an optimized memcpy when the low-order bits of
the two pointers match. To name a few:

arch/alpha/lib/memcpy.c:
/* If both source and dest are word aligned copy words */
if (!((unsigned int)dest_w & 3) && !((unsigned int)src_w & 3)) {

arch/xtensa/lib/memcopy.S:
/*
* Destination and source are word-aligned, use word copy.
*/
# copy 16 bytes per iteration for word-aligned dst and word-aligned src

arch/openrisc/lib/memcpy.c:
/* If both source and dest are word aligned copy words */
if (!((unsigned int)dest_w & 3) && !((unsigned int)src_w & 3)) {

And so on. With my patch I (mis)align both buffers at offset 2
(NET_IP_ALIGN), so the data can be copied faster:

[ 16.648485] skb->data alignment: 2
[ 16.651894] xdp.data alignment: 2
[ 16.714260] skb->data alignment: 2
[ 16.717688] xdp.data alignment: 2

Does this make sense?

Regards,
--
per aspera ad upstream