RE: [PATCH v8] net/packet: support mergeable feature of virtio

From: Willem de Bruijn
Date: Thu Apr 13 2023 - 17:59:08 EST


沈安琪(凛玥) wrote:
> From: Jianfeng Tan <henry.tjf@xxxxxxxxxxxx>
>
> Packet sockets, like tap, can be used as the backend for kernel vhost.
> In packet sockets, the virtio net header size is currently hardcoded to
> the size of struct virtio_net_hdr, which is 10 bytes. However, this is not
> always correct: some virtio features, such as mrg_rxbuf, need a 12-byte
> virtio net header.
>
> Mergeable buffers, as a virtio feature, are worth supporting: packets
> that are larger than one mbuf will be dropped in the vhost worker's
> handle_rx if the mrg_rxbuf feature is not used, but large packets
> cannot be avoided and increasing the mbuf size is not economical.
>
> With this virtio feature enabled by virtio-user, packet sockets with a
> hardcoded 10-byte virtio net header parse the mac header incorrectly in
> packet_snd, taking the last two bytes of the virtio net header as part of
> the mac header.
> This incorrect mac header parsing causes the packet to be dropped later,
> when the underlying device validates the ethernet header on receive.
>
> Add a vnet_hdr_sz field, placed in an existing hole in struct
> packet_sock, to record the virtio net header size currently in use, and
> add a PACKET_VNET_HDR_SZ sockopt to set it. With these, packet sockets
> know the exact length of the virtio net header that the virtio user
> provides.
> In packet_snd, tpacket_snd and packet_recvmsg, instead of using the
> hardcoded virtio net header size, read the exact vnet_hdr_sz from the
> corresponding packet_sock and parse the mac header based on it, so that
> packets are no longer mistakenly dropped.
>
> Signed-off-by: Jianfeng Tan <henry.tjf@xxxxxxxxxxxx>
> Co-developed-by: Anqi Shen <amy.saq@xxxxxxxxxxxx>
> Signed-off-by: Anqi Shen <amy.saq@xxxxxxxxxxxx>
> ---
>
> Changelog
>
> V7 -> V8:
> * remove redundant variables;
> * resolve KCSAN warning.
>
> V6 -> V7:
> * address coding style comments.
>
> V5 -> V6:
> * rebase patch based on 6.3-rc2.
>
> V4 -> V5:
> * add READ_ONCE() macro when initializing local vnet_hdr_sz variable;
> * fix some nits.
>
> V3 -> V4:
> * read po->vnet_hdr_sz once into a local vnet_hdr_sz and use the local copy
> to avoid a race condition;
> * modify how to check non-zero po->vnet_hdr_sz;
> * separate vnet_hdr_sz as a u8 field in struct packet_sock instead of 8-bit
> in an int field.
>
> V2 -> V3:
> * remove has_vnet_hdr field and use vnet_hdr_sz to indicate whether
> there is a vnet header;
> * refactor PACKET_VNET_HDR and PACKET_VNET_HDR_SZ sockopt to remove
> redundant code.
>
> V1 -> V2:
> * refactor the implementation of PACKET_VNET_HDR and PACKET_VNET_HDR_SZ
> sockopts to get rid of redundant code;
> * amend packet_rcv_vnet in af_packet.c to avoid extra function invocation.
>
> include/uapi/linux/if_packet.h | 1 +
> net/packet/af_packet.c | 93 ++++++++++++++++++++--------------
> net/packet/diag.c | 2 +-
> net/packet/internal.h | 2 +-
> 4 files changed, 58 insertions(+), 40 deletions(-)
>
> @@ -2250,7 +2250,7 @@ static int tpacket_rcv(struct sk_buff *skb, struct net_device *dev,
> __u32 ts_status;
> bool is_drop_n_account = false;
> unsigned int slot_id = 0;
> - bool do_vnet = false;
> + int vnet_hdr_sz = 0;
>
> /* struct tpacket{2,3}_hdr is aligned to a multiple of TPACKET_ALIGNMENT.
> * We may add members to them until current aligned size without forcing
> @@ -2308,10 +2308,9 @@ static int tpacket_rcv(struct sk_buff *skb, struct net_device *dev,
> netoff = TPACKET_ALIGN(po->tp_hdrlen +
> (maclen < 16 ? 16 : maclen)) +
> po->tp_reserve;
> - if (packet_sock_flag(po, PACKET_SOCK_HAS_VNET_HDR)) {
> - netoff += sizeof(struct virtio_net_hdr);
> - do_vnet = true;
> - }
> + vnet_hdr_sz = READ_ONCE(po->vnet_hdr_sz);
> + if (vnet_hdr_sz)
> + netoff += vnet_hdr_sz;
> macoff = netoff - maclen;
> }
> if (netoff > USHRT_MAX) {
> @@ -2337,7 +2336,6 @@ static int tpacket_rcv(struct sk_buff *skb, struct net_device *dev,
> snaplen = po->rx_ring.frame_size - macoff;
> if ((int)snaplen < 0) {
> snaplen = 0;
> - do_vnet = false;
> }
> }
> } else if (unlikely(macoff + snaplen >
> @@ -2351,7 +2349,6 @@ static int tpacket_rcv(struct sk_buff *skb, struct net_device *dev,
> if (unlikely((int)snaplen < 0)) {
> snaplen = 0;
> macoff = GET_PBDQC_FROM_RB(&po->rx_ring)->max_frame_len;
> - do_vnet = false;

Here and in the block above, the existing behavior must be maintained:
the old code cleared do_vnet when snaplen was clamped to zero, so
vnet_hdr_sz must be reset to zero in these cases.
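
To illustrate the intent, here is a minimal userspace sketch. It is not
the kernel code: struct rx_plan and clamp_to_frame() are hypothetical
names, and the function only models the "frame too small" path, where
snaplen is clamped and the vnet header must be suppressed.

```c
/* Hypothetical, simplified model of the clamping path in tpacket_rcv().
 * None of these names exist in the kernel.
 */
struct rx_plan {
	unsigned int snaplen;	/* bytes of packet data to copy */
	int vnet_hdr_sz;	/* 0 means: do not write a vnet header */
};

static struct rx_plan clamp_to_frame(unsigned int frame_size,
				     unsigned int macoff,
				     unsigned int snaplen,
				     int vnet_hdr_sz)
{
	struct rx_plan p = { snaplen, vnet_hdr_sz };

	if (macoff + snaplen > frame_size) {
		if (macoff > frame_size) {
			/* Models the (int)snaplen < 0 case: nothing fits,
			 * so drop the vnet header as well. Previously this
			 * was "do_vnet = false".
			 */
			p.snaplen = 0;
			p.vnet_hdr_sz = 0;
		} else {
			p.snaplen = frame_size - macoff;
		}
	}
	return p;
}
```

With a 64-byte frame and macoff of 100, the sketch returns snaplen 0 and
vnet_hdr_sz 0, so the later "if (vnet_hdr_sz && ...)" check skips
virtio_net_hdr_from_skb(), as the do_vnet flag did before.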

> }
> }
> spin_lock(&sk->sk_receive_queue.lock);
> @@ -2367,7 +2364,7 @@ static int tpacket_rcv(struct sk_buff *skb, struct net_device *dev,
> __set_bit(slot_id, po->rx_ring.rx_owner_map);
> }
>
> - if (do_vnet &&
> + if (vnet_hdr_sz &&
> virtio_net_hdr_from_skb(skb, h.raw + macoff -
> sizeof(struct virtio_net_hdr),
> vio_le(), true, 0)) {