Re: [RFC PATCH net 2/3] virtio-net: allow usage of vrings smaller than MAX_SKB_FRAGS + 2

From: Alvaro Karsz
Date: Sun Apr 30 2023 - 14:54:22 EST


> > At the moment, if a network device uses vrings with less than
> > MAX_SKB_FRAGS + 2 entries, the device won't be functional.
> >
> > The following condition vq->num_free >= 2 + MAX_SKB_FRAGS will always
> > evaluate to false, leading to TX timeouts.
> >
> > This patch introduces a new variable, single_pkt_max_descs, that holds
> > the max number of descriptors we may need to handle a single packet.
> >
> > This patch also detects the small vring during probe, blocks some
> > features that can't be used with small vrings, and fails probe,
> > leading to a reset and features re-negotiation.
> >
> > Features that can't be used with small vrings:
> > GRO features (VIRTIO_NET_F_GUEST_*):
> > When we use small vrings, we may not have enough entries in the ring to
> > chain page size buffers and form a 64K buffer.
> > So we may need to allocate 64k of continuous memory, which may be too
> > much when the system is stressed.
> >
> > This patch also fixes the MTU size in small vring cases to be up to the
> > default one, 1500B.
>
> and then it should clear VIRTIO_NET_F_MTU?
>

Following [1], I was thinking to accept the feature and a let the device figure out that it can't transmit a big packet, since the RX buffers are not big enough (without VIRTIO_NET_F_MRG_RXBUF).
But, I think that we may need to block the MTU feature after all.
Quoting the spec:

A driver SHOULD negotiate VIRTIO_NET_F_MTU if the device offers it.
If the driver negotiates VIRTIO_NET_F_MTU, it MUST supply enough receive buffers to receive at least one receive packet of size mtu (plus low level ethernet header length) with gso_type NONE or ECN.

So, if VIRTIO_NET_F_MTU is negotiated, we MUST supply enough receive buffers.
So I think that blocking VIRTIO_NET_F_MTU should be the way to go, If mtu > 1500.

[1] https://lore.kernel.org/lkml/20230417031052-mutt-send-email-mst@xxxxxxxxxx/

> > + /* How many ring descriptors we may need to transmit a single packet */
> > + u16 single_pkt_max_descs;
> > +
> > + /* Do we have virtqueues with small vrings? */
> > + bool svring;
> > +
> > /* CPU hotplug instances for online & dead */
> > struct hlist_node node;
> > struct hlist_node node_dead;
>
> worth checking that all these layout changes don't push useful things to
> a different cache line. can you add that analysis?
>

Good point.
I think that we can just move these to the bottom of the struct.

>
> I see confusiong here wrt whether some rings are "small"? all of them?
> some rx rings? some tx rings? names should make it clear.

The small vring is a device attribute, not a vq attribute. It blocks features, which affects the entire device.
Maybe we can call it "small vring mode".

> also do we really need bool svring? can't we just check single_pkt_max_descs
> all the time?
>

We can work without the bool, we could always check if single_pkt_max_descs != MAX_SKB_FRAGS + 2.
It doesn't really matter to me, I was thinking it may be more readable this way.

> > +static bool virtnet_uses_svring(struct virtnet_info *vi)
> > +{
> > + u32 i;
> > +
> > + /* If a transmit/receive virtqueue is small,
> > + * we cannot handle fragmented packets.
> > + */
> > + for (i = 0; i < vi->max_queue_pairs; i++) {
> > + if (IS_SMALL_VRING(virtqueue_get_vring_size(vi->sq[i].vq)) ||
> > + IS_SMALL_VRING(virtqueue_get_vring_size(vi->rq[i].vq)))
> > + return true;
> > + }
> > +
> > + return false;
> > +}
>
> I see even if only some rings are too small we force everything to use
> small ones. Wouldn't it be better to just disable small ones in this
> case? That would not need a reset.
>

I'm not sure. It may complicate things.

What if all TX vqs are small?
What if all RX vqs are small?
What if we end up with an unbalanced number of TX vqs and RX vqs? is this allowed by the spec?
What if we end up disabling the RX default vq (receiveq1)?

I guess we could do it, after checking some conditions.
Maybe we can do it in a follow up patch?
Do you think it's important for it to be included since day 1?

I think that the question is: what's more important, to use all the vqs while blocking some features, or to use part of the vqs without blocking features?

> > +
> > +/* Function returns the number of features it blocked */
>
> We don't need the # though. Make it bool?
>

Sure.