Re: [PATCH net-next 0/2] xen-netback: Changes around carrier handling

From: Zoltan Kiss
Date: Wed Aug 06 2014 - 15:21:24 EST


On 06/08/14 00:07, David Miller wrote:
From: Zoltan Kiss <zoltan.kiss@xxxxxxxxxx>
Date: Mon, 4 Aug 2014 16:20:56 +0100

This series starts using carrier off as a way to purge packets when the guest is
not able (or willing) to receive them. It is a much faster way to get rid of
packets waiting for an overwhelmed guest.
The first patch changes current netback code where it relies currently on
netif_carrier_ok.
The second turns off the carrier if the guest times out on a queue, and only
turn it on again if that queue (or queues) resurrects.

Signed-off-by: Zoltan Kiss <zoltan.kiss@xxxxxxxxxx>
Signed-off-by: David Vrabel <david.vrabel@xxxxxxxxxx>

Applied, but I have some reservations:

1) This is starting to bleed what is normally qdisc type policy into the
driver.
Yeah. The fundamental problem with netback that start_xmit place the packet into an internal queue, and then the thread does the actual transmission from that queue, but it doesn't know whether it will succeed in a finite time period.
There is slot estimation in start_xmit to prevent QDisc pouring more packets into the internal queue when it seems it won't fit. When that happens, we stop QDisc and start a timer, and when the timer fires, before this patch we just ditched the internal queue and started QDisc again. If the frontend were dead, it lead to a quite long delay until packets destined to it were freed.
I just noticed a possible problem with start_xmit: if this slot estimation fails, and the packet is likely to be stalled at least for a while, it still place the skb into the internal queue and return NETDEV_TX_OK. Shouldn't we return NETDEV_TX_BUSY and not placing the packet into the internal queue? It will be requeued later, or dropped by QDisc. I think it will be more natural. But it can decrease performance if these "not enough slot" situations are very frequent and short living, and by the time the RX thread wakes up
My long term idea is to move part of the thread's work into start_xmit, so it can set up the grant operations and if there isn't enough slot it can return the skb to QDisc with NETDEV_TX_BUSY for requeueing. Then the thread can only do the batching of the grant copy operations and releasing the skbs. And we can ditch a good part of the code ...


2) There are other drivers that could run into this kind of situation and
have similar concerns, therefore we should make sure we have a consistent
approach that such entities use to handle this problem.

Part of the problem is that netif_carrier_off() only partially mimicks
the situation. It expresses the "transmitter is down so packets
aren't going onto the wire" part, which keeps the watchdog from
spitting out log messages ever time it fires. But it doesn't deal
with packet freeing policy meanwhile, which I guess is the part that
this patch series is largely trying to address.

Thanks.


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/