Re: Packet gets stuck in NOLOCK pfifo_fast qdisc

From: Josh Hunt
Date: Fri Apr 02 2021 - 15:33:59 EST


On 4/2/21 12:25 PM, Jiri Kosina wrote:
On Thu, 3 Sep 2020, John Fastabend wrote:

At this point I fear we could consider reverting the NOLOCK stuff.
I personally would hate doing so, but it looks like NOLOCK benefits are
outweighed by its issues.

I agree, NOLOCK brings more pains than gains. There are many race
conditions hidden in generic qdisc layer, another one is enqueue vs.
reset which is being discussed in another thread.

Sure. Seems they crept in over time. I had some plans to write a
lockless HTB implementation. But with fq+EDT with BPF it seems that
it is no longer needed, we have a more generic/better solution. So
I dropped it. Also most folks should really be using fq, fq_codel,
etc. by default anyways. Using pfifo_fast alone is not ideal IMO.

Half a year later, we still have the NOLOCK implementation
present, and pfifo_fast still does set the TCQ_F_NOLOCK flag on itself.

And we've just been bitten by this very same race, which appears to be
still unfixed: a single packet gets stuck in the pfifo_fast qdisc
basically indefinitely, exactly the problem this whole thread began
with back in 2019.

Unless there are

(a) any nice ideas for solving this in an elegant way without
(re-)introducing an extra spinlock (Cong's fix) or

(b) any objections to a revert as per the argumentation above

I'll be happy to send a revert of the whole NOLOCK implementation next
week.


Jiri

If you have a reproducer, can you try https://lkml.org/lkml/2021/3/24/1485? If that doesn't work, your suggestion of reverting NOLOCK makes sense to me. We've moved to using fq as our default now because of this bug.

Josh