Re: Packet gets stuck in NOLOCK pfifo_fast qdisc

From: Juergen Gross
Date: Tue Apr 06 2021 - 06:13:30 EST


On 06.04.21 09:06, Michal Kubecek wrote:
> On Tue, Apr 06, 2021 at 08:55:41AM +0800, Yunsheng Lin wrote:
>>
>> Hi, Jiri
>> Do you have a reproducer that can be shared here?
>> With reproducer, I can debug and test it myself too.
>
> I'm afraid we are not aware of a simple reproducer. As mentioned in the
> original discussion, the race window is extremely small and the other
> thread has to do quite a lot in the meantime which is probably why, as
> far as I know, this was never observed on real hardware, only in
> virtualization environments. NFS may also be important as, IIUC, it can
> often issue an RPC request from a different CPU right after a data
> transfer. Perhaps you could cheat a bit and insert a random delay
> between the empty queue check and releasing q->seqlock to make it more
> likely to happen.
>
> Other than that, it's rather just "run this complex software in a xen VM
> and wait".

Being the one who has managed to reproduce the issue, I can share my
setup; maybe you can set up something similar (we have seen the issue
with this kind of setup on two different machines).

I'm using a physical machine with 72 cpus and 48 GB of memory. It is
running Xen as the virtualization platform.

Xen dom0 is limited to 40 vcpus and 32 GB of memory, the dom0 vcpus are
limited to run on the first 40 physical cpus (no idea whether that
matters, though).

In a guest with 16 vcpus and 8 GB of memory I'm running 8 parallel
sysbench instances in a loop. Those instances are prepared via

sysbench --file-test-mode=rndrd --test=fileio prepare

and then started in a do while loop via:

sysbench --test=fileio --file-test-mode=rndrw --rand-seed=0 --max-time=300 --max-requests=0 run
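
For illustration, such a loop could look roughly like the sketch below.
It is not the script used for the runs described here. It assumes that
sysbench works on test files in the current directory, that "prepare"
has already been run in each mount point, and that the mounts are
/mnt1 to /mnt8; the function names and log file paths are made up.

```shell
#!/bin/sh
# Sketch only, not the script used for the runs described here.
# Assumption: sysbench fileio works on files in the current directory,
# and "prepare" has already been run in each mount point.

run_instance() {
    # $1 = directory holding the prepared test files (e.g. /mnt1)
    (
        cd "$1" || exit 1
        # Keep restarting the 300 second run until sysbench itself fails.
        while sysbench --test=fileio --file-test-mode=rndrw --rand-seed=0 \
                       --max-time=300 --max-requests=0 run
        do
            :
        done
    )
}

start_all() {
    # One background instance per mount; the log file names are made up.
    for i in 1 2 3 4 5 6 7 8; do
        run_instance "/mnt$i" > "/tmp/sysbench-mnt$i.log" 2>&1 &
    done
    wait
}
```

Calling start_all in the guest would then keep all 8 instances busy
until they are interrupted.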

Each instance is using a dedicated NFS mount to run on. The NFS
server for the 8 mounts is running in dom0 of the same server, the
data of the NFS shares is located in a RAM disk (size is a little bit
above 16GB). The shares are mounted in the guest with:

mount -t nfs -o rw,proto=tcp,nolock,nfsvers=3,rsize=65536,wsize=65536,nosharetransport dom0:/ramdisk/share[1-8] /mnt[1-8]
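
The [1-8] above is shorthand for 8 separate mount commands. Spelled out
as a loop it might look like the following sketch; the function name
and the DRY_RUN switch are made up for this example and only serve to
print the commands instead of running them.

```shell
#!/bin/sh
# Sketch: expand the share[1-8] -> /mnt[1-8] shorthand into 8 mounts.
# DRY_RUN is a hypothetical switch: when set to a non-empty value the
# commands are only printed, not executed.

mount_shares() {
    opts="rw,proto=tcp,nolock,nfsvers=3,rsize=65536,wsize=65536,nosharetransport"
    for i in 1 2 3 4 5 6 7 8; do
        ${DRY_RUN:+echo} mount -t nfs -o "$opts" \
            "dom0:/ramdisk/share$i" "/mnt$i"
    done
}
```

Setting DRY_RUN=1 before calling mount_shares prints the 8 mount lines
for inspection.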

The guest's vcpus are limited to run on physical cpus 40-55. On the same
physical cpus I have 16 small guests running and eating up cpu time; each
of those guests is pinned to one of the physical cpus 40-55.
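
With the xl toolstack, pinning of this kind could be set up at runtime
roughly as sketched below. The domain names (testguest, load0 to load15)
are invented, and spreading the small guests one per physical cpu is an
assumption. The same effect could also be had via the "cpus" setting in
the guest config files.

```shell
#!/bin/sh
# Sketch: pin the test guest and 16 small load guests to pcpus 40-55.
# All domain names are hypothetical. DRY_RUN is a made-up switch that
# prints the xl commands instead of executing them.

pin_guests() {
    # The 16 vcpus of the test guest may run anywhere on pcpus 40-55.
    ${DRY_RUN:+echo} xl vcpu-pin testguest all 40-55

    # Assumption: one small guest per physical cpu 40..55.
    i=0
    while [ "$i" -lt 16 ]; do
        ${DRY_RUN:+echo} xl vcpu-pin "load$i" all "$((40 + i))"
        i=$((i + 1))
    done
}
```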

That's basically it. All you need to do is to watch out for sysbench
reporting maximum latencies above one second or so (in my setup there
are latencies of several minutes at least once each hour of testing).
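
Watching for those latencies can be automated with a small filter like
the sketch below. It assumes the sysbench report contains a line of the
form "max: <value>" in milliseconds, with or without a trailing "ms"
depending on the sysbench version; please check this against the output
of the version in use. The function name is made up.

```shell
#!/bin/sh
# Sketch: flag sysbench reports whose maximum latency exceeds a limit.
# Assumption: the report has a "max:" line with the value in
# milliseconds, optionally followed directly by "ms".

check_max_latency() {
    # $1 = limit in milliseconds (default 1000); report is read from stdin
    awk -v limit="${1:-1000}" '
        $1 == "max:" {
            v = $2
            sub(/ms$/, "", v)    # drop the unit if it is appended
            if (v + 0 > limit + 0) {
                print "max latency " v " ms exceeds " limit " ms"
                bad = 1
            }
        }
        END { exit bad }         # exit status 1 if any run was too slow
    '
}
```

Feeding it a saved sysbench log on stdin (for example
"check_max_latency 1000 < some.log", with the file name being whatever
the runs were logged to) prints one line per offending run and returns
a non-zero status if there was any.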

In case you'd like to have some more details about the setup, don't
hesitate to contact me directly. I can provide you with some scripts
and config runes if you want.


Juergen
