[BUG] virtio-fs: Corruption when running binaries from virtiofsd-backed fs

From: Erik Schilling
Date: Wed Aug 30 2023 - 14:47:59 EST


Hi all!

A few days ago I posted to #virtiofs:matrix.org, describing that I am
observing what looks like corruption when executing programs from a
virtiofs-based filesystem.

Over the last few days I have spent more time drilling into the problem.
This is an attempt to summarize my findings and see what other people
think about them.

When running binaries mounted from virtiofs, they may either fail with a
segfault, fail with badaddr, get stuck, or - sometimes - succeed.

Environment:
Host: Fedora 38 running 6.4.11-200.fc38.x86_64
Guest: Yocto-based image: 6.4.9-yocto-standard, aarch64
virtiofsd: latest main + some debug prints [1]
QEMU: built from recent git [2]

virtiofsd invocation:
RUST_LOG="debug" ./virtiofsd --seccomp=none --sandbox=none \
--socket-path "fs.sock0" --shared-dir $PWD/share-dir/ --cache=never

QEMU invocation:
~/projects/qemu/build/qemu-system-aarch64 -kernel Image -machine virt \
-cpu cortex-a57 \
-serial mon:stdio \
-device virtio-net-pci,netdev=net0 \
-netdev user,id=net0,hostfwd=tcp::2223-:22 \
-display none -m 2048 -smp 4 \
-object memory-backend-memfd,id=mem,size=2048M,share=on \
-numa node,memdev=mem \
-hda trs-overlay-guest.qcow2 \
-chardev socket,id=char0,path="fs.sock0" \
-device vhost-user-fs-pci,queue-size=1024,chardev=char0,tag=/dev/root \
-append 'root=/dev/vda2 ro log_buf_len=8M'

I found that launching virtiofsd with --cache=always masks the problem,
so I set --cache=never. That said, as far as I can tell there is no
difference between --cache=never and the default setting (--cache=auto)
with respect to this bug.

Adding logging to virtiofsd and the kernel felt like it made the problem
harder to reproduce, leaving me with the impression that some race is
happening somewhere.

To rule out that virtiofsd is returning corrupted data, I added some
logging and hashsum calculation hacks to it [1]. The hashes check out
across multiple accesses, and the order and kind of queued messages are
exactly the same in both the good and the bad case. fio was also unable
to find any errors with a naive job description [3].
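
For context, the hashing part of the hack in [1] boils down to something
like the sketch below. This is illustrative only - the function name and
the choice of hasher are mine, the real change lives in
src/file_traits.rs of my branch - but it shows what the "hash = ..."
lines further down refer to: a digest of the bytes that are about to be
queued as the read reply.

use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

// Illustrative sketch only: digest the bytes that were just read from the
// backing file so that replies for the same file range can be compared
// across runs. DefaultHasher merely stands in for whatever the hack uses.
fn hash_reply_bytes(buf: &[u8]) -> u64 {
    let mut hasher = DefaultHasher::new();
    hasher.write(buf);
    hasher.finish()
}

fn main() {
    // e.g. hashing an 832-byte reply buffer, as in the logs below
    let buf = vec![0u8; 832];
    println!("hash = {}", hash_reply_bytes(&buf));
}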

Next, I tried to capture info on the guest side. This turned out to be a
bit tricky, since the crashes became pretty rare once I followed a fixed
pattern of starting log capture, running perf and trying to reproduce
the problem. Ultimately, I had the most consistent results when
immediately running a program twice:

/mnt/ld-linux-aarch64.so.1 /mnt/ls.coreutils /; \
/mnt/ld-linux-aarch64.so.1 /mnt/ls.coreutils /

(/mnt being the virtiofs mount)

For collecting logs, I hacked the guest kernel to dump the page content
after receiving the virtiofs responses [4] (a rough sketch of the idea
follows the log excerpts below). Reproducing the problem with this in
place leaves me with logs suggesting that virtiofsd returns identical
content while the guest kernel receives differing pages:

good-kernel [5]:
kernel: virtio_fs_wake_pending_and_unlock: opcode 3 unique 0x312 nodeid 0x1 in.len 56 out.len 104
kernel: virtiofs virtio1: virtio_fs_vq_done requests.0
kernel: virtio_fs_wake_pending_and_unlock: opcode 1 unique 0x314 nodeid 0x1 in.len 53 out.len 128
kernel: virtiofs virtio1: virtio_fs_vq_done requests.0
kernel: virtio_fs_wake_pending_and_unlock: opcode 3 unique 0x316 nodeid 0x29 in.len 56 out.len 104
kernel: virtiofs virtio1: virtio_fs_vq_done requests.0
kernel: virtio_fs_wake_pending_and_unlock: opcode 14 unique 0x318 nodeid 0x29 in.len 48 out.len 16
kernel: virtiofs virtio1: virtio_fs_vq_done requests.0
kernel: virtio_fs_wake_pending_and_unlock: opcode 15 unique 0x31a nodeid 0x29 in.len 80 out.len 832
kernel: virtiofs virtio1: virtio_fs_vq_done requests.0
kernel: virtio_fs: page: 000000006996d520
kernel: virtio_fs: to: 00000000de590c14
kernel: virtio_fs rsp:00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
kernel: virtio_fs rsp:00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
kernel: virtio_fs rsp:00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
kernel: virtio_fs rsp:00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
[...]

bad-kernel [6]:
kernel: virtio_fs_wake_pending_and_unlock: opcode 3 unique 0x162 nodeid 0x1 in.len 56 out.len 104
kernel: virtiofs virtio1: virtio_fs_vq_done requests.0
kernel: virtio_fs_wake_pending_and_unlock: opcode 1 unique 0x164 nodeid 0x1 in.len 53 out.len 128
kernel: virtiofs virtio1: virtio_fs_vq_done requests.0
kernel: virtio_fs_wake_pending_and_unlock: opcode 3 unique 0x166 nodeid 0x16 in.len 56 out.len 104
kernel: virtiofs virtio1: virtio_fs_vq_done requests.0
kernel: virtio_fs_wake_pending_and_unlock: opcode 14 unique 0x168 nodeid 0x16 in.len 48 out.len 16
kernel: virtiofs virtio1: virtio_fs_vq_done requests.0
kernel: virtio_fs_wake_pending_and_unlock: opcode 15 unique 0x16a nodeid 0x16 in.len 80 out.len 832
kernel: virtiofs virtio1: virtio_fs_vq_done requests.0
kernel: virtio_fs: page: 000000006ce9a559
kernel: virtio_fs: to: 000000007ae8b946
kernel: virtio_fs rsp:00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
kernel: virtio_fs rsp:00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
kernel: virtio_fs rsp:80 40 de c8 ff ff 00 00 cc 2b 62 ae ff ff 00 00 .@.......+b.....
kernel: virtio_fs rsp:02 4e de c8 ff ff 00 00 00 00 00 00 00 00 00 00 .N..............
[...]
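
(For reference, the dump hack in [4] is conceptually along the lines of
the sketch below. This is not the literal patch - the function name and
call site are made up - it simply hex-dumps the destination page once a
reply has been received, which is what produces the "virtio_fs: page:"
and "virtio_fs rsp:" lines above.)

/*
 * Rough sketch of the idea behind [4], not the literal patch: once the
 * reply for a request has been received, hex-dump the destination page
 * so that two runs can be diffed. Assumes the usual virtio_fs.c includes
 * (linux/printk.h, linux/highmem.h); the function name is made up.
 */
static void virtio_fs_debug_dump_page(struct page *page, void *to, size_t len)
{
        void *buf = kmap_local_page(page);

        pr_info("virtio_fs: page: %p\n", page);
        pr_info("virtio_fs: to: %p\n", to);
        print_hex_dump(KERN_INFO, "virtio_fs rsp:", DUMP_PREFIX_NONE,
                       16, 1, buf, len, true);
        kunmap_local(buf);
}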

The corresponding output from virtiofsd, on the other hand, claims that
identical data was returned in both runs:

good-virtiofsd [7]:
[DEBUG virtiofsd::server] Received request: opcode=Read (15), inode=41, unique=794, pid=481
[src/server.rs:618] r.read_obj().map_err(Error::DecodeMessage)? = ReadIn {
fh: 31,
offset: 0,
size: 832,
read_flags: 2,
lock_owner: 6838554705639967244,
flags: 131072,
padding: 0,
}
[src/file_traits.rs:161] hash = 2308490450751364994
[DEBUG virtiofsd::server] Replying OK, header: OutHeader { len: 848, error: 0, unique: 794 }

bad-virtiofsd [8]:
[DEBUG virtiofsd::server] Received request: opcode=Read (15), inode=22, unique=362, pid=406
[src/server.rs:618] r.read_obj().map_err(Error::DecodeMessage)? = ReadIn {
fh: 12,
offset: 0,
size: 832,
read_flags: 2,
lock_owner: 6181120926258395554,
flags: 131072,
padding: 0,
}
[src/file_traits.rs:161] hash = 2308490450751364994
[DEBUG virtiofsd::server] Replying OK, header: OutHeader { len: 848, error: 0, unique: 362 }

The "corruption" only seems to happen in this one page, all other pages
are identical between runs (except that the bad run terminates earlier).

What do the experts think? To me it feels like some kind of corruption
is going on. Or am I misinterpreting things?

Which further analysis steps would you suggest?


Further notes:

After collecting the above results, I realized that running the guest
with -smp 1 makes the problem a lot worse, so that may be a better
choice when trying to reproduce it.

Repo with my scripts is available at:
https://git.codelinaro.org/erik_schilling/jira-orko-65-bootstrap-k3s-config/

The scripts are just quick-and-dirty implementations and are not
particularly portable.

- Erik

[1] https://gitlab.com/ablu/virtiofsd/-/commit/18fd0c1849e15bc55fbdd6e1f169801b2b03da1f
[2] https://gitlab.com/qemu-project/qemu/-/commit/50e7a40af372ee5931c99ef7390f5d3d6fbf6ec4
[3] https://git.codelinaro.org/erik_schilling/jira-orko-65-bootstrap-k3s-config/-/blob/397a6310dea35973025e3d61f46090bf0c092762/share-dir/write-and-verify-mmap.fio
[4] https://github.com/Ablu/linux/commit/3880b9f8affb01aeabb0a04fe76ad7701dc0bb95
[5] Line 12923: https://git.codelinaro.org/erik_schilling/jira-orko-65-bootstrap-k3s-config/-/blob/main/logs/2023-08-29%2013%3A42%3A35%2B02%3A00/good-drop-bad-1.txt
[6] Line 12923: https://git.codelinaro.org/erik_schilling/jira-orko-65-bootstrap-k3s-config/-/blob/main/logs/2023-08-29%2013%3A42%3A35%2B02%3A00/good-bad-1.txt
[7] https://git.codelinaro.org/erik_schilling/jira-orko-65-bootstrap-k3s-config/-/blob/main/logs/2023-08-29%2013%3A42%3A35%2B02%3A00/virtiofsd.txt#L2538-2549
[8] https://git.codelinaro.org/erik_schilling/jira-orko-65-bootstrap-k3s-config/-/blob/main/logs/2023-08-29%2013%3A42%3A35%2B02%3A00/virtiofsd.txt#L1052-1063