Re: Commit 'sunrpc: Use sendmsg(MSG_SPLICE_PAGES) rather then sendpage' broke O_DIRECT over NFS

From: Maxim Levitsky
Date: Thu Aug 17 2023 - 16:51:22 EST


On Thu, 2023-08-17 at 15:58 +0000, Chuck Lever III wrote:
> > On Aug 17, 2023, at 11:57 AM, Chuck Lever <chuck.lever@xxxxxxxxxx> wrote:
> >
> >
> >
> > > On Aug 17, 2023, at 11:52 AM, Maxim Levitsky <mlevitsk@xxxxxxxxxx> wrote:
> > >
> > > Hi!
> > >
> > > I just updated my development systems to 6.5-rc6 (from 6.4) and now I can't start a VM
> > > whose disk is located on an NFS mount.
> > >
> > > The VM has two qcow2 files, one depends on another and qemu opens both.
> > >
> > > This is the command line of qemu:
> > >
> > > -drive if=none,id=os_image,file=./disk_s1.qcow2,aio=native,discard=unmap,cache=none
> > >
> > > The disk_s1.qcow2 depends on disk_s0.qcow2
> > >
> > > However this is what I get:
> > >
> > > qemu-system-x86_64: -drive if=none,id=os_image,file=./disk_s1.qcow2,aio=native,discard=unmap,cache=none: Could not open backing file: Could not open './QFI?': No such file or directory
> > >
> > > 'QFI?' is the qcow2 file signature, which suggests that some nasty corruption might be happening.
> > >
> > > The program was supposed to read a field inside the disk_s1.qcow2 file which should read 'disk_s0.qcow2'
> > > but instead it seems to read the first 4 bytes of the file.
> > >
> > >
> > > Bisect leads to the above commit. Reverting it was not possible due to many changes.
> > >
> > > Both the client and the server were tested with the 6.5-rc6 kernel, but once I rebooted the server into
> > > 6.4, the bug disappeared, so I did the bisect on the server.
> > >
> > > When I tested a version before the offending commit on the server, the 6.5-rc6 client was able to work with it,
> > > which increases the chances that the bug is in nfsd.
> > >
> > > Switching qemu to the writeback cache mode also helps (aio=threads,discard=unmap,cache=writeback)
> > > The client and the server (both 6.5-rc6) work with this configuration.
> > >
> > > Running the VM on the same machine (also 6.5-rc6) where the VM disk is located (thus avoiding NFS) works as well.
> > >
> > > I tested several VMs that I have, all are affected in the same way.
> > >
> > > I run somewhat outdated qemu, but running the latest qemu doesn't make a difference.
> > >
> > > I use nfs4.
> > >
> > > I can test patches and provide more info if needed.
> >
> > Linus just merged a possible fix for this issue. See:
> >
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/ master
>
> In particular:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c96e2a695e00bca5487824d84b85aab6aa2c1891

I just tested it. It does help (qemu doesn't crash anymore) but it doesn't eliminate the issue (VM still doesn't boot)

The VM now starts but it drops into the UEFI shell.

Once again, disabling O_DIRECT helps (that is, aio=threads,cache=writeback)
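
As a rough illustration of the original symptom (the backing file name reading back as 'QFI?'), here is a minimal sketch of how the backing file name is located in a qcow2 header. It assumes the standard header layout (4-byte magic, 32-bit version, 64-bit backing file offset, 32-bit backing file size, all big-endian); the function name `backing_file_name` is hypothetical, and this is not qemu's actual code.

```python
import struct

QCOW2_MAGIC = b"QFI\xfb"  # the signature that shows up as 'QFI?' in the error


def backing_file_name(image: bytes):
    """Return the backing file name stored in a qcow2 image, or None.

    Illustrative helper (hypothetical name); it only looks at the first
    20 bytes of the header plus the name itself.
    """
    # Big-endian: magic (4s), version (I), backing_file_offset (Q),
    # backing_file_size (I).
    magic, _version, offset, size = struct.unpack_from(">4sIQI", image, 0)
    if magic != QCOW2_MAGIC:
        raise ValueError("not a qcow2 image")
    if offset == 0:
        return None  # image has no backing file
    return image[offset:offset + size]
```

If the read at the backing file offset returned the start of the file instead of the requested range, the "name" would begin with the magic bytes, which would be consistent with the 'QFI?' in the error message.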

For reference, a few kernels ago I had an unrelated bug (not even NFS-related, it was happening locally as well),
which caused the exact same drop to the UEFI shell when using O_DIRECT:

https://www.mail-archive.com/qemu-devel@xxxxxxxxxx/msg912549.html

It was decided that this was a qemu issue, because qemu relied on undefined kernel behavior which had changed,
so qemu was patched to fix the issue on its side.

Since I sometimes use an older qemu version, I had this kernel commit reverted for now, but to be sure I have now built a kernel
without the revert on both the server and the client, and tested with the latest qemu, which has the fix for the bug.

I don't remember the details of this unrelated bug, but if I remember correctly, qemu had trouble reading the first 512 bytes of the virtual disk
when the VM tried to read the boot sector.
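
One way to check this kind of symptom outside of qemu is to read the first 512 bytes of the image once through the page cache and once with O_DIRECT, and compare the results. The following is only a sketch under stated assumptions: the helper names are hypothetical, a 512-byte read is assumed to satisfy the O_DIRECT alignment requirements of the filesystem in question (some need larger blocks and will return EINVAL), and an anonymous mmap is used simply because it gives a page-aligned buffer.

```python
import mmap
import os

SECTOR = 512  # assumed size of the boot sector read; may need to be larger


def read_first_sector(path, direct=True):
    """Read the first SECTOR bytes of path, optionally with O_DIRECT."""
    flags = os.O_RDONLY | (os.O_DIRECT if direct else 0)
    fd = os.open(path, flags)
    try:
        # Anonymous mappings are page-aligned, which O_DIRECT requires
        # for the user buffer.
        with mmap.mmap(-1, SECTOR) as buf:
            n = os.readv(fd, [buf])
            return bytes(buf[:n])
    finally:
        os.close(fd)


def first_sector_matches(path):
    """Return True if buffered and O_DIRECT reads return the same bytes."""
    return read_first_sector(path, direct=False) == read_first_sector(path, direct=True)
```

Running `first_sector_matches("./disk_s1.qcow2")` on the NFS client would show whether the two read paths disagree; a mismatch would point at the O_DIRECT path rather than at qemu itself.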


Best regards,
Maxim Levitsky

>
>
> --
> Chuck Lever
>
>