Re: [PATCH v10] vfs: fix copy_file_range regression in cross-fs copies

From: Amir Goldstein
Date: Wed Jun 30 2021 - 11:38:30 EST


On Wed, Jun 30, 2021 at 6:06 PM Luis Henriques <lhenriques@xxxxxxx> wrote:
>
> On Wed, Jun 30, 2021 at 05:56:34PM +0300, Amir Goldstein wrote:
> > On Wed, Jun 30, 2021 at 4:44 PM Luis Henriques <lhenriques@xxxxxxx> wrote:
> > >
> > > A regression has been reported by Nicolas Boichat, found while using the
> > > copy_file_range syscall to copy a tracefs file. Before commit
> > > 5dae222a5ff0 ("vfs: allow copy_file_range to copy across devices") the
> > > kernel would return -EXDEV to userspace when trying to copy a file across
> > > different filesystems. After this commit, the syscall doesn't fail anymore
> > > and instead returns zero (zero bytes copied), as this file's content is
> > > generated on-the-fly and thus reports a size of zero.
> > >
> > > This patch restores some cross-filesystem copy restrictions that existed
> > > prior to commit 5dae222a5ff0 ("vfs: allow copy_file_range to copy across
> > > devices"). Filesystems are still allowed to fall-back to the VFS
> > > generic_copy_file_range() implementation, but that has now to be done
> > > explicitly.
> > >
> > > nfsd is also modified to fall-back into generic_copy_file_range() in case
> > > vfs_copy_file_range() fails with -EOPNOTSUPP or -EXDEV.
> > >
> > > Fixes: 5dae222a5ff0 ("vfs: allow copy_file_range to copy across devices")
> > > Link: https://lore.kernel.org/linux-fsdevel/20210212044405.4120619-1-drinkcat@xxxxxxxxxxxx/
> > > Link: https://lore.kernel.org/linux-fsdevel/CANMq1KDZuxir2LM5jOTm0xx+BnvW=ZmpsG47CyHFJwnw7zSX6Q@xxxxxxxxxxxxxx/
> > > Link: https://lore.kernel.org/linux-fsdevel/20210126135012.1.If45b7cdc3ff707bc1efa17f5366057d60603c45f@changeid/
> > > Reported-by: Nicolas Boichat <drinkcat@xxxxxxxxxxxx>
> > > Reported-by: kernel test robot <oliver.sang@xxxxxxxxx>
> > > Signed-off-by: Luis Henriques <lhenriques@xxxxxxx>
> > > ---
> > > Changes since v9
> > > - the early return from the syscall when len is zero now checks if the
> > > filesystem is implemented, returning -EOPNOTSUPP if it is not and 0
> > > otherwise. Issue reported by test robot.
> >
> > What issue was reported?
>
> Here's the link to my previous email:
>
> https://lore.kernel.org/linux-fsdevel/877dk1zibo.fsf@xxxxxxx/
>

Sorry, I missed it. I guess the subject was not alluring enough ;-)

So your patch does not fix the root cause.
The solution is to remove the (len == 0) short-circuit as you first suggested.

The problem is this:
A program tries to check for CFR support by calling CFR with zero length.
The XFS filesystem driver (in the test robot report) supports CFR via the
remap_file_range() method in general, but not on the particular filesystem
instance that was formatted without reflink support.
The intention of the program was to test for CFR support on the particular
filesystem instance, so the short-circuit response is wrong.
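
For illustration only, such a probe might look roughly like the userspace
sketch below. It is not taken from the test robot report; the helper name
cfr_probe() is made up, and it assumes a libc that provides the
copy_file_range() wrapper.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <unistd.h>

/*
 * Hypothetical helper: probe CFR support with a zero-length copy.
 * Returns 0 if the call was accepted, or a negative errno
 * (for example -EOPNOTSUPP, -EXDEV or -EBADF) otherwise.
 */
int cfr_probe(int fd_in, int fd_out)
{
	/* NULL offsets: use the current file positions; len == 0 copies nothing. */
	ssize_t ret = copy_file_range(fd_in, NULL, fd_out, NULL, 0, 0);

	return ret < 0 ? -errno : 0;
}
```

A caller would typically treat a negative result as "fall back to
read/write". The probe is only meaningful if the zero-length call reaches
the filesystem instance that owns the files.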

Note that vfs_clone_file_range() does NOT short circuit (len == 0).
That is (allegedly) because it needs to call into the filesystem
method to know if the filesystem instance supports clone_file_range.

Your patch is wrong because the same situation can happen with a
filesystem driver that has a copy_file_range() method, while a
particular instance does not support copy_file_range().
For example, overlayfs has an ovl_copy_file_range() method, so it would
short-circuit a zero-length CFR, but if in a particular overlayfs the upper
fs does not support CFR, then the overlayfs instance does not support CFR
either.
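
The difference can be shown with a toy model. This is a simplified sketch,
not kernel code: struct toy_fs and the toy_*() functions are hypothetical
names, and the "short-circuit" variant only models the behaviour described
in the v10 changelog (answer len == 0 based on whether the driver has a
method).

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model, not kernel code: all names here are hypothetical. */
struct toy_fs {
	bool has_cfr_method;	/* driver provides a copy_file_range() method */
	bool instance_supports;	/* this particular instance can actually do it */
};

/* Stand-in for calling into the filesystem's method. */
int toy_fs_copy(const struct toy_fs *fs, size_t len)
{
	if (!fs->instance_supports)
		return -EOPNOTSUPP;
	return (int)len;	/* pretend everything was copied */
}

/* Short-circuit: len == 0 is answered from the method's presence alone. */
int toy_cfr_short_circuit(const struct toy_fs *fs, size_t len)
{
	if (!fs->has_cfr_method)
		return -EOPNOTSUPP;
	if (len == 0)
		return 0;
	return toy_fs_copy(fs, len);
}

/* No short-circuit: the instance always gets to answer, even for len == 0. */
int toy_cfr_no_short_circuit(const struct toy_fs *fs, size_t len)
{
	if (!fs->has_cfr_method)
		return -EOPNOTSUPP;
	return toy_fs_copy(fs, len);
}
```

With has_cfr_method set and instance_supports clear (the overlayfs case
above), the short-circuit variant returns 0 for a zero-length call, while
the other variant returns -EOPNOTSUPP.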

> ... which reminds me that I need to also send a patch to fix the fstest.
> (Although the test as-is actually allowed to find this bug...)
>

Not sure why you'd want to fix the test.
The test's check with a zero-length file seems valid to me.

Thanks,
Amir.