Re: [PATCH v9 04/17] iov_iter: Turn iov_iter_fault_in_readable into fault_in_iov_iter_readable

From: Andreas Gruenbacher
Date: Thu Nov 04 2021 - 16:31:25 EST


On Thu, Nov 4, 2021 at 7:22 PM Catalin Marinas <catalin.marinas@xxxxxxx> wrote:
> On Tue, Nov 02, 2021 at 01:29:32PM +0100, Andreas Gruenbacher wrote:
> > Turn iov_iter_fault_in_readable into a function that returns the number
> > of bytes not faulted in, similar to copy_to_user, instead of returning a
> > non-zero value when any of the requested pages couldn't be faulted in.
> > This supports the existing users that require all pages to be faulted in
> > as well as new users that are happy if any pages can be faulted in.
> >
> > Rename iov_iter_fault_in_readable to fault_in_iov_iter_readable to make
> > sure this change doesn't silently break things.
> >
> > Signed-off-by: Andreas Gruenbacher <agruenba@xxxxxxxxxx>
> [...]
> > diff --git a/mm/filemap.c b/mm/filemap.c
> > index ff34f4087f87..4dd5edcd39fd 100644
> > --- a/mm/filemap.c
> > +++ b/mm/filemap.c
> > @@ -3757,7 +3757,7 @@ ssize_t generic_perform_write(struct file *file,
> > * same page as we're writing to, without it being marked
> > * up-to-date.
> > */
> > - if (unlikely(iov_iter_fault_in_readable(i, bytes))) {
> > + if (unlikely(fault_in_iov_iter_readable(i, bytes))) {
> > status = -EFAULT;
> > break;
> > }
>
> Now that fault_in_iov_iter_readable() returns the number of bytes, we
> could change the above test to:
>
> if (unlikely(fault_in_iov_iter_readable(i, bytes) == bytes)) {
>
> Assuming we have a pointer 'a', accessible, and 'a + PAGE_SIZE' unmapped:
>
> write(fd, a + PAGE_SIZE - 1, 2);
>
> can still copy one byte but it returns -EFAULT instead since the second
> page is not accessible.
>
> While writing some test-cases for MTE (sub-page faults, 16-byte
> granularity), we noticed that reading 2 bytes from 'a + 15' with
> 'a + 16' tagged for faulting:
>
> write(fd, a + 15, 2);
>
> succeeds as long as 'a + 16' is not at a page boundary. Checking against
> 'bytes' above makes this consistent.
>
> The downside is that it's an ABI change though not sure anyone is
> relying on it.

The same pattern exists in iomap_write_iter too, of course. In the
very light testing I did for eliminating the pre-faulting, this kind
of change was working fine. I have no performance numbers though.

https://lore.kernel.org/linux-fsdevel/20211026094430.3669156-1-agruenba@xxxxxxxxxx/
https://lore.kernel.org/linux-fsdevel/20211027212138.3722977-1-agruenba@xxxxxxxxxx/

Thanks,
Andreas