Re: [PATCH 5/7] io_uring: rsrc: use FOLL_SAME_FILE on pin_user_pages()

From: Lorenzo Stoakes
Date: Mon Apr 17 2023 - 11:21:56 EST


On Mon, Apr 17, 2023 at 11:15:10AM -0300, Jason Gunthorpe wrote:
> On Mon, Apr 17, 2023 at 03:00:16PM +0100, Lorenzo Stoakes wrote:
> > On Mon, Apr 17, 2023 at 10:26:09AM -0300, Jason Gunthorpe wrote:
> > > On Mon, Apr 17, 2023 at 02:19:16PM +0100, Lorenzo Stoakes wrote:
> > >
> > > > > I'd rather see something like FOLL_ALLOW_BROKEN_FILE_MAPPINGS than
> > > > > io_uring open coding this kind of stuff.
> > > > >
> > > >
> > > > How would the semantics of this work? What is broken? It is a little
> > > > frustrating that we have FOLL_ANON but hugetlb as an outlying case, adding
> > > > FOLL_ANON_OR_HUGETLB was another consideration...
> > >
> > > It says "historically this user has accepted file backed pages and we
> > > we think there may actually be users doing that, so don't break the
> > > uABI"
> >
> > Having written a bunch here I suddenly realised that you probably mean for
> > this flag to NOT be applied to the io_uring code and thus have it enforce
> > the 'anonymous or hugetlb' check by default?
>
> Yes
>
> > So you mean to disallow file-backed page pinning as a whole unless this
> > flag is specified?
>
> Yes
>
> > For FOLL_GET I can see that access to the underlying
> > data is dangerous as the memory may get reclaimed or migrated, but surely
> > DMA-pinned memory (as is the case here) is safe?
>
> No, it is all broken, read-only access is safe.
>
> We are trying to get a point where pin access will interact properly
> with the filesystem, but it isn't done yet.
>
> > Or is this a product more so of some kernel process accessing file-backed
> > pages for a file system which expects write-notify semantics and doesn't
> > get them in this case, which could indeed be horribly broken.
>
> Yes, broadly
>
> > I am definitely in favour of cutting things down if possible, and very much
> > prefer the use of uaccess if we are able to do so rather than GUP.
> >
> > I do feel that GUP should be focused purely on pinning memory rather than
> > manipulating it (whether read or write) so I agree with this sentiment.
>
> Yes, someone needs to be brave enough to go and try to adjust these
> old places :)

Well, I liek to think of myself as stupid^W brave enough to do such things
so may try a separate patch series on that :)

>
> I see in the git history this was added to solve CVE-2018-1120 - eg
> FUSE can hold off fault-in indefinitely. So the flag is really badly
> misnamed - it is "FOLL_DONT_BLOCK_ON_USERSPACE" and anon memory is a
> simple, but overly narrow, way to get that property.
>
> If it is changed to use kthread_use_mm() it needs a VMA based check
> for the same idea.
>
> Jason

I'll try my hand at patching this also!

As for FOLL_ALLOW_BROKEN_FILE_MAPPINGS, I do really like this idea, and
think it is actually probably quite important we do it, however this feels
a bit out of scope for this patch series.

I think perhaps the way forward is, if Jens and Pavel don't have any issue
with it, we open code the check and drop FOLL_SAME_FILE for this series,
then introduce it in a separate one + replace the open coding there?

I am eager to try to keep this focused on the specific task of dropping the
vmas parameter as I think FOLL_ALLOW_BROKEN_FILE_MAPPINGS is likely to
garner some discussion which should be kept separate.