Re: [PATCH v4 4/6] io_uring: rsrc: avoid use of vmas parameter in pin_user_pages()

From: Lorenzo Stoakes
Date: Thu Apr 20 2023 - 10:19:53 EST


On Thu, Apr 20, 2023 at 02:36:47PM +0100, Pavel Begunkov wrote:
> On 4/19/23 19:35, Matthew Wilcox wrote:
> > On Wed, Apr 19, 2023 at 03:24:55PM -0300, Jason Gunthorpe wrote:
> > > On Wed, Apr 19, 2023 at 07:23:00PM +0100, Matthew Wilcox wrote:
> > > > On Wed, Apr 19, 2023 at 07:18:26PM +0100, Lorenzo Stoakes wrote:
> > > > > So even if I did the FOLL_ALLOW_BROKEN_FILE_MAPPING patch series first, I
> > > > > would still need to come along and delete a bunch of your code
> > > > > afterwards. And unfortunately Pavel's recent change which insists on not
> > > > > having different vm_file's across VMAs for the buffer would have to be
> > > > > reverted so I expect it might not be entirely without discussion.
> > > >
> > > > I don't even understand why Pavel wanted to make this change. The
> > > > commit log really doesn't say.
> > > >
> > > > commit edd478269640
> > > > Author: Pavel Begunkov <asml.silence@xxxxxxxxx>
> > > > Date: Wed Feb 22 14:36:48 2023 +0000
> > > >
> > > > io_uring/rsrc: disallow multi-source reg buffers
> > > >
> > > > If two or more mappings go back to back to each other they can be passed
> > > > into io_uring to be registered as a single registered buffer. That would
> > > > even work if mappings came from different sources, e.g. it's possible to
> > > > mix in this way anon pages and pages from shmem or hugetlb. That is not
> > > > a problem but it'd rather be less prone if we forbid such mixing.
> > > >
> > > > Cc: <stable@xxxxxxxxxxxxxxx>
> > > > Signed-off-by: Pavel Begunkov <asml.silence@xxxxxxxxx>
> > > > Signed-off-by: Jens Axboe <axboe@xxxxxxxxx>
> > > >
> > > > It even says "That is not a problem"! So why was this patch merged
> > > > if it's not fixing a problem?
> > > >
> > > > It's now standing in the way of an actual cleanup. So why don't we
> > > > revert it? There must be more to it than this ...
> > >
> > > https://lore.kernel.org/all/61ded378-51a8-1dcb-b631-fda1903248a9@xxxxxxxxx/
> >
> > So um, it's disallowed because Pavel couldn't understand why it
> > should be allowed? This gets less and less convincing.
>
> Excuse me? I'm really sorry you "couldn't understand" the explanation
> as it has probably been too much of a "mental load", but let me try to
> elaborate.
>
> Because it's currently limited what can be registered, it's indeed not
> a big deal, but that will most certainly change, and I usually and
> apparently nonsensically prefer to tighten things up _before_ it becomes
> a problem. And again, taking a random set of buffers created for
> different purposes and registering it as a single entity is IMHO not a
> sane approach.
>
> Take p2pdma for instance, if would have been passed without intermixing
> there might not have been is_pci_p2pdma_page()/etc. for every single page
> in a bvec. That's why in general, it won't change for p2pdma but there
> might be other cases in the future.
>

The proposed changes for GUP are equally intended to tighten things up both
in advance of issues (e.g. misuse of vmas parameter), for the purposes of
future improvements to GUP (optimising how we perform these operations) and
most pertinently here - ensuring broken uses of GUP do not occur.

So both are 'cleanups' in some sense :) I think it's important to point out
that this change is not simply a desire to improve an interface but has
practical implications going forward (which maybe aren't obvious at this
point yet).

The new approach to this changes goes further in that we essentially
perform the existing check here (anon/shmem/hugetlb) but for _all_
FOLL_WRITE operations in GUP to avoid broken writes to file mappings, at
which point we can just remove the check here altogether (I will post a
series for this soon).

I think that you guys should not have to be performing sanity checks here
but rather we should handle it in GUP so you don't need to think about it
at all. It feels like an mm failing that you have had to do so at all.

So I guess the question is, do you feel the requirement for vm_file being
the same should apply across _any_ GUP operation over a range or is it
io_uring-specific?

If the former then we should do it in GUP alongside the other checks (and
you can comment accordingly on that patch series when it comes), if the
latter then I would restate my opinion that we shouldn't be prevented from
making improvements to GUP in for one caller that wants to iterate
over these VMAs for custom checks + that should be done separately.

>
> > FWIW, what I was suggesting was that we should have a FOLL_SINGLE_VMA
> > flag, which would use our shiny new VMA lock infrastructure to look
> > up and lock _one_ VMA instead of having the caller take the mmap_lock.
> > Passing that flag would be a tighter restriction that Pavel implemented,
> > but would certainly relieve some of his mental load.
> >
> > By the way, even if all pages are from the same VMA, they may still be a
> > mixture of anon and file pages; think a MAP_PRIVATE of a file when
> > only some pages have been written to. Or an anon MAP_SHARED which is
> > accessible by a child process.
>
> --
> Pavel Begunkov