Re: [PATCH v4 4/6] io_uring: rsrc: avoid use of vmas parameter in pin_user_pages()

From: Pavel Begunkov
Date: Thu Apr 20 2023 - 09:40:10 EST


On 4/19/23 19:35, Matthew Wilcox wrote:
On Wed, Apr 19, 2023 at 03:24:55PM -0300, Jason Gunthorpe wrote:
On Wed, Apr 19, 2023 at 07:23:00PM +0100, Matthew Wilcox wrote:
On Wed, Apr 19, 2023 at 07:18:26PM +0100, Lorenzo Stoakes wrote:
So even if I did the FOLL_ALLOW_BROKEN_FILE_MAPPING patch series first, I
would still need to come along and delete a bunch of your code
afterwards. And unfortunately Pavel's recent change which insists on not
having different vm_file's across VMAs for the buffer would have to be
reverted so I expect it might not be entirely without discussion.

I don't even understand why Pavel wanted to make this change. The
commit log really doesn't say.

commit edd478269640
Author: Pavel Begunkov <asml.silence@xxxxxxxxx>
Date: Wed Feb 22 14:36:48 2023 +0000

io_uring/rsrc: disallow multi-source reg buffers

If two or more mappings go back to back to each other they can be passed
into io_uring to be registered as a single registered buffer. That would
even work if mappings came from different sources, e.g. it's possible to
mix in this way anon pages and pages from shmem or hugetlb. That is not
a problem but it'd rather be less prone if we forbid such mixing.

Cc: <stable@xxxxxxxxxxxxxxx>
Signed-off-by: Pavel Begunkov <asml.silence@xxxxxxxxx>
Signed-off-by: Jens Axboe <axboe@xxxxxxxxx>

It even says "That is not a problem"! So why was this patch merged
if it's not fixing a problem?

It's now standing in the way of an actual cleanup. So why don't we
revert it? There must be more to it than this ...

https://lore.kernel.org/all/61ded378-51a8-1dcb-b631-fda1903248a9@xxxxxxxxx/

So um, it's disallowed because Pavel couldn't understand why it
should be allowed? This gets less and less convincing.

Excuse me? I'm really sorry you "couldn't understand" the explanation
as it has probably been too much of a "mental load", but let me try to
elaborate.

Because it's currently limited what can be registered, it's indeed not
a big deal, but that will most certainly change, and I usually and
apparently nonsensically prefer to tighten things up _before_ it becomes
a problem. And again, taking a random set of buffers created for
different purposes and registering it as a single entity is IMHO not a
sane approach.

Take p2pdma for instance, if would have been passed without intermixing
there might not have been is_pci_p2pdma_page()/etc. for every single page
in a bvec. That's why in general, it won't change for p2pdma but there
might be other cases in the future.


FWIW, what I was suggesting was that we should have a FOLL_SINGLE_VMA
flag, which would use our shiny new VMA lock infrastructure to look
up and lock _one_ VMA instead of having the caller take the mmap_lock.
Passing that flag would be a tighter restriction that Pavel implemented,
but would certainly relieve some of his mental load.

By the way, even if all pages are from the same VMA, they may still be a
mixture of anon and file pages; think a MAP_PRIVATE of a file when
only some pages have been written to. Or an anon MAP_SHARED which is
accessible by a child process.

--
Pavel Begunkov