Re: [PATCH v2 2/3] userfaultfd: UFFDIO_REMAP uABI

From: David Hildenbrand
Date: Tue Oct 03 2023 - 17:05:33 EST


On 03.10.23 22:04, Suren Baghdasaryan wrote:
On Mon, Oct 2, 2023 at 12:34 PM Lokesh Gidra <lokeshgidra@xxxxxxxxxx> wrote:

On Mon, Oct 2, 2023 at 6:43 PM David Hildenbrand <david@xxxxxxxxxx> wrote:

On 02.10.23 17:55, Lokesh Gidra wrote:
On Mon, Oct 2, 2023 at 4:46 PM Lokesh Gidra <lokeshgidra@xxxxxxxxxx> wrote:

On Mon, Oct 2, 2023 at 4:21 PM Peter Xu <peterx@xxxxxxxxxx> wrote:

On Mon, Oct 02, 2023 at 10:00:03AM +0200, David Hildenbrand wrote:
In case we cannot simply remap the page, the fallback sequence (from the
cover letter) would be triggered.

1) UFFDIO_COPY
2) MADV_DONTNEED

So we would just handle the operation internally without a fallback.

Note that I think there will be a slight difference in overall remap
atomicity, namely in what happens if the page is modified after
UFFDIO_COPY but before DONTNEED.
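
For illustration, the two-step fallback could look roughly like this from
user space. This is a minimal sketch, not code from the patch:
move_by_copy() is a hypothetical helper name, and it assumes uffd is a
userfaultfd descriptor with the destination range registered in missing
mode. The comment between the two steps marks the window in question.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <linux/userfaultfd.h>

/*
 * Hypothetical helper: emulate a "move" with UFFDIO_COPY + MADV_DONTNEED.
 * Returns 0 on success, -1 with errno set on failure.
 * Partial copies (reported via copy.copy) are not retried in this sketch.
 */
int move_by_copy(int uffd, void *dst, void *src, size_t len)
{
    struct uffdio_copy copy = {
        .dst = (uint64_t)(uintptr_t)dst,
        .src = (uint64_t)(uintptr_t)src,
        .len = len,
        .mode = 0,
    };

    /* Step 1: copy the source contents into the missing dst range. */
    if (ioctl(uffd, UFFDIO_COPY, &copy) == -1)
        return -1;

    /*
     * Non-atomic window: a thread writing to src right here modifies
     * data that was already copied, so the update never reaches dst.
     */

    /* Step 2: drop the source pages. */
    if (madvise(src, len, MADV_DONTNEED) == -1)
        return -1;

    return 0;
}
```

A caller that needs the stronger guarantee would have to stop all writers
to src around both steps, which is exactly what a single atomic remap
avoids.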

UFFDIO_REMAP guarantees full atomicity when moving the page. IOW, threads
can keep updating the pages during ioctl(UFFDIO_REMAP): data won't get
lost during the move, and a missing event will be generated after the
move, with the latest data showing up on dest.

I'm not sure whether that makes such a fallback a problem; Suren may know
better, given the use case.

Although there is no problem in using a fallback with our use case, as a
user of userfaultfd I'd suggest leaving it to the developer. Failing
with an appropriate errno makes more sense. If handled in the kernel,
then the user may assume at the end of the operation that the src vma
is completely unmapped. And even if that causes no correctness issues,
it could lead to memory leaks.

I meant that in addition to the possibility of correctness issues due
to lack of atomicity, it could also lead to memory leaks, as the user
may assume that src vma is empty post-operation. IMHO, it's better to
fail with errno so that the user would fix the code with necessary
changes (like using DONTFORK, if forking).
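
As a rough sketch of such a change, the source range could be prepared
once, before any fork(). The helper name prepare_src_for_remap() is made
up for this example. MADV_DONTFORK and MADV_UNMERGEABLE are existing
madvise() advice values; opting out of KSM is an extra assumption here,
since KSM deduplication is another way for pages to stop being exclusive.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Hypothetical helper: make pages in [addr, addr + len) more likely to
 * stay exclusive to this process. addr must be page-aligned.
 * Returns 0 on success, -1 with errno set on failure.
 */
int prepare_src_for_remap(void *addr, size_t len)
{
    /* Keep fork() from COW-sharing these pages with child processes. */
    if (madvise(addr, len, MADV_DONTFORK) == -1)
        return -1;

    /*
     * Opt the range out of KSM. Kernels built without CONFIG_KSM
     * report EINVAL, which is harmless here: nothing can merge.
     */
    if (madvise(addr, len, MADV_UNMERGEABLE) == -1 && errno != EINVAL)
        return -1;

    return 0;
}
```

This only improves the odds; a remap can still fail for other reasons,
so the caller needs an error path either way.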

Leaving the atomicity discussion out because I think this can just be
handled (e.g., the src_vma would always be empty post-operation):

It might not necessarily be a good idea to only expose micro-operations
to user space. If the user-space fallback will almost always be
"UFFDIO_COPY+MADV_DONTNEED", then clearly the logical operation
performed is moving data, ideally with zero-copy.

IMHO, such a fallback will be useful only if it's possible that only
some pages in the src vma fail due to this. But even then it would be
really useful to have a flag maybe like UFFDIO_REMAP_FALLBACK_COPY to
control if the user wants the fallback or not. OTOH, if this is
something that can be detected for the entire src vma, then failing
with errno is more appropriate.

Given that the patch is already quite complicated, I humbly suggest
leaving the fallback for now as a TODO.

I agree about the complexity, and I hope we can reduce that further. Otherwise such things end up being a maintenance nightmare.


Ok, I think it makes sense to implement the strict remap logic but in
a way that we can easily add copy fallback if that's needed in the

I think whatever we do, we should

a) never talk about any of the implementation details (mapcount, swapcount, PAE) towards the users

b) make it clear from the start that we might change the decision about when we fail (for better or worse); users should be prepared to implement backup paths. We certainly don't want such behavior to be ABI.

I'd suggest documenting something like the following:

"The operation may fail for various reasons. Usually, remapping of pages that are not exclusive to the given process fails; once KSM deduplicates pages or fork() COW-shares pages with child processes, they are no longer exclusive. Further, the kernel might only perform lightweight checks for detecting whether the pages are exclusive, and return -EWHATSOEVER in case that check fails. To make the operation more likely to succeed, KSM should be disabled, fork() should be avoided or MADV_DONTFORK should be configured for the source VMA before fork()."

future. So, I'll change UFFDIO_REMAP to UFFDIO_MOVE and will return
some unique error, like EBUSY when the page is not PAE. If we need to
add a copy fallback in the future, we will add a
UFFDIO_MOVE_MODE_ALLOW_COPY flag and will implement the copy
mechanism. Does that sound good?

To me, if we're talking about moving data, then zero-copy is the optimization and copy+delete would be the (slower) default.

If we're talking about remapping, then there is no copy; we're remapping pages.


So if we'd ever want to support the copy case, one combination would be

UFFDIO_MOVE + UFFDIO_MOVE_ZERO_COPY_ONLY

whereby we would fail if the latter is not specified.
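
To make the intended semantics concrete, here is one possible reading of
that combination as a small decision function. Everything in it is
hypothetical: the SKETCH_MOVE_ZERO_COPY_ONLY value, the function name, and
the choice of -EINVAL for "default not implemented yet" are assumptions
for illustration; -EBUSY is the error suggested earlier in the thread for
pages that cannot be remapped.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag value; not taken from any kernel header. */
#define SKETCH_MOVE_ZERO_COPY_ONLY (1u << 0)

enum move_action { MOVE_REMAP, MOVE_COPY_DELETE };

/*
 * Decide how a move request would be served.
 * copy_implemented models whether a copy+delete path exists yet.
 * Returns 0 and sets *action, or a negative errno.
 */
int sketch_move_policy(uint64_t mode, bool page_exclusive,
                       bool copy_implemented, enum move_action *action)
{
    bool zero_copy_only = (mode & SKETCH_MOVE_ZERO_COPY_ONLY) != 0;

    /* Default semantics allow copying; reject until that path exists. */
    if (!zero_copy_only && !copy_implemented)
        return -EINVAL;

    /* Zero-copy is the optimization whenever the page is exclusive. */
    if (page_exclusive) {
        *action = MOVE_REMAP;
        return 0;
    }

    /* The page is shared and the caller refused the copy path. */
    if (zero_copy_only)
        return -EBUSY;

    *action = MOVE_COPY_DELETE;
    return 0;
}
```

With this shape, code that passes the flag today keeps its behavior if a
copy path is added later, and the unflagged default can start working
without a uABI change.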

But just my thoughts.

--
Cheers,

David / dhildenb