Re: [PATCH v2 2/3] userfaultfd: UFFDIO_REMAP uABI

From: Suren Baghdasaryan
Date: Thu Sep 28 2023 - 15:47:42 EST


On Thu, Sep 28, 2023 at 11:34 AM Peter Xu <peterx@xxxxxxxxxx> wrote:
>
> On Thu, Sep 28, 2023 at 07:51:18PM +0200, David Hildenbrand wrote:
> > On 28.09.23 19:21, Peter Xu wrote:
> > > On Thu, Sep 28, 2023 at 07:05:40PM +0200, David Hildenbrand wrote:
> > > > As described as reply to v1, without fork() and KSM, the PAE bit should
> > > > stick around. If that's not the case, we should investigate why.
> > > >
> > > > If we ever support the post-fork case (which the comment above remap_pages()
> > > > excludes) we'll need good motivation why we'd want to make this
> > > > overly-complicated feature even more complicated.
> > >
> > > The problem is DONTFORK is only a suggestion, but not yet restricted. If
> > > someone reaches on top of some !PAE page on src it'll never gonna proceed
> > > and keep failing, iiuc.
> >
> > Yes. It won't work if you fork() and not use DONTFORK on the src VMA. We
> > should document that as a limitation.
> >
> > For example, you could return an error to the user that can just call
> > UFFDIO_COPY. (or to the UFFDIO_COPY from inside uffd code, but that's
> > probably ugly as well).
>
> We could indeed provide some special errno perhaps upon the PAE check, then
> document it explicitly in the man page and suggest resolutions (like
> DONTFORK) when user hit it.
>
> >
> > >
> > > do_wp_page() doesn't have that issue of accuracy only because one round of
> > > CoW will just allocate a new page with PAE set guaranteed, which is pretty
> > > much self-heal and unnoticed.
> >
> > Yes. But it might have to copy, at which point the whole optimization of
> > remap is gone :)
>
> Right, but that's fine IMHO because it should still be very corner case,
> definitely not expected to be the majority to start impact the performance
> results.
>
> >
> > >
> > > So it'll be great if we can have similar self-heal way for PAE. If not, I
> > > think it's still fine we just always fail on !PAE src pages, but then maybe
> > > we should let the user know what's wrong, e.g., the user can just forgot to
> > > apply DONTFORK then forked. And then the user hits error and don't know
> > > what happened. Probably at least we should document it well in man pages.
> > >
> > Yes, exactly.
> >
> > > Another option can be we keep using folio_mapcount() for pte, and another
> > > helper (perhaps: _nr_pages_mapped==COMPOUND_MAPPED && _entire_mapcount==1)
> > > for thp. But I know that's not ideal either.
> >
> > As long as we only set the pte writable if PAE is set, we're good from a CVE
> > perspective. The other part is just simplicity of avoiding all these
> > mapcount+swapcount games where possible.
> >
> > (one day folio_mapcount() might be faster -- I'm still working on that patch
> > in the bigger picture of handling PTE-mapped THP better)
>
> Sure.
>
> For now as long as we're crystal clear on the possibility of inaccuracy of
> PAE, it never hits besides fork() && !DONTFORK, and properly document it,
> then sounds good here.

Ok, sounds like we have a consensus. I'll prepare manpage changes to
document the DONTFORK requirement for uffd_remap.

>
> Thanks,
>
> --
> Peter Xu
>