Re: [PATCH v1 06/11] mm: support GUP-triggered unsharing via FAULT_FLAG_UNSHARE (!hugetlb)

From: David Hildenbrand
Date: Sun Dec 19 2021 - 13:00:03 EST


On 19.12.21 18:44, Linus Torvalds wrote:
> David, you said that you were working on some alternative model. Is it
> perhaps along these same lines below?
>
> I was thinking that a bit in the page tables to say "this page is
> exclusive to this VM" would be a really simple thing to deal with for
> fork() and swapout and friends.
>
> But we don't have such a bit in general, since many architectures have
> very limited sets of SW bits, and even when they exist we've spent
> them on things like UDDF_WP.,
>
> But the more I think about the "bit doesn't even have to be in the
> page tables", the more I think maybe that's the solution.
>
> A bit in the 'struct page' itself.
>

Exactly what I am prototyping right now.

> For hugepages, you'd have to distribute said bit when you split the hugepage.

Yes, that's one tricky part ...

>
> But other than that it looks quite simple: anybody who does a virtual
> copy will inevitably be messing with the page refcount, so clearing
> the "exclusive ownership" bit wouldn't be costly: the 'struct page'
> cacheline is already getting dirtied.
>
> Or what was your model you were implying you were thinking about in
> your other email? You said

I'm playing with the idea of not setting the bit always during COW but
only on GUP request to set the bit (either manually if possible or via
FOLL_UNSHARE). That's a bit more tricky but allows for decoupling that
approach completely from the page_pin() counter.

fork() is allowed to clear the bit if page_count() == 1 and share the
page. So no GUP->no fork() performance changes (!) . Otherwise the bit
can only vanish if we swapout/migrate the page: in which case there are
no additional GUP/references on the page that rely on it!

The bit can be set directly if we have to copy the page in the fault
handler (COW or unshare). Outside of COW/Unshare code, the bit can only
be set if page_count() == 1 and we sync against fork(). (and that's the
problem for gup-fast-only that I'm investigating right now, because it
would then always have to fallback to the slow variant if the bit isn't
already set)

So the bit can "vanish" whenever there is no additional reference on the
page. GUP syncs against fork() and can thereby set the bit/request to
set the bit.

I'm trying to decouple it completely from the page_pin() counter to also
be able to handle FOLL_GET (O_DIRECT reproducers unfortunately) correctly.

Not set it stone, just an idea what I'm playing with right now ... and I
have to tripple-check if
* page is PTE mapped in the page table I'm walking
* page_count() == 1
Really means that "this is the only reference.". I do strongly believe
so .. :)

--
Thanks,

David / dhildenb