Re: [PATCH 0/1] mm: restore full accuracy in COW page reuse

From: John Hubbard
Date: Fri Jan 15 2021 - 22:46:54 EST


On 1/15/21 11:46 AM, David Hildenbrand wrote:
7) There is no easy way to detect if a page really was pinned: we might
have false positives. Further, there is no way to distinguish if it was
pinned with FOLL_WRITE or not (R vs R/W). To perform reliable tracking
we most probably would need more counters, which we cannot fit into
struct page. (AFAIU, for huge pages it's easier).

I think this is the real issue. We can only store so much information,
so we have to decide which things work and which things are broken. So
far someone hasn't presented a way to record everything at least..

I do wonder how many (especially long-term) GUP readers/writers we have
to expect, and especially, support for a single base page. Do we have a
rough estimate?

With RDMA, I would assume we only need a single one (e.g., once RDMA
device; I'm pretty sure I'm wrong, sounds too easy).
With VFIO I guess we need one for each VFIO container (~ in the worst
case one for each passthrough device).
With direct I/O, vmsplice and other GUP users ?? No idea.

If we could somehow put a limit on the #GUP we support, and fail further
GUP (e.g., -EAGAIN?) once a limit is reached, we could partition the
refcount into something like (assume max #15 GUP READ and #15 GUP R/W,
which is most probably a horribly bad choice)

[ GUP READ ][ GUP R/W ] [ ordinary ]
31 ... 28 27 ... 24 23 .... 0

But due to saturate handling in "ordinary", we would lose further 2 bits
(AFAIU), leaving us "only" 22 bits for "ordinary". Now, I have no idea
how many bits we actually need in practice.

Maybe we need less for GUP READ, because most users want GUP R/W? No idea.

Just wild ideas. Most probably that has already been discussed, and most
probably people figured that it's impossible :)


I proposed this exact idea a few days ago [1]. It's remarkable that we both
picked nearly identical values for the layout! :)

But as the responses show, security problems prevent pursuing that approach.


[1] https://lore.kernel.org/r/45806a5a-65c2-67ce-fc92-dc8c2144d766@xxxxxxxxxx

thanks,
--
John Hubbard
NVIDIA