Re: [RFC PATCH v1 0/4] Reduce cost of ptep_get_lockless on arm64

From: David Hildenbrand
Date: Tue Mar 26 2024 - 13:07:02 EST



Likely, we just want to read "the real deal" on both sides of the pte_same()
handling.

Sorry, I'm not sure I understand. Do you mean reading the full pte including
access/dirty? That's the same as dropping the patch, right? Of course, if we do
that, we still have to keep ptep_get_lockless() around for this case. In an ideal
world we would convert everything over to ptep_get_lockless_norecency() and
delete ptep_get_lockless() to remove the ugliness from arm64.

Yes, agreed. Patch #3 does not look too crazy, and it wouldn't really affect any
architecture.

I do wonder if pte_same_norecency() should be defined per architecture, with
pte_same() as the default. That way we could avoid the mkold etc. on all other
architectures.
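A minimal sketch of that per-arch default, using the usual #ifdef arch-override pattern. Everything here is a hypothetical userspace model: pte_t is a plain struct, PTE_AF/PTE_DIRTY are made-up bit positions, and ARCH_HAS_PTE_SAME_NORECENCY is an invented opt-in macro, not an existing kernel symbol.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace model of a PTE; not the kernel's definition. */
typedef struct { uint64_t val; } pte_t;

#define PTE_AF    (1ULL << 10) /* hypothetical access-flag bit */
#define PTE_DIRTY (1ULL << 55) /* hypothetical dirty bit */

static inline bool pte_same(pte_t a, pte_t b) { return a.val == b.val; }
static inline pte_t pte_mkold(pte_t p) { p.val &= ~PTE_AF; return p; }
static inline pte_t pte_mkclean(pte_t p) { p.val &= ~PTE_DIRTY; return p; }

#ifdef ARCH_HAS_PTE_SAME_NORECENCY
/*
 * Opt-in (arm64-like) arch: norecency reads may return stale
 * access/dirty bits, so strip them from both sides before comparing.
 */
static inline bool pte_same_norecency(pte_t a, pte_t b)
{
	return pte_same(pte_mkold(pte_mkclean(a)), pte_mkold(pte_mkclean(b)));
}
#else
/*
 * Default: norecency reads return the real PTE value, so compare as-is
 * and avoid the mkold/mkclean work entirely.
 */
static inline bool pte_same_norecency(pte_t a, pte_t b)
{
	return pte_same(a, b);
}
#endif
```

Built as-is, two values that differ only in the access flag compare unequal; building with -DARCH_HAS_PTE_SAME_NORECENCY switches to the masking compare, under which they compare equal. That difference is exactly what the next question is about.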

Wouldn't that break its semantics? The "norecency" of
ptep_get_lockless_norecency() means "recency information in the returned pte may
be incorrect". But the "norecency" of pte_same_norecency() means "ignore the
access and dirty bits when you do the comparison".

My idea was that ptep_get_lockless_norecency() would return the real pte value on these architectures. So, e.g., on x86, there would be no actual change in generated code.

But yes, the documentation of these functions would have to be improved.

Now I wonder if ptep_get_lockless_norecency() should actively clear the dirty/accessed bits, to make it easier to find any remaining cases where those bits still matter...
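A sketch of that debugging idea, again as a hypothetical userspace model: the pte_t layout, the bit positions, and the names ptep_get_lockless_model() and ptep_get_lockless_norecency_debug() are invented for illustration. A real kernel version would presumably sit behind a debug config option rather than be unconditional.

```c
#include <stdint.h>

/* Hypothetical userspace model of a PTE; not the kernel's definition. */
typedef struct { uint64_t val; } pte_t;

#define PTE_AF    (1ULL << 10) /* hypothetical access-flag bit */
#define PTE_DIRTY (1ULL << 55) /* hypothetical dirty bit */

static inline pte_t pte_mkold(pte_t p) { p.val &= ~PTE_AF; return p; }
static inline pte_t pte_mkclean(pte_t p) { p.val &= ~PTE_DIRTY; return p; }

/* Stand-in for a single lockless read of the entry (hypothetical). */
static inline pte_t ptep_get_lockless_model(const volatile pte_t *ptep)
{
	pte_t pte = { ptep->val };
	return pte;
}

/*
 * Hypothetical debug variant: always strip access/dirty, so a caller
 * that still (wrongly) relies on them sees them clear every time.
 */
static inline pte_t ptep_get_lockless_norecency_debug(const volatile pte_t *ptep)
{
	return pte_mkold(pte_mkclean(ptep_get_lockless_model(ptep)));
}
```

Any caller that then tests the access or dirty bit on the returned value always sees it clear, so code that still depends on those bits should misbehave visibly during testing rather than only occasionally in production.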


I think you could only do the optimization you describe if you required that
pte_same_norecency() would only be given values returned by
ptep_get_lockless_norecency() (or ptep_get_norecency()). Even then, it's not
quite the same; if the page is accessed between the two gets, one will return
true and the other false.
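The divergence can be modeled in a few lines. This is a hypothetical userspace sketch (made-up pte_t layout and bit positions; pte_same_masked() and read_around_access() are invented names): two reads bracket an access-flag update, after which the masking compare still reports a match while a plain pte_same(), i.e. the pass-through default, does not.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace model of a PTE; not the kernel's definition. */
typedef struct { uint64_t val; } pte_t;

#define PTE_AF    (1ULL << 10) /* hypothetical access-flag bit */
#define PTE_DIRTY (1ULL << 55) /* hypothetical dirty bit */

static inline bool pte_same(pte_t a, pte_t b) { return a.val == b.val; }
static inline pte_t pte_mkold(pte_t p) { p.val &= ~PTE_AF; return p; }
static inline pte_t pte_mkclean(pte_t p) { p.val &= ~PTE_DIRTY; return p; }

/* "Ignore access/dirty when comparing" semantics. */
static inline bool pte_same_masked(pte_t a, pte_t b)
{
	return pte_same(pte_mkold(pte_mkclean(a)), pte_mkold(pte_mkclean(b)));
}

/*
 * Model of the race: read the entry, let "hardware" set the access flag,
 * then read it again. The two observed values are returned via out-params.
 */
static void read_around_access(volatile pte_t *ptep, pte_t *first, pte_t *second)
{
	first->val = ptep->val;
	ptep->val |= PTE_AF;	/* page accessed between the two reads */
	second->val = ptep->val;
}
```

With the same pair of reads, pte_same_masked() reports a match and pte_same() does not, so an arch that maps pte_same_norecency() straight to pte_same() would give a different answer than one that masks.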

Right.

--
Cheers,

David / dhildenb