Re: [RFC PATCH v1 0/4] Reduce cost of ptep_get_lockless on arm64

From: David Hildenbrand
Date: Tue Mar 26 2024 - 12:34:58 EST


On 26.03.24 17:31, Ryan Roberts wrote:
On 26/03/2024 16:17, David Hildenbrand wrote:
On 15.02.24 13:17, Ryan Roberts wrote:
This is an RFC for a series that aims to reduce the cost and complexity of
ptep_get_lockless() for arm64 when supporting transparent contpte mappings [1].
The approach came from discussion with Mark and David [2].

It introduces a new helper, ptep_get_lockless_norecency(), which allows the
access and dirty bits in the returned pte to be incorrect. This relaxation
permits arm64's implementation to just read the single target pte, and avoids
having to iterate over the full contpte block to gather the access and dirty
bits, for the contpte case.

It turns out that none of the call sites using ptep_get_lockless() require
accurate access and dirty bit information, so we can also convert those sites.
Although a couple of places need care (see patches 2 and 3).

Arguably patch 3 is a bit fragile, given the wide accessibility of
vmf->orig_pte. So it might make sense to drop this patch and stick to using
ptep_get_lockless() in the page fault path. I'm keen to hear opinions.

Yes. Especially as we have these pte_same() checks that might just fail now
because of wrong accessed/dirty bits?

Which pte_same() checks are you referring to? I've changed them all to
pte_same_norecency() which ignores the access/dirty bits when doing the comparison.

I'm reading the patches just now. So I stumbled over that just after I wrote that, so I was missing that part from the description here.



Likely, we just want to read "the real deal" on both sides of the pte_same()
handling.

Sorry I'm not sure I understand? You mean read the full pte including
access/dirty? That's the same as dropping the patch, right? Of course if we do
that, we still have to keep pte_get_lockless() around for this case. In an ideal
world we would convert everything over to ptep_get_lockless_norecency() and
delete ptep_get_lockless() to remove the ugliness from arm64.

Yes, agreed. Patch #3 does not look too crazy and it wouldn't really affect any architecture.

I do wonder if pte_same_norecency() should be defined per architecture and the default would be pte_same(). So we could avoid the mkold etc on all other architectures.

--
Cheers,

David / dhildenb