Re: [PATCH] arm64/mm: Intercept pfn changes in set_pte_at()

From: Anshuman Khandual
Date: Tue Nov 22 2022 - 23:27:49 EST




On 11/22/22 16:41, Mark Rutland wrote:
> On Tue, Nov 22, 2022 at 09:57:49AM +0000, Will Deacon wrote:
>> On Tue, Nov 22, 2022 at 01:43:17PM +0530, Anshuman Khandual wrote:
>>>
>>>
>>> On 11/18/22 19:43, Will Deacon wrote:
>>>> On Wed, Nov 16, 2022 at 08:40:01AM +0530, Anshuman Khandual wrote:
>>>>> Changing pfn on a user page table mapped entry, without first going through
>>>>> break-before-make (BBM) procedure is unsafe. This just updates set_pte_at()
>>>>> to intercept such changes, via an updated pgattr_change_is_safe(). This new
>>>>> check happens via __check_racy_pte_update(), which has now been renamed as
>>>>> __check_safe_pte_update().
>>>>>
>>>>> Cc: Catalin Marinas <catalin.marinas@xxxxxxx>
>>>>> Cc: Will Deacon <will@xxxxxxxxxx>
>>>>> Cc: Mark Rutland <mark.rutland@xxxxxxx>
>>>>> Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
>>>>> Cc: linux-arm-kernel@xxxxxxxxxxxxxxxxxxx
>>>>> Cc: linux-kernel@xxxxxxxxxxxxxxx
>>>>> Signed-off-by: Anshuman Khandual <anshuman.khandual@xxxxxxx>
>>>>> ---
>>>>> This applies on v6.1-rc4
>>>>>
>>>>> arch/arm64/include/asm/pgtable.h | 8 ++++++--
>>>>> arch/arm64/mm/mmu.c | 8 +++++++-
>>>>> 2 files changed, 13 insertions(+), 3 deletions(-)
>>>>
>>>> I remember Mark saying that BBM is sometimes violated by the core code in
>>>> cases where the pte isn't actually part of a live pgtable (e.g. if it's on
>>>> the stack or part of a newly allocated table). Won't that cause false
>>>> positives here?
>>>
>>> Could you please elaborate ? If the pte is not on a live page table, then
>>> pte_valid() will return negative on such entries. So any update there will
>>> be safe. I am wondering, how this change will cause false positives which
>>> would not have been possible earlier.
>>
>> I don't think pte_valid() will always return false for these entries.
>> Consider, for example, ptes which are valid but which live in a table that
>> is not reachable by the MMU. I think this is what Mark had in mind, but it
>> would be helpful if he could chime in with the specific example he ran into.
>
> Yup -- that was the case I had in mind. IIRC I hit that in the past when trying
> to do something similar, but I can't recall exactly where that was. I suspect
> that was probably to do with page migration or huge page splitting/merging.
>
> Looking around, at least __split_huge_zero_page_pmd() and
> __split_huge_pmd_locked() do something like that, creating a temporary pmd
> entry on the stack, populating a table of non-live but valid ptes, then
> plumbing it into the real pmd.

In both cases, i.e. __split_huge_zero_page_pmd() and __split_huge_pmd_locked(),
the entry is first asserted to be empty via pte_none() before a new value is
written there. set_pte_at() would still consider such updates safe, because
pte_valid() on the old entry will return false.

	VM_BUG_ON(!pte_none(*pte));
	set_pte_at(mm, haddr, pte, entry);

But if these entries get updated yet again (while still inactive) with new pte
values, then set_pte_at() would complain about the pfn change, because the old
entry is already "valid" by then. But is this a viable scenario?
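
To make the two cases concrete, here is a rough userspace model of the pfn part
of the check. It is only a sketch: model_pte_t, the bit layout and all of the
model_*() helpers are made up for illustration and are not the kernel's pte_t,
pte_valid() or pgattr_change_is_safe(). It also ignores every attribute other
than the valid bit and the pfn.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in for a pte; NOT the kernel's pte_t. */
typedef uint64_t model_pte_t;

#define MODEL_PTE_VALID	((model_pte_t)1 << 0)	/* assumed valid bit */
#define MODEL_PFN_SHIFT	12			/* assumed 4K pages */

/* Build a valid entry pointing at the given pfn. */
static model_pte_t model_mk_pte(uint64_t pfn)
{
	return ((model_pte_t)pfn << MODEL_PFN_SHIFT) | MODEL_PTE_VALID;
}

static bool model_pte_valid(model_pte_t pte)
{
	return (pte & MODEL_PTE_VALID) != 0;
}

static uint64_t model_pte_pfn(model_pte_t pte)
{
	return pte >> MODEL_PFN_SHIFT;
}

/*
 * Returns true when the update would pass the modelled check:
 * if either side is invalid there is nothing to compare, and a
 * valid -> valid update must keep the same pfn.
 */
static bool model_update_is_safe(model_pte_t old, model_pte_t new)
{
	if (!model_pte_valid(old) || !model_pte_valid(new))
		return true;

	return model_pte_pfn(old) == model_pte_pfn(new);
}
```

In this model, writing model_mk_pte(pfn) over an empty (zero) entry passes, which
is the pte_none() case above. Overwriting one valid entry with a valid entry
for a different pfn fails, and the helper has no way to tell whether the table
holding the entry is live, which is the false positive being discussed.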

>
> We'd need to check that there aren't other cases like that.
>
Sure, it might be somewhat tricky, but is there anything in particular to be
looked into? I guess if this change gets into a CI system which runs all the
memory stress tests for long enough with CONFIG_DEBUG_VM enabled, we might get
some more clues about whether other similar scenarios are possible.