Re: [PATCH] x86/mm: fix return value of p[um]dp_set_access_flags

From: Wei Yang
Date: Thu Sep 19 2019 - 22:18:49 EST


On Thu, Sep 19, 2019 at 10:25:05AM -0700, Dave Hansen wrote:
>On 9/19/19 1:25 AM, Wei Yang wrote:
>> Function p[um]dp_set_access_flags is used with update_mmu_cache_p[um]d
>> and the return value from p[um]dp_set_access_flags indicates whether it
>> is necessary to do the cache update.
>
>If this change is correct, why was it not applied to
>ptep_set_access_flags()? That function has the same form.
>

Thanks, you are right, I missed this point.

>BTW, I rather dislike the 'dirty' variable name. It seems to be
>indicating whether the fault was a write fault or not and whether we
>*expect* the dirty bit to be set in 'entry'.
>

I found some comments for ptep_set_access_flags in OLD code, which says

/*
* We only update the dirty/accessed state if we set
* the dirty bit by hand in the kernel, since the hardware
* will do the accessed bit for us, and we don't want to
* race with other CPU's that might be updating the dirty
* bit at the same time.
*/

I guess this is reason for the naming. Looks currently we use this value in
some other way.

>> From current code logic, only when changed && dirty, related page table
>> entry would be updated. It is not necessary to update cache when the
>> real page table entry is not changed.
>
>This logic doesn't really hold up, though.
>
>If we are only writing accessed and/or dirty bits, then we *never* need
>to flush. The flush might avoid a stale TLB entry causing an extra page
>walk by the hardware, but it's probably not ever worth the cost of the
>flush.
>
>Unless there's something weird happening with paravirt, I can't ever see
>a good reason to flush the TLB when just setting accessed/dirty bits on
>bare-metal.
>
>This seems like a place where a debugging check to validate that only
>accessed/dirty bits are only being set would be a good idea.

The function update_mmu_cache_pmd is not for x86, IMHO, since it is empty on
x86.

I took a look into the definition on powerpc, which is defined in
arch/powerpc/mm/book3s64/pgtable.c, which preload the content of this HPTE. So
the cache update here is not TLB flush on x86.

And then I compare the definition of ptep_set_access_flags on x86 and powerpc.
On powerpc, which is defined in arch/powerpc/mm/pgtable.c, update the pte
without checking *dirty*. This logic make sense to me. So I am confused with
why on x86 we need to check both *changed* and *dirty*.

Then I searched the history, and found commit 8dab5241d06bf may explain
something.

The commit log says:

This patch reworks ptep_set_access_flags() semantics, implementations and
callers so that it's now responsible for returning whether an update is
necessary or not (basically whether the PTE actually changed).

If my understanding is correct, the return value should indicate whether the
PTE actually changed.

But the change introduced in commit 8dab5241d06bf is not doing the exact
thing on x86. Since we only change PTE only both *dirty* and *change*, just
return *changed* seems not match the semantics.

Last but not least, since update_mmu_cache_pmd is empty, even return value is
not correct, it doesn't break anything.

Hope my understanding is correct.

--
Wei Yang
Help you, Help me