Re: [PATCHv2 1/3] x86/mm: Provide pmdp_establish() helper

From: Kirill A. Shutemov
Date: Mon Jun 19 2017 - 17:52:26 EST


On Mon, Jun 19, 2017 at 06:09:12PM +0100, Catalin Marinas wrote:
> On Mon, Jun 19, 2017 at 07:00:05PM +0300, Kirill A. Shutemov wrote:
> > On Mon, Jun 19, 2017 at 04:22:29PM +0100, Catalin Marinas wrote:
> > > On Thu, Jun 15, 2017 at 05:52:22PM +0300, Kirill A. Shutemov wrote:
> > > > We need an atomic way to set up a pmd page table entry, avoiding races
> > > > with the CPU setting dirty/accessed bits. This is required to implement
> > > > a pmdp_invalidate() that doesn't lose these bits.
> > > >
> > > > On PAE we have to use cmpxchg8b, as we cannot assume what the value of the
> > > > new pmd is, and setting it up half-by-half can expose a broken entry to the CPU.
> > > >
> > > > Signed-off-by: Kirill A. Shutemov <kirill.shutemov@xxxxxxxxxxxxxxx>
> > > > Cc: Ingo Molnar <mingo@xxxxxxxxxx>
> > > > Cc: H. Peter Anvin <hpa@xxxxxxxxx>
> > > > Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
> > >
> > > I'll look at this from the arm64 perspective. It would be good if we can
> > > have a generic atomic implementation based on cmpxchg64 but I need to
> > > look at the details first.
> >
> > Unfortunately, I'm not sure it's possible.
> >
> > The format of a page table is defined per-arch. We cannot assume much about
> > it in generic code.
> >
> > I guess we could make it compile by casting to 'unsigned long', but is it
> > useful?
> > Every architecture maintainer still has to validate that this assumption
> > is valid for the architecture.
>
> You are right, not much gained in doing this.
>
> Maybe a stupid question but can we not implement pmdp_invalidate() with
> something like pmdp_get_and_clear() (usually reusing the ptep_*
> equivalent). Or pmdp_clear_flush() (again, reusing ptep_clear_flush())?
>
> In my quick grep on pmdp_invalidate, it seems to be followed by
> set_pmd_at() or pmd_populate() already and the *pmd value after
> mknotpresent isn't any different from 0 to the hardware (at least on
> ARM). That's unless Linux expects to see some non-zero value here if
> walking the page tables on another CPU.

The whole reason to have pmdp_invalidate() in the first place is to never
leave the pmd clear in the middle of the update. Otherwise we get a race with
MADV_DONTNEED. See ced108037c2a for an example of such a race.
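
To make the point concrete, here is a minimal userspace sketch of the idea,
using C11 atomics. It is not the kernel code from this patch: the sk_* names
and the bit layout are made up for illustration, and a real implementation
has to follow each architecture's page table format. It only shows that the
entry goes from "present" to "not present" in one atomic step, never passes
through zero, and that the old value (with any dirty/accessed bits) is
returned to the caller.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical bit layout, for illustration only. */
#define SK_PRESENT  (UINT64_C(1) << 0)
#define SK_ACCESSED (UINT64_C(1) << 5)
#define SK_DIRTY    (UINT64_C(1) << 6)

typedef _Atomic uint64_t sk_pmd_t;

/* Atomically install new_val and return the entry it replaced. */
static uint64_t sk_pmdp_establish(sk_pmd_t *pmdp, uint64_t new_val)
{
	return atomic_exchange(pmdp, new_val);
}

/*
 * Same contract, written as a compare-and-exchange loop. This is the
 * shape needed when only a 64-bit cmpxchg is available (as with
 * cmpxchg8b on PAE) rather than a plain 64-bit exchange.
 */
static uint64_t sk_pmdp_establish_cas(sk_pmd_t *pmdp, uint64_t new_val)
{
	uint64_t old = atomic_load(pmdp);

	/* On failure, 'old' is refreshed with the current entry. */
	while (!atomic_compare_exchange_weak(pmdp, &old, new_val))
		;
	return old;
}

/*
 * Mark the entry not present without ever storing zero. Bits set
 * concurrently between the load and the exchange are not lost: they
 * show up in the returned old value.
 */
static uint64_t sk_pmdp_invalidate(sk_pmd_t *pmdp)
{
	uint64_t cur = atomic_load(pmdp);

	return sk_pmdp_establish(pmdp, cur & ~SK_PRESENT);
}

/* Small self-check of the properties described above; returns 0. */
static int sk_demo(void)
{
	sk_pmd_t pmd = SK_PRESENT | SK_DIRTY | UINT64_C(0x200000);
	uint64_t old = sk_pmdp_invalidate(&pmd);

	assert(old & SK_DIRTY);                 /* dirty bit is reported */
	assert(atomic_load(&pmd) != 0);         /* entry was never cleared */
	assert(!(atomic_load(&pmd) & SK_PRESENT));
	return 0;
}
```

A pmdp_get_and_clear()-style approach would store zero at the point where
sk_pmdp_invalidate() stores the not-present value, and that window is what
a concurrent MADV_DONTNEED can observe.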

--
Kirill A. Shutemov