RE: [PATCH v10 2/3] arm64: mm: implement arch_faults_on_old_pte() on arm64

From: Justin He (Arm Technology China)
Date: Mon Oct 07 2019 - 22:30:43 EST




> -----Original Message-----
> From: Justin He (Arm Technology China)
> Sent: October 8, 2019 9:55
> To: Marc Zyngier <maz@xxxxxxxxxx>; Will Deacon <will@xxxxxxxxxx>
> Cc: Catalin Marinas <Catalin.Marinas@xxxxxxx>; Mark Rutland
> <Mark.Rutland@xxxxxxx>; James Morse <James.Morse@xxxxxxx>;
> Matthew Wilcox <willy@xxxxxxxxxxxxx>; Kirill A. Shutemov
> <kirill.shutemov@xxxxxxxxxxxxxxx>; linux-arm-kernel@xxxxxxxxxxxxxxxxxxx;
> linux-kernel@xxxxxxxxxxxxxxx; linux-mm@xxxxxxxxx; Punit Agrawal
> <punitagrawal@xxxxxxxxx>; Thomas Gleixner <tglx@xxxxxxxxxxxxx>;
> Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>; hejianet@xxxxxxxxx; Kaly
> Xin (Arm Technology China) <Kaly.Xin@xxxxxxx>; nd <nd@xxxxxxx>
> Subject: RE: [PATCH v10 2/3] arm64: mm: implement
> arch_faults_on_old_pte() on arm64
>
> Hi Will and Marc
>
> > -----Original Message-----
> > From: Marc Zyngier <maz@xxxxxxxxxx>
> > Sent: October 1, 2019 21:32
> > To: Will Deacon <will@xxxxxxxxxx>
> > Cc: Justin He (Arm Technology China) <Justin.He@xxxxxxx>; Catalin
> > Marinas <Catalin.Marinas@xxxxxxx>; Mark Rutland
> > <Mark.Rutland@xxxxxxx>; James Morse <James.Morse@xxxxxxx>;
> > Matthew Wilcox <willy@xxxxxxxxxxxxx>; Kirill A. Shutemov
> > <kirill.shutemov@xxxxxxxxxxxxxxx>; linux-arm-kernel@xxxxxxxxxxxxxxxxxxx;
> > linux-kernel@xxxxxxxxxxxxxxx; linux-mm@xxxxxxxxx; Punit Agrawal
> > <punitagrawal@xxxxxxxxx>; Thomas Gleixner <tglx@xxxxxxxxxxxxx>;
> > Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>; hejianet@xxxxxxxxx; Kaly
> > Xin (Arm Technology China) <Kaly.Xin@xxxxxxx>
> > Subject: Re: [PATCH v10 2/3] arm64: mm: implement
> > arch_faults_on_old_pte() on arm64
> >
> > On Tue, 1 Oct 2019 13:50:32 +0100
> > Will Deacon <will@xxxxxxxxxx> wrote:
> >
> > > On Mon, Sep 30, 2019 at 09:57:39AM +0800, Jia He wrote:
> > > > On arm64 without hardware Access Flag, copying from user will fail because
> > > > the pte is old and cannot be marked young. So we always end up with a zeroed
> > > > page after fork() + CoW for pfn mappings. We don't always have a
> > > > hardware-managed access flag on arm64.
> > > >
> > > > Hence implement arch_faults_on_old_pte on arm64 to indicate that it might
> > > > cause a page fault when accessing an old pte.
> > > >
> > > > Signed-off-by: Jia He <justin.he@xxxxxxx>
> > > > Reviewed-by: Catalin Marinas <catalin.marinas@xxxxxxx>
> > > > ---
> > > > arch/arm64/include/asm/pgtable.h | 14 ++++++++++++++
> > > > 1 file changed, 14 insertions(+)
> > > >
> > > > diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
> > > > index 7576df00eb50..e96fb82f62de 100644
> > > > --- a/arch/arm64/include/asm/pgtable.h
> > > > +++ b/arch/arm64/include/asm/pgtable.h
> > > > @@ -885,6 +885,20 @@ static inline void update_mmu_cache(struct vm_area_struct *vma,
> > > > #define phys_to_ttbr(addr) (addr)
> > > > #endif
> > > >
> > > > +/*
> > > > + * On arm64 without hardware Access Flag, copying from user will fail because
> > > > + * the pte is old and cannot be marked young. So we always end up with zeroed
> > > > + * page after fork() + CoW for pfn mappings. We don't always have a
> > > > + * hardware-managed access flag on arm64.
> > > > + */
> > > > +static inline bool arch_faults_on_old_pte(void)
> > > > +{
> > > > +	WARN_ON(preemptible());
> > > > +
> > > > +	return !cpu_has_hw_af();
> > > > +}
> > >
> > > Does this work correctly in a KVM guest? (i.e. is the MMFR sanitised in that
> > > case, despite not being the case on the host?)
> >
> > Yup, all the 64bit MMFRs are trapped (HCR_EL2.TID3 is set for an
> > AArch64 guest), and we return the sanitised version.
> Thanks for Marc's explanation. I verified the patch series on a KVM guest
> (-M virt) with a simulated NVDIMM device created by QEMU. The host is a
> ThunderX2 (aarch64).
>
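For reference, the check discussed above comes down to the HAFDBS field of
ID_AA64MMFR1_EL1 (bits [3:0]), which a guest sees in sanitised form. What
follows is a minimal userspace sketch of that decision, not the kernel code
itself. The helper names mmfr1_hafdbs() and faults_on_old_pte() are
hypothetical, the register value is passed in as a plain integer rather than
read with mrs, and the kernel's additional dependency on
CONFIG_ARM64_HW_AFDBM is left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* HAFDBS occupies bits [3:0] of ID_AA64MMFR1_EL1. */
#define MMFR1_HAFDBS_SHIFT 0
#define MMFR1_HAFDBS_MASK  0xfULL

/* Hypothetical helper: extract the HAFDBS field from a raw register value. */
static inline unsigned int mmfr1_hafdbs(uint64_t mmfr1)
{
	return (unsigned int)((mmfr1 >> MMFR1_HAFDBS_SHIFT) & MMFR1_HAFDBS_MASK);
}

/*
 * Hypothetical userspace model of arch_faults_on_old_pte():
 * a HAFDBS value of 0 means there is no hardware Access Flag update,
 * so accessing an old pte can fault.
 */
static inline bool faults_on_old_pte(uint64_t mmfr1)
{
	return mmfr1_hafdbs(mmfr1) == 0;
}
```

With this model, a register value whose low nibble is 0 reports "faults on
old pte", and any non-zero HAFDBS value (1 for Access Flag only, 2 for Access
Flag plus dirty state) reports the opposite.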
> >
> > But that's an interesting remark: we're now trading an extra fault on
> > CPUs that do not support HWAFDBS for a guaranteed trap for each and
> > every guest under the sun that will hit the COW path...
> >
> > My gut feeling is that this is going to be pretty visible. Jia, do you
> > have any numbers for this kind of behaviour?
> It is not the common COW path, only COW for PFN-mapped pages.
> I added a g_counter before pte_mkyoung in force_mkyoung{} when testing
> vmmalloc_fork at [1].
>
> This test case starts M forked processes and N pthreads. The default is
> M=2, N=4. The g_counter is about 241, that is, the test hits my patch
> series about 241 times.
> If I set M=20 and N=40 for TEST3, the g_counter is about 1492.
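
The counting described above can be modelled in userspace roughly as follows.
This is only a sketch under stated assumptions: g_counter is the name used in
the test, but count_force_mkyoung() and force_mkyoung_hits() are hypothetical
helpers, and C11 atomics stand in for whatever counter type the kernel-side
hack actually used.

```c
#include <stdatomic.h>

/* Counter bumped each time the "mark pte young" path is taken. */
static atomic_ulong g_counter;

/* Hypothetical hook: call right before the pte would be marked young. */
static inline void count_force_mkyoung(void)
{
	/* Relaxed ordering is enough for a pure statistics counter. */
	atomic_fetch_add_explicit(&g_counter, 1, memory_order_relaxed);
}

/* Hypothetical accessor: read the number of hits recorded so far. */
static inline unsigned long force_mkyoung_hits(void)
{
	return atomic_load_explicit(&g_counter, memory_order_relaxed);
}
```

An atomic counter is used here because the test runs several processes and
threads concurrently, so a plain increment could lose updates.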

The time overhead of the vmmalloc_fork test is:
real 0m5.411s
user 0m4.206s
sys 0m2.699s

>
> [1] https://github.com/pmem/pmdk/tree/master/src/test/vmmalloc_fork
>
>
> --
> Cheers,
> Justin (Jia He)
>