Re: [PATCH] mm: Speed up mremap on large regions

From: Juergen Gross
Date: Fri Oct 12 2018 - 01:29:49 EST


On 12/10/2018 05:21, Jann Horn wrote:
> +cc xen maintainers and kvm folks
>
> On Fri, Oct 12, 2018 at 4:40 AM Joel Fernandes (Google)
> <joel@xxxxxxxxxxxxxxxxx> wrote:
>> Android needs to mremap large regions of memory during memory management
>> related operations. The mremap system call can be really slow if THP is
>> not enabled. The bottleneck is move_page_tables, which copies one pte
>> at a time, and can be really slow across a large map. Turning on THP
>> may not be a viable option, and is not for us. This patch speeds up
>> mremap on non-THP systems by copying at the PMD level when possible.
> [...]
>> +bool move_normal_pmd(struct vm_area_struct *vma, unsigned long old_addr,
>> +		  unsigned long new_addr, unsigned long old_end,
>> +		  pmd_t *old_pmd, pmd_t *new_pmd, bool *need_flush)
>> +{
> [...]
>> +	/*
>> +	 * We don't have to worry about the ordering of src and dst
>> +	 * ptlocks because exclusive mmap_sem prevents deadlock.
>> +	 */
>> +	old_ptl = pmd_lock(vma->vm_mm, old_pmd);
>> +	if (old_ptl) {
>> +		pmd_t pmd;
>> +
>> +		new_ptl = pmd_lockptr(mm, new_pmd);
>> +		if (new_ptl != old_ptl)
>> +			spin_lock_nested(new_ptl, SINGLE_DEPTH_NESTING);
>> +
>> +		/* Clear the pmd */
>> +		pmd = *old_pmd;
>> +		pmd_clear(old_pmd);
>> +
>> +		VM_BUG_ON(!pmd_none(*new_pmd));
>> +
>> +		/* Set the new pmd */
>> +		set_pmd_at(mm, new_addr, new_pmd, pmd);
>> +		if (new_ptl != old_ptl)
>> +			spin_unlock(new_ptl);
>> +		spin_unlock(old_ptl);
>
> How does this interact with Xen PV? From a quick look at the Xen PV
> integration code in xen_alloc_ptpage(), it looks to me as if, in a
> config that doesn't use split ptlocks, this is going to temporarily
> drop Xen's type count for the page to zero, causing Xen to de-validate
> and then re-validate the L1 pagetable; if you first set the new pmd
> before clearing the old one, that wouldn't happen. I don't know how
> this interacts with shadow paging implementations.

No, this isn't an issue. As the L1 pagetable isn't being released it
will stay pinned, so there will be no need to revalidate it.

For Xen in shadow mode I'm quite sure it just doesn't matter. In case
another thread of the process is accessing the memory in parallel, it
might even be better not to have an L1 pagetable with 2 references at
the same time, but this is an academic problem which doesn't need to
be tuned for performance IMO.
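
To make the two orderings concrete, here is a small userspace toy model.
It is only a sketch of the reasoning above, not Xen or kernel code: the
struct and every toy_* name are made up for illustration, and the pin is
assumed to be modelled as one extra reference that keeps the page
validated while no pmd points at it.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for one L1 pagetable page (hypothetical, not Xen's layout). */
struct toy_l1 {
	int type_count;   /* references keeping the page validated */
	bool validated;
	int validations;  /* validations performed since toy_init() */
	int pmd_refs;     /* pmd entries currently pointing at the page */
	int max_pmd_refs; /* peak value of pmd_refs since toy_init() */
};

static void toy_get(struct toy_l1 *t)
{
	/* The first reference has to (re-)validate the page. */
	if (t->type_count++ == 0) {
		t->validated = true;
		t->validations++;
	}
}

static void toy_put(struct toy_l1 *t)
{
	assert(t->type_count > 0);
	/* Dropping the last reference de-validates the page. */
	if (--t->type_count == 0)
		t->validated = false;
}

static void toy_link(struct toy_l1 *t)   /* models set_pmd_at() */
{
	toy_get(t);
	if (++t->pmd_refs > t->max_pmd_refs)
		t->max_pmd_refs = t->pmd_refs;
}

static void toy_unlink(struct toy_l1 *t) /* models pmd_clear() */
{
	assert(t->pmd_refs > 0);
	t->pmd_refs--;
	toy_put(t);
}

/* Start with the page referenced by exactly one pmd, optionally pinned. */
static void toy_init(struct toy_l1 *t, bool pinned)
{
	*t = (struct toy_l1){ 0 };
	if (pinned)
		toy_get(t);      /* assumption: the pin holds its own reference */
	toy_link(t);
	t->validations = 0;      /* only count what the move itself causes */
	t->max_pmd_refs = t->pmd_refs;
}

/* Ordering used by the patch: clear the old pmd, then set the new one. */
static void toy_move_clear_first(struct toy_l1 *t)
{
	toy_unlink(t);
	toy_link(t);
}

/* Ordering suggested in the review: set the new pmd, then clear the old. */
static void toy_move_set_first(struct toy_l1 *t)
{
	toy_link(t);
	toy_unlink(t);
}
```

In this model a pinned page moved with toy_move_clear_first() is never
re-validated and never has more than one pmd reference, an unpinned page
moved the same way is re-validated once, and toy_move_set_first() avoids
the re-validation in both cases at the cost of briefly having two
references.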


Juergen