Re: [RFC 4/4] Change mmap_sem to range lock

From: Laurent Dufour
Date: Mon Apr 24 2017 - 11:48:26 EST


On 21/04/2017 01:36, Andi Kleen wrote:
> Laurent Dufour <ldufour@xxxxxxxxxxxxxxxxxx> writes:
>
>> [resent this patch which seems to have not reached the mailing lists]
>>
>> Change the mmap_sem to a range lock to allow finer grain locking on
>> the memory layout of a task.
>>
>> This patch renames mmap_sem to mmap_rw_tree to avoid confusion and
>> replaces any locking (read or write) by complete range locking. So
>> there is no functional change except in the way the underlying locking
>> is achieved.
>>
>> Currently, this patch only supports the x86 and PowerPC architectures;
>> furthermore, it should break the build of any others.
>
> Thanks for working on this.
>
> However as commented before I think the first step to make progress here
> is a description of everything mmap_sem protects.

Hi Andi,

I looked at where mmap_sem is taken for writing on the x86 and ppc64
architectures; here is what I found:

mmap_sem protects
    vdso mapping
    VMA layout changes
    VMA cache
    Page protection/layout
    Changes to mmu notifier chain
    mmap_sem is used to serialize khugepaged's access
    mmap_sem is used to serialize ksm's access
    protection keys (pkey_alloc()...)

Calls to
    get_unmapped_area()
    do_mmap()
    do_mmap_pgoff()
    do_munmap()
    get_user_pages()
    put_page()
    set_page_dirty_lock()
    find_vma()
    find_vma_intersection()
    alloc_empty_pages()
    insert_vm_struct()
    get_mm_rss()
    uprobe_consumer->filter() (currently only uprobe_perf_filter())
    _install_special_mapping()
    pmdp_collapse_flush()
    do_swap_page()
    do_brk()
    __split_vma()
    mremap_to()
    vma_to_resize()
    vma_adjust()

MM fields
    pinned_vm
    stack_vm
    total_vm
    locked_vm
    start_stack
    start_code
    end_code
    start_data
    start_brk
    bd_addr
    mm_users
    core_state
    context.vdso_*
    def_flags
    mmu_notifier_mm

VMA fields
    vm_private_data
    vm_flags
    vm_page_prot
    vm_file
    vm_pgoff
    vm_policy


Userfaultfd has not been looked at in detail yet.
dup_mmap() locks the oldmm in write mode when copying it; is that necessary?

> Surely the init full case could be done shorter with some wrapper
> that combines the init_full and lock operation?

Yes, that's doable. I wrote it this way because the range should be
initialized based on the ongoing operation, so having a separate init
operation makes the chosen range explicit.
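
To make the suggestion concrete, here is a rough sketch of a helper that
folds the "init full" and "lock" steps into one call. It is a
single-threaded userspace model, not kernel code: struct range_model,
struct range_tree_model and every function name below are hypothetical,
and the "lock" only records state (it returns false instead of blocking)
so that the call shape can be exercised.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical model types; they are not the kernel's range lock API. */
struct range_model {
	unsigned long start;
	unsigned long end;	/* inclusive */
};

struct range_tree_model {
	bool write_locked;	/* a real range lock would track held intervals */
};

/* Explicit init: the caller states which range the operation covers. */
static inline void range_model_init(struct range_model *r,
				    unsigned long start, unsigned long end)
{
	assert(start <= end);
	r->start = start;
	r->end = end;
}

/* "Full" init: cover the whole address space, as mmap_sem does today. */
static inline void range_model_init_full(struct range_model *r)
{
	range_model_init(r, 0, ULONG_MAX);
}

/* Model of a write lock; returns false instead of blocking when held. */
static inline bool range_model_write_trylock(struct range_tree_model *t,
					     const struct range_model *r)
{
	(void)r;	/* this model treats every range as the full range */
	if (t->write_locked)
		return false;
	t->write_locked = true;
	return true;
}

static inline void range_model_write_unlock(struct range_tree_model *t,
					    const struct range_model *r)
{
	(void)r;
	assert(t->write_locked);
	t->write_locked = false;
}

/* Combined wrapper: set the range to "full" and take the lock in one call. */
static inline bool range_model_write_trylock_full(struct range_tree_model *t,
						  struct range_model *r)
{
	range_model_init_full(r);
	return range_model_write_trylock(t, r);
}
```

The trade-off is the one described above: the combined call is shorter,
but the range being locked is no longer spelled out at the call site.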

> Then it would be likely a simple search'n'replace to move the
> whole tree in one atomic step to the new wrappers.
> Initially they could be just defined to use rwsems too to
> not change anything at all.
>
> It would be a good idea to merge such a patch as quickly
> as possible because it will be a nightmare to maintain
> longer term.
>
> Then you could add a config to use a range lock through
> the wrappers.

I agree, I should find a way to make that patch selectable through a
CONFIG_ value, but the additional range argument makes that more
complex to achieve. I'll try to figure out a way to do that.
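
One possible shape, as a sketch only: callers always pass a range, and
the build without the option simply ignores it. Everything below is a
hypothetical userspace model (CONFIG_MODEL_RANGE_LOCK, struct
model_mm_lock and the model_mm_* helpers are invented for illustration),
and the "lock" just records state rather than blocking.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical switch standing in for a Kconfig option. */
/* #define CONFIG_MODEL_RANGE_LOCK 1 */

struct model_range {
	unsigned long start;
	unsigned long end;	/* inclusive */
};

#ifdef CONFIG_MODEL_RANGE_LOCK
/* Range-lock flavour: the lock remembers which range the holder asked for. */
struct model_mm_lock {
	bool held;
	struct model_range holder;
};

static inline bool model_mm_write_trylock(struct model_mm_lock *l,
					  const struct model_range *r)
{
	if (l->held)
		return false;
	l->held = true;
	l->holder = *r;
	return true;
}
#else
/* rwsem-like flavour: same call signature, the range is ignored. */
struct model_mm_lock {
	bool held;
};

static inline bool model_mm_write_trylock(struct model_mm_lock *l,
					  const struct model_range *r)
{
	(void)r;
	if (l->held)
		return false;
	l->held = true;
	return true;
}
#endif

/* Unlock is identical in both flavours for this model. */
static inline void model_mm_write_unlock(struct model_mm_lock *l,
					 const struct model_range *r)
{
	(void)r;
	assert(l->held);	/* unlocking a lock that is not held is a bug */
	l->held = false;
}
```

With this shape the call sites do not change when the option is flipped;
the cost is that every caller carries a range variable even when the
plain rwsem-like flavour is built.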

> Then after that you could add real ranges step by step,
> after doing the proper analysis.

That's the biggest part of the job.
I'm also wondering if a dedicated lock/sem should be introduced to
protect the VMA cache and the VMA list, since the range itself will not
protect against changes while walking the VMA list.
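
To illustrate the concern with a small model (userspace C, all names
hypothetical, inclusive end addresses chosen for simplicity): two lock
holders are only serialized when their ranges overlap, yet a walk from
the head of a shared list touches nodes that lie outside the walker's
own range.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a locked address range, end inclusive. */
struct span {
	unsigned long start;
	unsigned long end;
};

/* Hypothetical model of a VMA on a singly linked, address-ordered list. */
struct vma_model {
	unsigned long start;
	unsigned long end;	/* inclusive in this model */
	struct vma_model *next;
};

/* Two range-lock holders exclude each other only if their ranges overlap. */
static inline bool spans_overlap(const struct span *a, const struct span *b)
{
	return a->start <= b->end && b->start <= a->end;
}

/*
 * Walk from the list head to the VMA containing addr and count the nodes
 * visited that do NOT overlap the range the walker holds. Each of those
 * could be changed concurrently by a holder of a disjoint range.
 */
static inline unsigned int unprotected_nodes_touched(const struct vma_model *head,
						     unsigned long addr,
						     const struct span *held)
{
	unsigned int n = 0;

	for (const struct vma_model *v = head; v; v = v->next) {
		struct span s = { v->start, v->end };

		if (!spans_overlap(&s, held))
			n++;
		if (addr >= v->start && addr <= v->end)
			break;
	}
	return n;
}
```

For example, with three adjacent one-page VMAs starting at 0x1000, a
walker holding only the third page still visits the first two nodes
before reaching its target, while a holder of the first page is free to
modify them.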

Please advise.

Cheers,
Laurent.