[RFC][PATCH 0/6] IO pinning(get_user_pages()) vs fork race fix

From: KOSAKI Motohiro
Date: Tue Apr 14 2009 - 02:15:59 EST



Linux Device Drivers, Third Edition, Chapter 15: Memory Mapping and DMA says:

    get_user_pages is a low-level memory management function, with a suitably complex
    interface. It also requires that the mmap reader/writer semaphore for the address
    space be obtained in read mode before the call. As a result, calls to get_user_pages
    usually look something like:

        down_read(&current->mm->mmap_sem);
        result = get_user_pages(current, current->mm, ...);
        up_read(&current->mm->mmap_sem);

    The return value is the number of pages actually mapped, which could be fewer than
    the number requested (but greater than zero).

But that isn't true. mmap_sem is not only used for vma traversal; it also prevents the race against fork.
up_read(mmap_sem) marks the end of the critical section. IOW, any code after up_read() that still uses the pinned pages is not fork-safe.
(access_process_vm() shows the proper get_user_pages() usage.)

Oh well, we have many wrong callers now. What is the best way to fix them?

Nick Piggin and Andrea Arcangeli proposed changing the get_user_pages() semantics to match what callers expect.
See the "[PATCH] fork vs gup(-fast) fix" thread on linux-mm.
But Linus NACKed it.

Thus I made a patch series that takes the caller-change approach. It is meant for discussion, to compare against Nick's approach.
I don't intend to submit it yet.

Nick, this version fixes the vmsplice and aio issues you pointed out. I hope to hear your opinion ;)



ChangeLog:
V2 -> V3
o remove early decow logic
o introduce prevent unmap logic
o fix nfs-directio
o fix aio
o fix bio (band-aid fix only)

V1 -> V2
o fix aio+dio case

TODO
o implement down_write_killable()
o fix kvm (needed?)
o fix get_arg_page() (Why doesn't this function use mmap_sem?)

