RFC: get_user_pages_locked|unlocked to leverage VM_FAULT_RETRY

From: Andrea Arcangeli
Date: Fri Sep 26 2014 - 13:26:27 EST


On Thu, Sep 25, 2014 at 02:50:29PM -0700, Andres Lagar-Cavilla wrote:
> It's nearly impossible to name it right because 1) it indicates we can
> relinquish 2) it returns whether we still hold the mmap semaphore.
>
> I'd prefer it'd be called mmap_sem_hold, which conveys immediately
> what this is about ("nonblocking" or "locked" could be about a whole
> lot of things)

To me FOLL_NOWAIT/FAULT_FLAG_RETRY_NOWAIT is nonblocking,
"locked"/FAULT_FLAG_ALLOW_RETRY is still very much blocking, just
without the mmap_sem, so I called it "locked"... but I'm fine to
change the name to mmap_sem_hold. Just get_user_pages_mmap_sem_hold
seems less friendly than get_user_pages_locked(..., &locked). locked
as you used comes intuitive when you do later "if (locked) up_read".

Then I added an _unlocked kind which is a drop in replacement for many
places just to clean it up.

get_user_pages_unlocked and get_user_pages_fast are equivalent in
semantics, so any call of get_user_pages_unlocked(current,
current->mm, ...) has no reason to exist and should be replaced to
get_user_pages_fast unless "force = 1" (gup_fast has no force param
just to make the argument list a bit more confusing across the various
versions of gup).

get_user_pages over time should be phased out and dropped.

> I can see that. My background for coming into this is very similar: in
> a previous life we had a file system shim that would kick up into
> userspace for servicing VM memory. KVM just wouldn't let the file
> system give up the mmap semaphore. We had /proc readers hanging up all
> over the place while userspace was servicing. Not happy.
>
> With KVM (now) and the standard x86 fault giving you ALLOW_RETRY, what
> stands in your way? Methinks that gup_fast has no slowpath fallback
> that turns on ALLOW_RETRY. What would oppose that being the global
> behavior?

It should become the global behavior. Just it doesn't need to become a
global behavior immediately for all kind of gups (i.e. video4linux
drivers will never need to poke into the KVM guest user memory so it
doesn't matter if they don't use gup_locked immediately). Even then we
can still support get_user_pages_locked(...., locked=NULL) for
ptrace/coredump and other things that may not want to trigger the
userfaultfd protocol and just get an immediate VM_FAULT_SIGBUS.

Userfaults will just VM_FAULT_SIGBUS (translated to -EFAULT by all gup
invocations) and not invoke the userfaultfd protocol, if
FAULT_FLAG_ALLOW_RETRY is not set. So any gup_locked with locked ==
NULL or or gup() (without locked parameter) will not invoke the
userfaultfd protocol.

But I need gup_fast to use FAULT_FLAG_ALLOW_RETRY because core places
like O_DIRECT uses it.

I tried to do a RFC patch below that goes into this direction and
should be enough for a start to solve all my issues with the mmap_sem
holding inside handle_userfault(), comments welcome.

=======