Re: [PATCH] mm/gup: Fixes FOLL_UNLOCKABLE against FOLL_NOWAIT

From: Peter Xu
Date: Mon May 15 2023 - 11:49:16 EST


On Fri, May 12, 2023 at 02:06:36PM -0300, Jason Gunthorpe wrote:
> On Thu, May 11, 2023 at 08:31:02PM -0400, Peter Xu wrote:
>
> > E.g., with current code we could at last have FAULT_FLAG_RETRY_NOWAIT set
> > even if with a FOLL_UNLOCKABLE gup which doesn't make a lot of
> > sense.
>
> I would say NOWAIT and UNLOCKABLE are different things. UNLOCKABLE
> says the mmap sem is allowed to be unlocked, which is true, and NOWAIT
> says it shouldn't "wait" (which is something more nebulous than just
> sleep). In FOLL_ flag terms it would be fine if the mmap sem was
> unlocked while doing NOWAIT - even though the fault hanlder will not
> doe this.
>
> The only caller is fine with this too.
>
> !UNLOCKABLE literally means not to ever drop the mmap lock which is
> not something KVM needs at all.

The problem is FOLL_NOWAIT implies FAULT_FLAG_RETRY_NOWAIT internally.

Then we'll have FAULT_FLAG_RETRY_NOWAIT+FAULT_FLAG_KILLABLE which makes it
very confusing, because RETRY_NOWAIT means we never release mmap lock or
retry, then KILL means "if we wait, allow us to be killed".

Considering FOLL_UNLOCKABLE is an internal flag while FOLL_NOWAIT a public
(even if only with a single caller...), I'd still think it makes more sense
and cleaner to just remove FOLL_UNLOCKABLE if FOLL_NOWAIT, no?

Again, nothing to blame for previous commit (I explained in the commit
message too that we don't need fixes, but simply a cleanup), but it seems
removing this confusion of NOWAIT+UNLOCKABLE could be helpful to me.

>
> So I'd say it is fine as is. A caller should never assume that calling
> an unlocked function or passing null locked means that the mmap sem
> won't be unlocked while running indirectly because of other GUP
> flags. If it wants this behavior it needs to ask for it explicitly
> with a locked GUP call and a NULL locked.
>
> > Since at it, the same commit added unconditional FOLL_UNLOCKABLE in
> > faultin_vma_page_range(), which is code-wise correct becuase the helper
> > only has one user right now and it always has "locked" set.
>
> Not quite, it is correct because that is the API contract of this
> function. The caller must provide a non-NULL locked and non-NULL
> locked at the external interfaces always mean it can be unlocked while
> running.

Hmm yes, that's the contract. But then it makes more sense to assert on
that contract (by checking locked)?

How about I rework the commit message but keep the change (which literally
only add the assertion)?

--
Peter Xu