Re: [PATCH] [suggestion] mm/gup: avoid IS_ERR_OR_NULL

From: Lorenzo Stoakes
Date: Sat May 20 2023 - 05:14:58 EST


On Sat, May 20, 2023 at 05:25:52AM -0300, Jason Gunthorpe wrote:
> On Sat, May 20, 2023 at 06:19:37AM +0100, Lorenzo Stoakes wrote:
> > On Fri, May 19, 2023 at 07:17:41PM -0300, Jason Gunthorpe wrote:
> > > On Fri, May 19, 2023 at 03:51:51PM +0100, Lorenzo Stoakes wrote:
> > > > Given you are sharply criticising the code I authored here, is it too much
> > > > to ask for you to cc me, the author, on commentaries like this? Thanks.
> > > >
> > > > On Fri, May 19, 2023 at 11:39:13AM +0200, Arnd Bergmann wrote:
> > > > > From: Arnd Bergmann <arnd@xxxxxxxx>
> > > > >
> > > > > While looking at an unused-variable warning, I noticed a new interface coming
> > > > > in that requires the use of IS_ERR_OR_NULL(), which tends to indicate bad
> > > > > interface design and is usually surprising to users.
> > > >
> > > > I am not sure I understand your reasoning, why does it 'tend to indicate
> > > > bad interface design'? You say that as if it is an obvious truth. Not
> > > > obvious to me at all.
> > > >
> > > > There are 3 possible outcomes from the function - an error, the function
> > > > failing to pin a page, or it succeeding in doing so. For some of the
> > > > callers that results in an error, for others it is not an error.
> > >
> > > No, there really isn't.
> > >
> > > Either it pins the page or it doesn't. Returning "NULL" to mean a
> > > specific kind of failure was encountered is crazy.. Especially if we
> > > don't document what that specific failure even was.
> > >
> >
> > It's not a specific kind of failure, it's literally "I didn't pin any
> > pages" which a caller may or may not choose to interpret as a failure.
>
> Any time gup fails it didn't pin any pages, that is the whole
> point. All that is happening is some ill defined subset of gup errors
> are returning 0 instead of an error code.
>
> If we want to enable callers to ignore certain errors then we need to
> return error codes with well defined meanings, have documentation what
> errors are included and actually make it sane.

Yeah I agree it's not exactly great that a failure to pin can be considered
an ordinary case, but I don't think a wrapper function is where we should
be trying to fix this.

In fact this patch doesn't even fix it; it just treats EIO as a success
case in the (possibly broken) uprobe path.

I think we are at the wrong level of abstraction here, basically.

>
> > That can be a reason for gup returning 0, but also, if you look at the
> > main loop in __get_user_pages_locked(), if it can't find the VMA it will
> > bail early, OR if the VMA flags are not as expected it'll bail early.
>
> And how does that make any sense? Missing VMA should be EFAULT.

Yeah, a missing VMA doesn't really make sense since we invoke the function
with the mmap lock held (it _could_ make sense if you were calling it via
one of the unlocked functions, speculatively, though whether doing that is
sensible is another discussion...)

>
> > caller differentiates between an error and not being able to pin -
> > uprobe_write_opcode() - which treats failure to pin as a non-error state.
>
> That looks like a bug since the function returns 0 on success but it
> clearly didn't succeed.

The code is specifically handling a failure to pin separately.
uprobe_write_opcode() is reached via install_breakpoint() -> set_swbp(), and
install_breakpoint() explicitly does the following:

    ret = set_swbp(&uprobe->arch, mm, vaddr);
    if (!ret)
        clear_bit(MMF_RECALC_UPROBES, &mm->flags);

So even if this is... questionable, the code literally does want to
differentiate between an error and a failure to pin.

Presumably this is because of the flag check, but yeah we should be
differentiating between sub-cases.
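As a rough model of the uprobe call chain described above, the sketch below assumes that uprobe_write_opcode() passes a non-positive pin result straight back to its caller (that `ret <= 0` shape is an assumption here, not quoted from the thread). All functions are hypothetical userspace stand-ins, and a plain bool replaces MMF_RECALC_UPROBES. Under that assumption, pinning zero pages reaches install_breakpoint() as 0 and clears the recalc flag just as a real success does.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the MMF_RECALC_UPROBES bit in mm->flags. */
static bool recalc_uprobes = true;

/*
 * Hypothetical stand-in for a one-page GUP call: 'simulated' is
 * returned as-is (negative errno, 0 pages pinned, or 1 page pinned).
 */
static long fake_pin_one(long simulated)
{
	return simulated;
}

/* Models the assumed "if (ret <= 0) return ret;" shape of uprobe_write_opcode(). */
static int write_opcode_sketch(long simulated)
{
	long ret = fake_pin_one(simulated);

	if (ret <= 0)
		return (int)ret;	/* 0 ("pinned nothing") leaks out as success */
	/* ... the breakpoint instruction would be written to the page here ... */
	return 0;
}

/* Mirrors the install_breakpoint() snippet quoted above. */
static int install_breakpoint_sketch(long simulated)
{
	int ret = write_opcode_sketch(simulated);	/* via set_swbp() in the real chain */

	if (!ret)
		recalc_uprobes = false;	/* clear_bit(MMF_RECALC_UPROBES, ...) */
	return ret;
}

/* Helper: reset the flag, run the chain, report whether the flag was cleared. */
static bool clears_recalc(long simulated)
{
	recalc_uprobes = true;
	install_breakpoint_sketch(simulated);
	return !recalc_uprobes;
}
```

In this model clears_recalc(0) and clears_recalc(1) both return true while an errno leaves the flag set: the empty pin is grouped with success, which is the behaviour Jason calls a bug and Lorenzo reads as intentional.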

>
> > Also if we decided at some point to return -EIO as an error suddenly we
> > would be treating an error state as not an error state in the proposed code
> > which sounds like a foot gun.
>
> No, this returning 0 on failure is a foot gun. Failing to pin a single
> page is always an error, the only question is what sub reason caused
> the error to happen. There is no third case where it is not an error.
>
> Jason

The uprobe path thinks otherwise, but maybe the answer is that we just need
to return -EFAULT on a missing VMA and -EPERM on invalid flags.
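A minimal sketch of that convention, assuming a loop that returns the number of pages pinned so far: partial progress is still reported as a count, but if nothing was pinned the specific reason comes back as -EFAULT (missing VMA) or -EPERM (unsuitable flags) rather than 0. The fake_slot model, pin_range() and pin_one() are hypothetical userspace illustrations, not GUP internals. With this shape a single-page helper has only two outcomes, success or a negative errno.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one page-sized slot in the requested range. */
struct fake_slot {
	bool vma_present;	/* is there a VMA at this address? */
	bool flags_ok;		/* do the VMA flags allow this pin? */
};

/*
 * Hypothetical range pin: returns the number of pages pinned or, if
 * none were pinned, a negative errno naming the reason. It only
 * returns 0 when asked to pin zero pages.
 */
static long pin_range(const struct fake_slot *slots, size_t nr)
{
	long pinned = 0;
	long err = 0;

	for (size_t i = 0; i < nr; i++) {
		if (!slots[i].vma_present) {
			err = -EFAULT;	/* missing VMA */
			break;
		}
		if (!slots[i].flags_ok) {
			err = -EPERM;	/* unsuitable VMA flags */
			break;
		}
		pinned++;	/* a real implementation would take a page reference here */
	}
	/* Report progress if any was made, otherwise why nothing was pinned. */
	return pinned ? pinned : err;
}

/* Hypothetical single-page helper: 0 on success, negative errno otherwise. */
static int pin_one(const struct fake_slot *slot)
{
	long ret = pin_range(slot, 1);

	return ret == 1 ? 0 : (int)ret;
}
```

A caller that wants to tolerate one of these cases, as the uprobe path apparently does, could then test for that specific errno instead of relying on a 0 return.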

I could look into a patch that tries to undo this convention, and then we
could revisit this for the wrapper function too.