Re: Weird code with change "mm/gup: clean up follow_pfn_pte() slightly"

From: John Hubbard
Date: Thu Feb 03 2022 - 20:22:45 EST


On 2/3/22 17:06, Jason Gunthorpe wrote:
On Thu, Feb 03, 2022 at 04:59:56PM -0800, John Hubbard wrote:
On 2/3/22 16:45, Jason Gunthorpe wrote:
On Thu, Feb 03, 2022 at 12:44:57PM -0800, John Hubbard wrote:
On 2/3/22 05:01, Jason Gunthorpe wrote:
...
In the new branch if (pages), you set page = ERR_PTR(-EFAULT) and goto
out. However, at the label out, the value of page is not used, but the
return uses the variables i and ret.

Yes, I think that the complaint is accurate. The intent of this code is
to return either number of pages so far (i) or ret (which should be zero
in this case), because we are just stopping early, rather than calling
this an actual error.

IIRC GUP shouldn't return 0, it should return an error code, not zero.

Jason

Errors work for single pages, but GUP is a multi-page API call. If it
returned an error part way through the list of pages, then callers would
have no way of knowing how many pages to release.

Yes, but that is returning a positive error code, I said it should not
return zero.

When it hits an error with pages already loaded it returns that number
and the caller will then do gup once more with the VA pointing at the
problematic page. Then GUP can return the error code because it has 0
pages on the next iteration.

It should not return 0 here when it got an error.
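
To illustrate the convention described above, here is a minimal
user-space sketch (not code from this thread or from mm/gup.c).
mock_gup() and pin_range() are hypothetical names made up for the
example: mock_gup() stands in for GUP and fails at one chosen page,
and pin_range() is a caller that simply calls again at the address
where the previous call stopped.

```c
#include <assert.h>
#include <errno.h>

/*
 * Hypothetical stand-in for GUP: "pins" pages [start, start + nr) and
 * hits an error at page number bad_page (pass -1 for "no bad page").
 */
static long mock_gup(long start, long nr, long bad_page)
{
	long i;

	assert(nr > 0);

	for (i = 0; i < nr; i++) {
		if (start + i == bad_page) {
			/*
			 * Pages already pinned: report how many.
			 * Nothing pinned yet: report the error itself.
			 * Either way, never 0.
			 */
			return i ? i : -EFAULT;
		}
	}
	return nr;
}

/*
 * Hypothetical caller: retries from the first page that was not pinned.
 * On return, *pinned is the number of pages the caller must release.
 */
static long pin_range(long start, long nr, long bad_page, long *pinned)
{
	*pinned = 0;

	while (*pinned < nr) {
		long ret = mock_gup(start + *pinned, nr - *pinned, bad_page);

		if (ret < 0)
			return ret;	/* second call surfaces the errno */
		*pinned += ret;
	}
	return 0;
}
```

In this sketch the caller always knows how many pages to release
(*pinned), and the errno only appears on a call that pinned nothing.
A 0 return would give this loop nothing to act on, which is the
argument against returning 0 on error.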

This is perhaps better API design, but it's not what exists now.

I think it is what exists today, 0 certainly is not implemented as
'need retry' anywhere I found.

So why do we return 0, if it means an error, instead of returning the
actual errno?

Well, now returning 0 sounds all wrong, when you put it like that. :)

So, simply this approach? :

@@ -1205,8 +1201,15 @@ static long __get_user_pages(struct mm_struct *mm,
 		} else if (PTR_ERR(page) == -EEXIST) {
 			/*
 			 * Proper page table entry exists, but no corresponding
-			 * struct page.
+			 * struct page. If the caller expects **pages to be
+			 * filled in, bail out now, because that can't be done
+			 * for this page.
 			 */
+			if (pages) {
+				ret = PTR_ERR(page);
+				goto out;
+			}
+
 			goto next_page;
 		} else if (IS_ERR(page)) {
 			ret = PTR_ERR(page);


The call sites today handle 0 pages ret value correctly,

This isn't correct though:

	if (get_user_pages(vaddr, 1, write ? FOLL_WRITE : 0, &page, NULL) <= 0)
		return -EFAULT;

If GUP wanted the caller to permanently fail with -EFAULT, it should
have directly returned -EFAULT.

0 means 'to be retried', whatever that means, and there is no retry
in the above.

IOW, the above does not handle a 0 return correctly, according to the
comment.


I recall seeing several sites that do a quick attempt at one page and
force a -errno failure if anything other than ret==1 occurs. I guess the
good news is that changing GUP to return -errno instead of 0 won't affect
them.
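
A minimal sketch of that kind of call site, assuming a hypothetical
helper single_page_status() that receives whatever a one-page GUP call
returned (the name and the choice of -EFAULT are illustrative, not
taken from any particular caller):

```c
#include <assert.h>
#include <errno.h>

/*
 * Hypothetical single-page call site: anything other than "exactly one
 * page pinned" is turned into a hard failure.
 */
static int single_page_status(long gup_ret)
{
	return gup_ret == 1 ? 0 : -EFAULT;
}

/* The result is the same whether GUP reports 0 or a negative errno. */
static void check_single_page_callers(void)
{
	assert(single_page_status(1) == 0);
	assert(single_page_status(0) == -EFAULT);
	assert(single_page_status(-EEXIST) == -EFAULT);
	assert(single_page_status(-ENOMEM) == -EFAULT);
}
```

Because such a caller only distinguishes ret == 1 from everything
else, switching GUP from 0 to -errno leaves its behavior unchanged.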


thanks,
--
John Hubbard
NVIDIA