Re: [PATCH v3 2/4] mm/gup: clean up follow_pfn_pte() slightly

From: John Hubbard
Date: Thu Feb 03 2022 - 15:53:28 EST


On 2/3/22 05:31, Claudio Imbrenda wrote:
...
@@ -1180,8 +1176,13 @@ static long __get_user_pages(struct mm_struct *mm,
} else if (PTR_ERR(page) == -EEXIST) {
/*
* Proper page table entry exists, but no corresponding
- * struct page.
+ * struct page. If the caller expects **pages to be
+ * filled in, bail out now, because that can't be done
+ * for this page.
*/
+ if (pages)
+ goto out;
+
goto next_page;
} else if (IS_ERR(page)) {
ret = PTR_ERR(page);

I'm not an expert, can you explain why this is better, and why it does
not cause new issues?

If I understand correctly, the problem you are trying to solve is that
in some cases you might try to get n pages, but you only get m < n
pages instead, because some don't have an associated struct page, and
the missing pages might even be in the middle.

The `pages` array would contain the list of pages actually pinned
(getted?), but this won't tell which of the requested pages have been
pinned (e.g. if some pages in the middle of the run were skipped)


The get_user_pages() API doesn't leave pages in the middle, ever.
Instead, it stops at the first error, and reports the number of page
that were successfully pinned. And the caller is responsible for
unpinning.

From __get_user_pages()'s kerneldoc documentation:

* Returns either number of pages pinned (which may be less than the
* number requested), or an error. Details about the return value:
*
* -- If nr_pages is 0, returns 0.
* -- If nr_pages is >0, but no pages were pinned, returns -errno.
* -- If nr_pages is >0, and some pages were pinned, returns the number of
* pages pinned. Again, this may be less than nr_pages.
* -- 0 return value is possible when the fault would need to be retried.
*
* The caller is responsible for releasing returned @pages, via put_page().

So the **pages array doesn't have holes, and the caller just counts up
from the beginning of **pages and stops at nr_pages.


With your patch you will stop at the first page without a struct page,
meaning that if the caller tries again, it will get 0 pages. Why won't
this cause issues?

Callers are already written to deal with this case.


Why will this not cause problems when the `pages` parameter is NULL?

The behavior is unchanged here if pages == NULL. But maybe you meant,
if pages != NULL. And in that case, the new behavior is to stop early
and return n < m, which is (I am claiming) better than just leaving
garbage values in **pages.

Another approach would be to file in PTR_ERR(page) values, but GUP is
a well-established and widely used API, and that would be a large
change that would require changing a lot of caller code.



sorry for the dumb questions, but this seems a rather important change,
and I think in these circumstances you can't have too much
documentation.


Thanks for reviewing this!


thanks,
--
John Hubbard
NVIDIA