Re: [PATCHv6 32/36] thp: reintroduce split_huge_page()

From: Vlastimil Babka
Date: Wed Jun 10 2015 - 11:44:50 EST


On 06/03/2015 07:06 PM, Kirill A. Shutemov wrote:
This patch adds an implementation of split_huge_page() for the new
refcounting.

Unlike the previous implementation, the new split_huge_page() can fail if
somebody holds a GUP pin on the page. It also means that a pin on a page
will prevent it from being split under you. It makes the situation in
many places much cleaner.

The basic scheme of split_huge_page():

- Check that the sum of mapcounts of all subpages, plus one (the caller's
pin), is equal to page_count(). Fail with -EBUSY otherwise. This way we
can avoid useless PMD splits.

- Freeze the page counters by splitting all PMDs and setting up migration
PTEs.

- Re-check the sum of mapcounts against page_count(). The page's counts
are stable now. Fail with -EBUSY if the page is pinned.

- Split compound page.

- Unfreeze the page by removing migration entries.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@xxxxxxxxxxxxxxx>
Tested-by: Sasha Levin <sasha.levin@xxxxxxxxxx>
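
For readers following the thread, the scheme above boils down to roughly
the control flow below. This is an illustrative sketch only; the helper
names are placeholders for what the series actually does, and the error
handling is simplified.

static int split_huge_page_sketch(struct page *head)
{
	/* 1. Cheap pre-check: page_count() must be the total mapcount plus
	 *    one (the caller's pin); anything more means a GUP pin. */
	if (total_mapcount(head) + 1 != page_count(head))
		return -EBUSY;

	/* 2. Freeze: split every PMD mapping the page and replace the PTEs
	 *    with migration entries, so the counts stop moving. */
	freeze_page(head);

	/* 3. Re-check now that the counts are stable. */
	if (total_mapcount(head) + 1 != page_count(head)) {
		unfreeze_page(head);
		return -EBUSY;
	}

	/* 4. Split the compound page into base pages. */
	__split_huge_page(head);

	/* 5. Unfreeze: remove the migration entries again. */
	unfreeze_page(head);
	return 0;
}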

[...]

+
+static int __split_huge_page_tail(struct page *head, int tail,
+ struct lruvec *lruvec, struct list_head *list)
+{
+ int mapcount;
+ struct page *page_tail = head + tail;
+
+ mapcount = page_mapcount(page_tail);

Isn't page_mapcount() unnecessarily heavyweight here? When you are splitting a page, it should already have a zero compound_mapcount() and shouldn't be PageDoubleMap(), no? So you should only care about page->_mapcount? Sure, splitting a THP is not a hotpath, but when this is done 512 times per split, it could make some difference in the split's latency.
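
A minimal sketch of that suggestion, assuming compound_mapcount() really is
zero and PageDoubleMap() is clear at this point (the helper name is made up;
_mapcount is stored biased by -1, hence the "+ 1"):

/* Hypothetical shortcut: with no compound mapping and no double-map,
 * page_mapcount() reduces to the tail page's own _mapcount. */
static inline int tail_mapcount_during_split(struct page *page_tail)
{
	return atomic_read(&page_tail->_mapcount) + 1;
}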

+ VM_BUG_ON_PAGE(atomic_read(&page_tail->_count) != 0, page_tail);
+
+ /*
+ * tail_page->_count is zero and not changing from under us. But
+ * get_page_unless_zero() may be running from under us on the
+ * tail_page. If we used atomic_set() below instead of atomic_add(), we
+ * would then run atomic_set() concurrently with
+ * get_page_unless_zero(), and atomic_set() is implemented in C not
+ * using locked ops. spin_unlock on x86 sometimes uses locked ops
+ * because of PPro errata 66, 92, so unless somebody can guarantee
+ * atomic_set() here would be safe on all archs (and not only on x86),
+ * it's safer to use atomic_add().

I would be surprised if this was the first place to use atomic_set() with a potentially concurrent atomic_add(). Shouldn't the atomic_*() API guarantee that this works?
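
For context, the lockless reader referred to in the comment is essentially
just an atomic_inc_not_zero() on the tail page's _count (simplified sketch,
not the exact upstream definition):

/* Simplified form of get_page_unless_zero(); the real helper has the
 * same shape, an atomic_inc_not_zero() racing with the code above. */
static inline int get_page_unless_zero_sketch(struct page *page)
{
	return atomic_inc_not_zero(&page->_count);
}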

+ */
+ atomic_add(page_mapcount(page_tail) + 1, &page_tail->_count);

You already have the value in the mapcount variable, so why read it again?
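
The suggested change would be something like the following (untested):

	/* Reuse the mapcount read at the top of __split_huge_page_tail()
	 * instead of calling page_mapcount() a second time. */
	atomic_add(mapcount + 1, &page_tail->_count);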


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/