Re: [PATCH v3 0/6] mm / virtio: Provide support for unused page reporting

From: Nitesh Narayan Lal
Date: Fri Aug 02 2019 - 10:41:50 EST



On 8/1/19 6:24 PM, Alexander Duyck wrote:
> This series provides an asynchronous means of reporting to a hypervisor
> that a guest page is no longer in use and can have the data associated
> with it dropped. To do this I have implemented functionality that allows
> for what I am referring to as unused page reporting.
>
> The functionality for this is fairly simple. When enabled it will allocate
> statistics to track the number of reported pages in a given free area.
> When the number of free pages exceeds that count plus a high-water value,
> currently 32, it will begin performing page reporting, which consists of
> pulling pages off of the free list and placing them into a scatterlist. The
> scatterlist is then given to the page reporting device, which will perform
> the required action to make the pages "reported"; in the case of
> virtio-balloon this results in the pages being madvised as MADV_DONTNEED,
> and as such they are forced out of the guest. After this they are placed
> back on the free list and, if they are not merged, an additional bit is set
> indicating that they are a reported buddy page instead of a standard buddy
> page. The cycle then repeats, with additional non-reported pages being
> pulled until the free areas all consist of reported pages.
>
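Just to make sure I am reading the flow correctly, here is a rough sketch
of the cycle described above, as I understand it. The helpers
get_unreported_page(), report_to_device() and free_reported_page() are
placeholders I made up for illustration, not the actual interfaces in the
patches:

#include <linux/mm.h>
#include <linux/mmzone.h>
#include <linux/scatterlist.h>

/*
 * Sketch only: pull non-reported pages of a given order off the free
 * list into a scatterlist, hand the batch to the reporting device
 * (virtio-balloon madvises them MADV_DONTNEED on the host side), then
 * return them to the free list flagged as reported buddy pages.
 */
static void report_unused_pages(struct zone *zone, unsigned int order,
                                struct scatterlist *sgl, unsigned int nents)
{
        unsigned int count = 0;
        struct page *page;

        /* Fill the scatterlist with non-reported buddy pages. */
        while (count < nents &&
               (page = get_unreported_page(zone, order)) != NULL)
                sg_set_page(&sgl[count++], page, PAGE_SIZE << order, 0);

        if (!count)
                return;

        /* Hand the batch to the reporting device. */
        report_to_device(sgl, count);

        /* Put the pages back, now marked as reported. */
        while (count--)
                free_reported_page(sg_page(&sgl[count]), order);
}

Is that roughly what is going on?
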
> I am leaving a number of things hard-coded, such as limiting the lowest
> order processed to pageblock_order, and have left it up to the guest to
> determine the limit on how many pages it wants to allocate for processing
> the hints. The upper limit for this is based on the size of the queue
> used to store the scatterlist.
>
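Also, just to confirm my mental picture of the hard-coded limits being
described here, I am imagining something along these lines. The names and
values (REPORTING_MIN_ORDER, REPORTING_CAPACITY, reporting_batch_size())
are illustrative only, not the ones used in the patches:

#include <linux/kernel.h>
#include <linux/pageblock-flags.h>
#include <linux/virtio.h>

/* Illustrative values only. */
#define REPORTING_MIN_ORDER     pageblock_order /* lowest order reported */
#define REPORTING_CAPACITY      32              /* guest's chosen batch  */

/* The batch can never exceed what fits in the device's virtqueue. */
static unsigned int reporting_batch_size(struct virtqueue *vq)
{
        return min_t(unsigned int, REPORTING_CAPACITY,
                     virtqueue_get_vring_size(vq));
}

Does that match what you have in mind?
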
> My primary testing has just been to verify that memory is being freed after
> allocation by running memhog 40g on a 40g guest and watching the total
> free memory via /proc/meminfo on the host. With this I have verified that
> most of the memory is freed after each iteration. As far as performance
> goes, I have mainly been focusing on the will-it-scale/page_fault1 test
> running with 16 vcpus. With that I have seen up to a 2% difference between
> the base kernel without these patches and the patched kernel with
> virtio-balloon either enabled or disabled.

A couple of questions:

- Regarding the 2% difference which you have mentioned: is it visible across
 all 16 cores, or only at the 16th core?
- I am assuming that the difference is seen for both the "number of processes"
 and the "number of threads" runs launched by page_fault1. Is that right?

>
> One side effect of these patches is that the guest becomes much more
> resilient in terms of NUMA locality. With the pages being freed and then
> reallocated when used, they end up much closer to the active thread, and
> as a result there can be situations where this patch set will outperform
> the stock kernel when the guest memory is not local to the guest vCPUs.


Was this the reason you were seeing better results for page_fault1
earlier?

>
> Patch 4 is a bit on the large side at about 600 lines of change; however,
> I really didn't see a good way to break it up since each piece feeds into
> the next. I couldn't add the statistics by themselves, as it didn't
> really make sense to add them without something that would either read or
> increment/decrement them, or add the Hinted state without something that
> would set/unset it. As such I just ended up adding the entire thing as
> one patch. It makes it a bit bigger, but it avoids the issues in the
> previous set where I was referencing things that had not yet been added.
>
> Changes from the RFC:
> https://lore.kernel.org/lkml/20190530215223.13974.22445.stgit@xxxxxxxxxxxxxxxxxxxxx/
> Moved aeration requested flag out of aerator and into zone->flags.
> Moved boundary out of free_area and into local variables for aeration.
> Moved aeration cycle out of interrupt and into workqueue.
> Left nr_free as total pages instead of splitting it between raw and aerated.
> Combined size and physical address values in virtio ring into one 64b value.
>
> Changes from v1:
> https://lore.kernel.org/lkml/20190619222922.1231.27432.stgit@xxxxxxxxxxxxxxxxxxxxx/
> Dropped "waste page treatment" in favor of "page hinting"
> Renamed files and functions from "aeration" to "page_hinting"
> Moved from page->lru list to scatterlist
> Replaced wait on refcnt in shutdown with RCU and cancel_delayed_work_sync
> Virtio now uses scatterlist directly instead of intermediate array
> Moved stats out of free_area, now in separate area and pointed to from zone
> Merged patch 5 into patch 4 to improve reviewability
> Updated various code comments throughout
>
> Changes from v2:
> https://lore.kernel.org/lkml/20190724165158.6685.87228.stgit@xxxxxxxxxxxxxxxxxxxxx/
> Dropped "page hinting" in favor of "page reporting"
> Renamed files from "hinting" to "reporting"
> Replaced "Hinted" page type with "Reported" page flag
> Added support for page poisoning while hinting is active
> Added QEMU patch that implements the PAGE_POISON feature
>
> ---
>
> Alexander Duyck (6):
> mm: Adjust shuffle code to allow for future coalescing
> mm: Move set/get_pcppage_migratetype to mmzone.h
> mm: Use zone and order instead of free area in free_list manipulators
> mm: Introduce Reported pages
> virtio-balloon: Pull page poisoning config out of free page hinting
> virtio-balloon: Add support for providing unused page reports to host
>
>
> drivers/virtio/Kconfig              |    1
> drivers/virtio/virtio_balloon.c     |   75 ++++++++-
> include/linux/mmzone.h              |  116 ++++++++------
> include/linux/page-flags.h          |   11 +
> include/linux/page_reporting.h      |  138 ++++++++++++++++
> include/uapi/linux/virtio_balloon.h |    1
> mm/Kconfig                          |    5 +
> mm/Makefile                         |    1
> mm/internal.h                       |   18 ++
> mm/memory_hotplug.c                 |    1
> mm/page_alloc.c                     |  238 ++++++++++++++++++++--------
> mm/page_reporting.c                 |  299 +++++++++++++++++++++++++++++++++++
> mm/shuffle.c                        |   24 ---
> mm/shuffle.h                        |   32 ++++
> 14 files changed, 821 insertions(+), 139 deletions(-)
> create mode 100644 include/linux/page_reporting.h
> create mode 100644 mm/page_reporting.c
>
> --
>
--
Thanks
Nitesh