[RFC v2 PATCH 0/4] speed up page allocation for __GFP_ZERO

From: Liang Li
Date: Mon Dec 21 2020 - 13:26:59 EST


The first version can be found at: https://lkml.org/lkml/2020/4/12/42

Zero out the page content usually happens when allocating pages with
the flag of __GFP_ZERO, this is a time consuming operation, it makes
the population of a large vma area very slowly. This patch introduce
a new feature for zero out pages before page allocation, it can help
to speed up page allocation with __GFP_ZERO.

My original intention for adding this feature is to shorten VM
creation time when SR-IOV devicde is attached, it works good and the
VM creation time is reduced by about 90%.

Creating a VM [64G RAM, 32 CPUs] with GPU passthrough
=====================================================
QEMU use 4K pages, THP is off
round1 round2 round3
w/o this patch: 23.5s 24.7s 24.6s
w/ this patch: 10.2s 10.3s 11.2s

QEMU use 4K pages, THP is on
round1 round2 round3
w/o this patch: 17.9s 14.8s 14.9s
w/ this patch: 1.9s 1.8s 1.9s
=====================================================

Obviously, it can do more than this. We can benefit from this feature
in the flowing case:

Interactive sence
=================
Shorten application lunch time on desktop or mobile phone, it can help
to improve the user experience. Test shows on a
server [Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz], zero out 1GB RAM by
the kernel will take about 200ms, while some mainly used application
like Firefox browser, Office will consume 100 ~ 300 MB RAM just after
launch, by pre zero out free pages, it means the application launch
time will be reduced about 20~60ms (can be visual sensed?). May be
we can make use of this feature to speed up the launch of Andorid APP
(I didn't do any test for Android).

Virtulization
=============
Speed up VM creation and shorten guest boot time, especially for PCI
SR-IOV device passthrough scenario. Compared with some of the para
vitalization solutions, it is easy to deploy because it’s transparent
to guest and can handle DMA properly in BIOS stage, while the para
virtualization solution can’t handle it well.

Improve guest performance when use VIRTIO_BALLOON_F_REPORTING for memory
overcommit. The VIRTIO_BALLOON_F_REPORTING feature will report guest page
to the VMM, VMM will unmap the corresponding host page for reclaim,
when guest allocate a page just reclaimed, host will allocate a new page
and zero it out for guest, in this case pre zero out free page will help
to speed up the proccess of fault in and reduce the performance impaction.

Speed up kernel routine
=======================
This can’t be guaranteed because we don’t pre zero out all the free pages,
but is true for most case. It can help to speed up some important system
call just like fork, which will allocate zero pages for building page
table. And speed up the process of page fault, especially for huge page
fault. The POC of Hugetlb free page pre zero out has been done.

Security
========
This is a weak version of "introduce init_on_alloc=1 and init_on_free=1
boot options", which zero out page in a asynchronous way. For users can't
tolerate the impaction of 'init_on_alloc=1' or 'init_on_free=1' brings,
this feauture provide another choice.

For the feedback of the first version, cache pollution is the main concern
of the mm guys, On the other hand, this feature is really helpful for
some use case. May be we should let the user decide wether to use it.
So a switch is added in the /sys files, users who don’t like it can turn
off the switch, or by configuring a large batch size to reduce cache
pollution.

To make the whole function works, support of pre zero out free huge pages
should be added for hugetlbfs, I will send another patch for it.

Liang Li (4):
mm: let user decide page reporting option
mm: pre zero out free pages to speed up page allocation for __GFP_ZERO
mm: make page reporing worker works better for low order page
mm: Add batch size for free page reporting

drivers/virtio/virtio_balloon.c | 3 +
include/linux/highmem.h | 31 +++-
include/linux/page-flags.h | 16 +-
include/linux/page_reporting.h | 3 +
include/trace/events/mmflags.h | 7 +
mm/Kconfig | 10 ++
mm/Makefile | 1 +
mm/huge_memory.c | 3 +-
mm/page_alloc.c | 4 +
mm/page_prezero.c | 266 ++++++++++++++++++++++++++++++++
mm/page_prezero.h | 13 ++
mm/page_reporting.c | 49 +++++-
mm/page_reporting.h | 16 +-
13 files changed, 405 insertions(+), 17 deletions(-)
create mode 100644 mm/page_prezero.c
create mode 100644 mm/page_prezero.h

Cc: Alexander Duyck <alexander.h.duyck@xxxxxxxxxxxxxxx>
Cc: Mel Gorman <mgorman@xxxxxxxxxxxxxxxxxxx>
Cc: Andrea Arcangeli <aarcange@xxxxxxxxxx>
Cc: Dan Williams <dan.j.williams@xxxxxxxxx>
Cc: Dave Hansen <dave.hansen@xxxxxxxxx>
Cc: David Hildenbrand <david@xxxxxxxxxx>
Cc: Michal Hocko <mhocko@xxxxxxxx>
Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
Cc: Alex Williamson <alex.williamson@xxxxxxxxxx>
Cc: Michael S. Tsirkin <mst@xxxxxxxxxx>
Signed-off-by: Liang Li <liliang324@xxxxxxxxx>
--
2.18.2