Re: [RFC PATCH 0/4] mm: Add PG_zero support

From: David Hildenbrand
Date: Tue Apr 14 2020 - 08:02:14 EST


On 12.04.20 11:07, liliangleo wrote:
> Zeroing out page content usually happens when pages are allocated.
> This is a time-consuming operation that makes pin and mlock
> operations very slow, especially for a large batch of memory.
>
> This patch introduces a new feature that zeroes out pages before
> page allocation, which can help to speed up page allocation.
>
> The idea is very simple: zero out free pages when the system is
> not busy and mark each such page with PG_zero. When allocating a
> page that needs to be filled with zeros, check the flag in the
> struct page. If the page is marked PG_zero, zeroing can be skipped,
> which saves CPU time and speeds up page allocation.
>
> This series is based on the 'free page reporting' feature, which
> was introduced by Alexander Duyck.
>
> We can benefit from this feature in the following cases:
> 1. User space mlocks a large chunk of memory
> 2. VFIO pins pages for DMA
> 3. Allocating transparent huge pages
> 4. Speeding up the page fault path
>
> My original intention in adding this feature was to shorten
> VM creation time when a VFIO device is attached. It works well,
> and the VM creation time is noticeably reduced.
>
> Creating a VM [64G RAM, 32 CPUs] with GPU passthrough
> =====================================================
> QEMU uses 4K pages, THP is off
>                   round1   round2   round3
> w/o this patch:   23.5s    24.7s    24.6s
> w/ this patch:    10.2s    10.3s    11.2s
>
> QEMU uses 4K pages, THP is on
>                   round1   round2   round3
> w/o this patch:   17.9s    14.8s    14.9s
> w/ this patch:    1.9s     1.8s     1.9s
> =====================================================
>
> Looking forward to your feedback.

I somehow have the feeling that this should not be glued to free page
reporting. After all, you are proposing your own status indicator for
each buddy page (PG_zero) already, which would mean you can build
something similar to free page reporting fairly easily, and have it
co-exist.
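
For illustration, here is a minimal user-space sketch of the per-page
indicator described in the quoted cover letter, kept separate from any
reporting infrastructure. It is a toy model and not kernel code: the
names `page_model`, `model_init`, `model_free_page`,
`model_zero_free_pages` and `model_alloc_zeroed` are hypothetical, and
the boolean `pg_zero` only stands in for the proposed PG_zero flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define PAGE_SIZE 4096
#define NR_PAGES  8

/* Hypothetical stand-in for struct page; this is not the kernel's layout. */
struct page_model {
	bool free;                      /* page is on the (modelled) free list */
	bool pg_zero;                   /* models the proposed PG_zero flag */
	unsigned char data[PAGE_SIZE];
};

static struct page_model pages[NR_PAGES];
static size_t zeroing_on_alloc;         /* memsets paid in the allocation path */

/* Start with every page free, dirty and unmarked. */
static void model_init(void)
{
	for (size_t i = 0; i < NR_PAGES; i++) {
		memset(pages[i].data, 0xAA, PAGE_SIZE);
		pages[i].free = true;
		pages[i].pg_zero = false;
	}
	zeroing_on_alloc = 0;
}

/* Freeing leaves stale contents behind, so the page must not stay marked. */
static void model_free_page(struct page_model *p)
{
	assert(!p->free);               /* catch double frees in the model */
	p->free = true;
	p->pg_zero = false;
}

/* "System is idle" pass: zero unmarked free pages, return how many. */
static size_t model_zero_free_pages(void)
{
	size_t zeroed = 0;

	for (size_t i = 0; i < NR_PAGES; i++) {
		struct page_model *p = &pages[i];

		if (p->free && !p->pg_zero) {
			memset(p->data, 0, PAGE_SIZE);
			p->pg_zero = true;
			zeroed++;
		}
	}
	return zeroed;
}

/* Allocate a zero-filled page; skip the memset if the page is marked. */
static struct page_model *model_alloc_zeroed(void)
{
	for (size_t i = 0; i < NR_PAGES; i++) {
		struct page_model *p = &pages[i];

		if (!p->free)
			continue;
		p->free = false;
		if (!p->pg_zero) {
			memset(p->data, 0, PAGE_SIZE);
			zeroing_on_alloc++;
		}
		p->pg_zero = false;     /* the new owner may dirty the page */
		return p;
	}
	return NULL;
}
```

In this model, the idle pass is the only place that needs to walk free
pages, and it depends on nothing but the flag itself.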

The free page reporting infrastructure is helpful when wanting to
asynchronously batch-process higher-order pages. I don't see the
immediate need for the "batch-processing" here.

E.g., why not simply zero out pages as they are freed/placed into free
lists? In particular, this is one of the simple alternatives to free
page reporting as we have it today (the guest zeroes free pages, the
hypervisor detects them using, e.g., KSM).

That could even allow you to avoid the PG_zero flag completely. E.g.,
once the feature is activated and running, all pages in the buddy free
lists are zeroed out already. Zeroing happens synchronously from the
page-freeing thread, not when starting a guest.
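
A minimal user-space sketch of this zero-on-free alternative follows.
It is again a toy model and not kernel code: `zf_page`, `zf_init`,
`zf_free_page`, `zf_alloc_zeroed` and `zf_free_list_is_zeroed` are
hypothetical names, and `zf_init` simply assumes that the feature is
already active and every free page has been zeroed once.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define ZF_PAGE_SIZE 4096
#define ZF_NR_PAGES  4

/* Hypothetical page model: note there is no "zeroed" flag at all. */
struct zf_page {
	bool free;
	unsigned char data[ZF_PAGE_SIZE];
};

static struct zf_page zf_pages[ZF_NR_PAGES];

/* Assumption: the feature is active, so all free pages start out zeroed. */
static void zf_init(void)
{
	for (size_t i = 0; i < ZF_NR_PAGES; i++) {
		memset(zf_pages[i].data, 0, ZF_PAGE_SIZE);
		zf_pages[i].free = true;
	}
}

/* Zeroing is paid synchronously by whoever frees the page. */
static void zf_free_page(struct zf_page *p)
{
	assert(!p->free);               /* catch double frees in the model */
	memset(p->data, 0, ZF_PAGE_SIZE);
	p->free = true;
}

/* Allocation of a zero-filled page never needs a memset or a flag check. */
static struct zf_page *zf_alloc_zeroed(void)
{
	for (size_t i = 0; i < ZF_NR_PAGES; i++) {
		if (zf_pages[i].free) {
			zf_pages[i].free = false;
			return &zf_pages[i];
		}
	}
	return NULL;
}

/* Check the invariant: every page on the free list contains only zeros. */
static bool zf_free_list_is_zeroed(void)
{
	for (size_t i = 0; i < ZF_NR_PAGES; i++) {
		if (!zf_pages[i].free)
			continue;
		for (size_t j = 0; j < ZF_PAGE_SIZE; j++) {
			if (zf_pages[i].data[j] != 0)
				return false;
		}
	}
	return true;
}
```

The trade-off visible in the model is that the cost moves from the
allocating side to the freeing side, in exchange for dropping the
per-page flag.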

That said, I agree with Dave here that there might be better
alternatives for this somewhat-special case.

--
Thanks,

David / dhildenb