Another idea, suggested by Bang Li, is to record which PTEs are none in
hpage_collapse_scan_pmd(). Then, before acquiring the mmap_lock in write
mode, we pre-zero the corresponding pages as needed.
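
To make the idea concrete, here is a rough userspace sketch of the flow:
the scan records which entries are none, zeroing of those subpages happens
before the (modelled) write lock, and the locked section only copies, with
a recheck for entries that changed since the scan. This is not kernel code.
struct fake_pte, scan_record_none(), prezero_none() and collapse_copy() are
hypothetical names made up for illustration, and 512 entries of 4 KiB are
assumed (x86-64 with 2 MiB PMD pages).

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define NR_PTES 512   /* assumed: entries per PMD with 4 KiB base pages */
#define PAGE_SZ 4096  /* assumed base page size */

/* Hypothetical stand-in for a PTE: page == NULL models pte_none(). */
struct fake_pte {
	const unsigned char *page;
};

/* Scan phase (no write lock): remember which entries are none. */
static int scan_record_none(const struct fake_pte *ptes, bool *none_map)
{
	int nr_none = 0;

	for (int i = 0; i < NR_PTES; i++) {
		none_map[i] = (ptes[i].page == NULL);
		if (none_map[i])
			nr_none++;
	}
	return nr_none;
}

/* Still before the write lock: zero only the subpages recorded as none. */
static void prezero_none(unsigned char *hpage, const bool *none_map)
{
	for (int i = 0; i < NR_PTES; i++)
		if (none_map[i])
			memset(hpage + (size_t)i * PAGE_SZ, 0, PAGE_SZ);
}

/*
 * Under the write lock: copy every present entry. The scan result may be
 * stale, so an entry that became none after the scan is zeroed here.
 * Returns how many subpages had to be zeroed while holding the lock.
 */
static int collapse_copy(unsigned char *hpage, const struct fake_pte *ptes,
			 const bool *none_map)
{
	int late_zero = 0;

	for (int i = 0; i < NR_PTES; i++) {
		unsigned char *dst = hpage + (size_t)i * PAGE_SZ;

		if (ptes[i].page) {
			memcpy(dst, ptes[i].page, PAGE_SZ);
		} else if (!none_map[i]) {
			memset(dst, 0, PAGE_SZ);
			late_zero++;
		}
		/* else: already zeroed by prezero_none(), nothing to do */
	}
	return late_zero;
}
```

In this model the time spent under the lock shrinks only by the zeroing of
the recorded none entries; the memcpy() of present entries stays as it is.
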
So my question is: do we really care about this enough to optimize
it?
IMO, although it may not be our main concern, reducing the impact of
khugepaged on the process remains crucial. I think that users also prefer
minimal interference from khugepaged.
The problem I am having with this is that for the *common* case where we
have a small number of pte_none() entries, we cannot really optimize
because we still have to perform the copy.
So this feels like we're rather optimizing a corner case, and I am not
so sure if that is really worth it.
Other thoughts?
Another thought is to introduce a khugepaged/alloc_zeroed_hpage knob in
the THP sysfs settings. This would let users decide whether to avoid
unnecessary copies when nr_ptes_none > 0.