Re: [PATCH 1/1] mm/khugepaged: reduce process visible downtime by pre-zeroing hugepage

From: David Hildenbrand
Date: Fri Mar 15 2024 - 08:18:37 EST


On 14.03.24 15:19, Lance Yang wrote:
Another thought suggested by Bang Li is that we record which pte is none
in hpage_collapse_scan_pmd. Then, before acquiring the mmap_lock (write mode),
we will pre-zero pages as needed.
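
For illustration only, the idea above could be modeled in userspace roughly as follows. This is a sketch under assumptions, not the proposed patch: struct none_map, scan_record_none(), prezero_none() and copies_under_lock() are hypothetical names, a bool array stands in for the pte_none() checks, and 512 PTEs of 4 KiB per PMD is assumed. Real kernel code would presumably also have to revalidate the record once the lock is held, since a PTE seen as none during the scan may have been populated in the meantime.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NR_PTES 512   /* assumed: PTEs per PMD with 4 KiB base pages */
#define PAGE_SZ 4096  /* assumed base page size */

/* Hypothetical record of which PTEs were none during the scan. */
struct none_map {
	uint64_t bits[NR_PTES / 64];
	unsigned int nr_none;
};

/* Scan phase: present[i] == false models pte_none() for entry i. */
static void scan_record_none(const bool *present, struct none_map *map)
{
	memset(map, 0, sizeof(*map));
	for (size_t i = 0; i < NR_PTES; i++) {
		if (!present[i]) {
			map->bits[i / 64] |= (uint64_t)1 << (i % 64);
			map->nr_none++;
		}
	}
}

/*
 * Pre-zero phase: meant to run before the write lock is taken.
 * Zeroes only the subpages recorded as none; returns how many.
 */
static unsigned int prezero_none(unsigned char *hpage,
				 const struct none_map *map)
{
	unsigned int zeroed = 0;

	for (size_t i = 0; i < NR_PTES; i++) {
		if (map->bits[i / 64] & ((uint64_t)1 << (i % 64))) {
			memset(hpage + i * PAGE_SZ, 0, PAGE_SZ);
			zeroed++;
		}
	}
	return zeroed;
}

/* Subpages that would still have to be copied while the lock is held. */
static unsigned int copies_under_lock(const struct none_map *map)
{
	return NR_PTES - map->nr_none;
}
```

In this model, copies_under_lock() stays close to NR_PTES whenever only a few entries are none, so the work done under the lock barely changes in that case.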

Here is my point of view: we cannot optimize the common case where we have mostly !pte_none() in a similar way.

So why do we care about the less common case? Is the process-visible downtime reduction for that less common case really noticeable?

Or is it rather something that looks good in a micro-benchmark, but won't really make any difference in actual applications (again, where the common case will still result in the same downtime)?

I'm not against this, I'm rather wondering "do we really care". I'd like to hear other opinions.


So my question is: do we really care about it that much that we care to
optimize?

IMO, although it may not be our main concern, reducing the impact of
khugepaged on the process remains crucial. I think that users also prefer
minimal interference from khugepaged.

The problem I am having with this is that for the *common* case where we
have a small number of pte_none(), we cannot really optimize because we
have to perform the copy.

So this feels like we're rather optimizing a corner case, and I am not
so sure if that is really worth it.

Other thoughts?

Another thought is to introduce khugepaged/alloc_zeroed_hpage for THP
sysfs settings. This would enable users to decide whether to avoid unnecessary
copies when nr_ptes_none > 0.

Hm, new toggles for that, not sure ... I'd much prefer something without any new toggles, especially for this case.

--
Cheers,

David / dhildenb