Re: [RESEND PATCH v7 00/10] Small-sized THP for anonymous memory

From: John Hubbard
Date: Thu Nov 23 2023 - 01:28:33 EST


On 11/22/23 08:29, Ryan Roberts wrote:
...
Prerequisites
=============

Some work items identified as being prerequisites are listed on page 3 at [8].
The summary is:

| item                          | status                  |
|:------------------------------|:------------------------|
| mlock                         | In mainline (v6.7)      |
| madvise                       | In mainline (v6.6)      |
| compaction                    | v1 posted [9]           |
| numa balancing                | Investigated: see below |
| user-triggered page migration | In mainline (v6.7)      |
| khugepaged collapse           | In mainline (NOP)       |

On NUMA balancing, which currently ignores any PTE-mapped THPs it encounters,
John Hubbard has investigated this and A) concluded that it is not clear at the
moment what a better policy might be for PTE-mapped THP, and B) questioned
whether this should really be considered a prerequisite, given that no
regression is caused for the default "small-sized THP disabled" case and there
is no correctness issue when it is enabled - it's just a potential for
non-optimal performance.
(John please do elaborate if I haven't captured this correctly!)

That's accurate. I actually want to continue looking into this (Mel
Gorman's recent replies to v6 provided helpful touchstones to the NUMA
reasoning leading up to the present day), and maybe at least bring
pte-thps into rough parity with THPs with respect to NUMA.

But that really doesn't seem like something that needs to happen first,
especially since the outcome might even be, "first, do no harm"--as in,
it's better as-is. We'll see.


If there are no disagreements about removing numa balancing from the list, then
that just leaves compaction which is in review on list at the moment.

I really would like to get this series (and its remaining compaction
prerequisite) in for v6.8. I accept that it may be a bit optimistic at this
point, but let's see where we get to with review?


Testing
=======

The series includes patches for mm selftests to enlighten the cow and khugepaged
tests to explicitly test with small-order THP, in the same way that PMD-order
THP is tested. The new tests all pass, and no regressions are observed in the mm
selftest suite. I've also run my usual kernel compilation and JavaScript
benchmarks without any issues.

Refer to my performance numbers posted with v6 [6]. (These are for small-sized
THP only - they do not include the arm64 contpte follow-on series).

John Hubbard at Nvidia has indicated dramatic 10x performance improvements for
some workloads at [10]. (Observed using v6 of this series as well as the arm64
contpte series).


Testing continues. Some workloads do even better than 10x,
it's quite remarkable and glorious to see. :) I can send more perf data
perhaps in a few days or a week, if there is still doubt about the
benefits.

That was with the v6 series, though. I'm about to set up and run with
v7, and expect to provide a Tested-by tag for functionality sometime
soon (in the next few days), if machine availability works out as
expected.


thanks,
--
John Hubbard
NVIDIA