Re: [patch for-5.3 0/4] revert immediate fallback to remote hugepages

From: Vlastimil Babka
Date: Wed Oct 23 2019 - 07:03:42 EST


On 10/18/19 4:15 PM, Michal Hocko wrote:
> It's been some time since I've posted these results. The hugetlb issue
> got resolved but I would still like to hear back about these findings
> because they suggest that the current bail out strategy doesn't seem
> to produce very good results. Essentially it doesn't really help THP
> locality (on moderately filled up nodes) and it introduces a strong
> dependency on kswapd, which is not a source of high-order pages.
> Also the overall THP success rate is decreased on a pretty standard "RAM
> is used for page cache" workload.
>
> That makes me think that the only possible workload that might really
> benefit from this heuristic is a THP demanding one on a heavily
> fragmented node with a lot of free memory while other nodes are not
> fragmented and have quite a lot of free memory. If that is the case, is
> this something to optimize for?
>
> I am keeping all the results below for reference in a condensed form.
>
> On Tue 01-10-19 10:37:43, Michal Hocko wrote:
>> I have split out my kvm machine into two nodes to get at least some
>> idea how these patches behave
>> $ numactl -H
>> available: 2 nodes (0-1)
>> node 0 cpus: 0 2
>> node 0 size: 475 MB
>> node 0 free: 432 MB
>> node 1 cpus: 1 3
>> node 1 size: 503 MB
>> node 1 free: 458 MB
>>
>> First run with 5.3 and without THP
>> $ echo never > /sys/kernel/mm/transparent_hugepage/enabled
>> root@test1:~# sh thp_test.sh
>> 7f4bdefec000 prefer:1 anon=102400 dirty=102400 active=86115 N0=41963 N1=60437 kernelpagesize_kB=4
>> 7fd0f248b000 prefer:1 anon=102400 dirty=102400 active=86909 N0=40079 N1=62321 kernelpagesize_kB=4
>> 7f2a69fc3000 prefer:1 anon=102400 dirty=102400 active=85244 N0=44455 N1=57945 kernelpagesize_kB=4
>>
>> So we get around 56-60% of the pages on the preferred node
>>
>> Now let's enable THPs
>> AnonHugePages: 407552 kB
>> 7f05c6dee000 prefer:1 anon=102400 dirty=102400 active=52718 N0=50688 N1=51712 kernelpagesize_kB=4
>> Few more runs
>> AnonHugePages: 407552 kB
>> 7effca1b9000 prefer:1 anon=102400 dirty=102400 active=65977 N0=53760 N1=48640 kernelpagesize_kB=4
>> AnonHugePages: 407552 kB
>> 7f474bfc4000 prefer:1 anon=102400 dirty=102400 active=52676 N0=8704 N1=93696 kernelpagesize_kB=4
>>
>> The utilization is again almost 100% and the preferred node usage
>> varied a lot, between 47% and 91%.
>>
>> Now with 5.3 + all 4 patches this time:
>> AnonHugePages: 401408 kB
>> 7f8114ab4000 prefer:1 anon=102400 dirty=102400 active=51892 N0=3072 N1=99328 kernelpagesize_kB=4
>> AnonHugePages: 376832 kB
>> 7f37a1404000 prefer:1 anon=102400 dirty=102400 active=55204 N0=23153 N1=79247 kernelpagesize_kB=4
>> AnonHugePages: 372736 kB
>> 7f4abe4af000 prefer:1 anon=102400 dirty=102400 active=52399 N0=23646 N1=78754 kernelpagesize_kB=4
>>
>> The THP utilization varies again and the locality is higher on
>> average, 76+%, which is even higher than the base page case. I was really

I tried to reproduce your setup locally, and got this for the THP case
on 5.4-rc4:

AnonHugePages: 395264 kB
7fdc4a2c0000 prefer:1 anon=102400 dirty=102400 N0=48852 N1=53548 kernelpagesize_kB=4
AnonHugePages: 401408 kB
7f27167e2000 prefer:1 anon=102400 dirty=102400 N0=40095 N1=62305 kernelpagesize_kB=4
AnonHugePages: 378880 kB
7ff693ff9000 prefer:1 anon=102400 dirty=102400 N0=58061 N1=44339 kernelpagesize_kB=4

Somewhat better THP utilization and worse node locality than in your runs.
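
In case anyone wants to repeat this: thp_test.sh itself is not shown in
the thread, so here is a minimal C sketch of what I assume it does, a
400 MB anonymous mapping with the prefer:1 policy and THP backing,
faulted in and then kept alive for inspection via numa_maps. This is my
reconstruction, not the original script; build with
"gcc -O2 thp_test.c -o thp_test -lnuma".

#define _GNU_SOURCE
#include <numaif.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
	size_t len = 102400UL * 4096;		/* 102400 base pages = 400 MB, from anon=102400 above */
	unsigned long nodemask = 1UL << 1;	/* node 1 */
	char *p;

	p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED) {
		perror("mmap");
		return 1;
	}
	/* one way to get the "prefer:1" policy seen in the numa_maps lines */
	if (mbind(p, len, MPOL_PREFERRED, &nodemask, sizeof(nodemask) * 8, 0))
		perror("mbind");
	/* optional: not needed if THP is set to "always" via sysfs */
	if (madvise(p, len, MADV_HUGEPAGE))
		perror("madvise");
	/* fault everything in so the N0=/N1= counters get populated */
	memset(p, 1, len);

	printf("inspect /proc/%d/numa_maps and AnonHugePages now\n",
	       (int)getpid());
	pause();	/* keep the mapping alive for inspection */
	return 0;
}

The percentages presumably come from that output: locality is
N1/(N0+N1), e.g. 53548/102400 ~ 52% for my first run, and THP
utilization is AnonHugePages relative to the 409600 kB mapping,
e.g. 395264/409600 ~ 96%.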

Then I applied a rebased patch that I proposed before (see below):

AnonHugePages: 407552 kB
7f33fa83a000 prefer:1 anon=102400 dirty=102400 N0=28672 N1=73728 kernelpagesize_kB=4
AnonHugePages: 407552 kB
7faac0aa9000 prefer:1 anon=102400 dirty=102400 N0=48869 N1=53531 kernelpagesize_kB=4
AnonHugePages: 407552 kB
7f9f32c57000 prefer:1 anon=102400 dirty=102400 N0=49664 N1=52736 kernelpagesize_kB=4

The THP utilization is now back at 100%, as on 5.3 (modulo
mis-alignment of the mem_eater area). This is expected, as the second
try, which is not limited to __GFP_THISNODE, is also not limited by the
heuristic newly introduced in 5.4 that checks for COMPACT_SKIPPED.
Locality seems similar; I can't draw any conclusions with such
variation and so few tries. Could you try to confirm that as well?
Thanks. I agree, though, that the test is limited and probably depends
on timing wrt kswapd making progress.
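
To make that flow explicit, below is a toy userspace model (emphatically
not the kernel code; all identifiers are invented) of the two passes as
I understand them: the first pass is restricted to the preferred node,
think __GFP_THISNODE, and is subject to the 5.4 bail-out when
compaction is skipped there; the second pass, which the patch below is
about, may use any node and is not subject to that bail-out.

#include <stdbool.h>
#include <stdio.h>

enum model_compact { MODEL_COMPACT_SKIPPED, MODEL_COMPACT_SUCCESS };

/* pretend the preferred node (1) is too full for compaction to even run */
static enum model_compact model_try_compact(int node)
{
	return node == 1 ? MODEL_COMPACT_SKIPPED : MODEL_COMPACT_SUCCESS;
}

static bool model_alloc_thp(int preferred_node, bool this_node_only)
{
	for (int node = 0; node < 2; node++) {
		enum model_compact res;

		if (this_node_only && node != preferred_node)
			continue;

		res = model_try_compact(node);

		/* models the 5.4 heuristic: fail instead of reclaiming */
		if (this_node_only && res == MODEL_COMPACT_SKIPPED)
			return false;
		if (res == MODEL_COMPACT_SUCCESS)
			return true;
	}
	return false;
}

int main(void)
{
	/* first try: preferred node only, bails out on COMPACT_SKIPPED */
	bool thp = model_alloc_thp(1, true);

	/* second try: any node, not subject to the bail-out */
	if (!thp)
		thp = model_alloc_thp(1, false);

	printf("THP allocated: %s\n", thp ? "yes" : "no");
	return 0;
}

In the model the restricted first pass fails without trying reclaim and
only the unrestricted second pass gets the THP from the other node,
which matches the utilization going back to ~100% above.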

----8<----