Re: [PATCH v1 00/14] Transparent Contiguous PTEs for User Mappings

From: Ryan Roberts
Date: Mon Jul 10 2023 - 09:28:25 EST


On 10/07/2023 13:05, Barry Song wrote:
> On Thu, Jun 22, 2023 at 11:00 PM Ryan Roberts <ryan.roberts@xxxxxxx> wrote:
>>
>> Hi All,
>>
[...]
>>
>> Performance
>> -----------
>>
>> The results below show 2 benchmarks: kernel compilation and Speedometer 2.0 (a
>> JavaScript benchmark running in Chromium). Both are run on Ampere Altra with
>> 1 NUMA node enabled, Ubuntu 22.04 and an XFS filesystem. Each benchmark is
>> repeated 15 times over 5 reboots and averaged.
>>
>> All improvements are relative to baseline-4k. anonfolio and exefolio are as
>> described above. contpte is this series. (Note that exefolio only gives an
>> improvement because contpte is already in place.)
>>
>> Kernel Compilation (smaller is better):
>>
>> | kernel | real-time | kern-time | user-time |
>> |:-------------|------------:|------------:|------------:|
>> | baseline-4k | 0.0% | 0.0% | 0.0% |
>> | anonfolio | -5.4% | -46.0% | -0.3% |
>> | contpte | -6.8% | -45.7% | -2.1% |
>> | exefolio | -8.4% | -46.4% | -3.7% |
>
> Sorry, I am a bit confused. In the exefolio case, is anonfolio included,
> or does it only have large cont-pte folios on exe code? In other words,
> does the 8.4% improvement come from iTLB miss reduction only,
> or from both dTLB and iTLB miss reduction?

The anonfolio -> contpte -> exefolio results are incremental. So:

anonfolio: baseline-4k + anonfolio changes
contpte: anonfolio + contpte changes
exefolio: contpte + exefolio changes

So yes, exefolio includes anonfolio. Sorry for the confusion.
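Because each row builds on the one before it, the contribution of each individual step can be backed out of the cumulative numbers. Below is a minimal sketch (not part of the series; `step_changes` is a hypothetical helper) for the kernel compilation real-time column. It assumes each percentage is the change in mean time relative to baseline-4k, so that a config's time is baseline * (1 + pct/100).

```python
# Baseline-relative results for kernel compilation real-time, copied from the
# table above (negative = faster than baseline-4k). Each configuration builds
# on the previous one: anonfolio -> contpte -> exefolio.
REAL_TIME_PCT = [
    ("anonfolio", -5.4),
    ("contpte", -6.8),
    ("exefolio", -8.4),
]


def step_changes(cumulative):
    """Convert cumulative baseline-relative percentages into the change each
    step contributes relative to the configuration immediately before it."""
    steps = []
    prev_ratio = 1.0  # baseline-4k itself
    for name, pct in cumulative:
        ratio = 1.0 + pct / 100.0  # time relative to baseline-4k
        steps.append((name, (ratio / prev_ratio - 1.0) * 100.0))
        prev_ratio = ratio
    return steps


if __name__ == "__main__":
    for name, pct in step_changes(REAL_TIME_PCT):
        print(f"{name:10s} {pct:+.2f}%")
```

Under that assumption the steps come out at roughly -5.4% (anonfolio), -1.5% (contpte) and -1.7% (exefolio). This only splits the gain by patch set; on its own it says nothing about how much comes from iTLB versus dTLB miss reduction.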

>
>> | baseline-16k | -8.7% | -49.2% | -3.7% |
>> | baseline-64k | -10.5% | -66.0% | -3.5% |
>>
>> Speedometer 2.0 (bigger is better):
>>
>> | kernel | runs_per_min |
>> |:-------------|---------------:|
>> | baseline-4k | 0.0% |
>> | anonfolio | 1.2% |
>> | contpte | 3.1% |
>> | exefolio | 4.2% |
>
> Same question as above.

Same answer as above.

Thanks,
Ryan


>
>> | baseline-16k | 5.3% |
>>