Re: Performance regressions in "boot_time" tests in Linux 5.8 Kernel

From: bhe@xxxxxxxxxx
Date: Sat Oct 10 2020 - 02:12:28 EST


On 10/09/20 at 01:15pm, Rahul Gopakumar wrote:
> As part of VMware's performance regression testing for Linux Kernel
> upstream releases, we identified boot time increase when comparing
> Linux 5.8 kernel against Linux 5.7 kernel. Increase in boot time is
> noticeable on VM with a large amount of memory.
>  
> In our test cases, it's noticeable with memory 1TB and more, whereas
> there was no major difference noticed in testcases with <1TB.
>  
> On bisecting between 5.7 and 5.8, we found the following commit from 
> “Baoquan He” to be the cause of boot time increase in big VM test cases.
>  
> -------------------------------------
>  
> commit 73a6e474cb376921a311786652782155eac2fdf0
> Author: Baoquan He <bhe@xxxxxxxxxx>
> Date: Wed Jun 3 15:57:55 2020 -0700
>  
> mm: memmap_init: iterate over memblock regions rather that check each PFN
>  
> When called during boot the memmap_init_zone() function checks if each PFN
> is valid and actually belongs to the node being initialized using
> early_pfn_valid() and early_pfn_in_nid().
>  
> Each such check may cost up to O(log(n)) where n is the number of memory
> banks, so for large amount of memory overall time spent in early_pfn*()
> becomes substantial.
>  
> -------------------------------------
>  
> For boot time test, we used RHEL 8.1 as the guest OS.
> VM config is 84 vcpu and 1TB vRAM.
>  
> Here are the actual performance numbers.
>  
> 5.7 GA - 18.17 secs
> Baoquan's commit - 21.6 secs (about 19% increase in time, i.e. -16% in performance terms)
>  
> From dmesg logs, we can see significant time delay around memmap.
>  
> Refer below logs.
>  
> Good commit
>  
> [0.033176] Normal zone: 1445888 pages used for memmap
> [0.033176] Normal zone: 89391104 pages, LIFO batch:63
> [0.035851] ACPI: PM-Timer IO Port: 0x448
>  
> Problem commit
>  
> [0.026874] Normal zone: 1445888 pages used for memmap
> [0.026875] Normal zone: 89391104 pages, LIFO batch:63
> [2.028450] ACPI: PM-Timer IO Port: 0x448

Could you add memblock=debug to the kernel command line and paste the boot
logs of the system with and without the commit?
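
When comparing the two logs, a small script can help point at where the
time goes. The following is only a rough Python sketch, not an existing
tool: the function name largest_gaps() and the sample list are made up for
illustration, and it assumes each log line starts with a "[seconds.micros]"
timestamp as in the dmesg excerpts quoted above. It reports the largest
jumps between consecutive timestamps.

```python
import re

# Leading dmesg-style timestamp, e.g. "[    2.028450] " or "[2.028450] ".
TS = re.compile(r"^\[\s*(\d+\.\d+)\]\s?(.*)$")


def largest_gaps(lines, top=3):
    """Return up to `top` (gap_seconds, line_before, line_after) tuples,
    sorted with the largest timestamp gap first.

    Lines without a leading timestamp are skipped.
    """
    stamped = []
    for line in lines:
        m = TS.match(line.strip())
        if m:
            stamped.append((float(m.group(1)), m.group(2)))

    # Gap between each timestamped line and the one that follows it.
    gaps = [
        (round(nxt[0] - cur[0], 6), cur[1], nxt[1])
        for cur, nxt in zip(stamped, stamped[1:])
    ]
    return sorted(gaps, key=lambda g: g[0], reverse=True)[:top]


# Sample input: the "problem commit" lines quoted in the report.
sample = [
    "[0.026874] Normal zone: 1445888 pages used for memmap",
    "[0.026875] Normal zone: 89391104 pages, LIFO batch:63",
    "[2.028450] ACPI: PM-Timer IO Port: 0x448",
]

for gap, before, after in largest_gaps(sample, top=1):
    print(f"{gap:.6f}s between '{before}' and '{after}'")
```

The same function can be fed the lines of a saved dmesg file for each
kernel, so the biggest gaps with and without the commit can be compared
side by side.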

>  
> We did some analysis, and it looks like with the problem commit it's
> not deferring the memory initialization to a later stage and it's
> initializing the huge chunk of memory in serial - during the boot-up
> time.  Whereas with the good commit, it was able to defer the
> initialization of the memory when it could be done in parallel.
>
>
> Rahul Gopakumar
> Performance Engineering
> VMware, Inc.
>