Re: Performance regressions in "boot_time" tests in Linux 5.8 Kernel

From: Rahul Gopakumar
Date: Fri Dec 11 2020 - 11:24:09 EST


Hi Baoquan,

We re-evaluated your last patch and it appears to fix the
performance bug initially reported. During our previous testing
we did not apply the patch correctly, which is why we saw
some issues.

Here is the dmesg log confirming there is no delay with the draft patch.

Vanilla (5.10 rc3)
------------------

[ 0.024011] On node 2 totalpages: 89391104
[ 0.024012] Normal zone: 1445888 pages used for memmap
[ 0.024012] Normal zone: 89391104 pages, LIFO batch:63
[ 2.054646] ACPI: PM-Timer IO Port: 0x448 --------------> 2 secs delay

Patch
------

[ 0.024166] On node 2 totalpages: 89391104
[ 0.024167] Normal zone: 1445888 pages used for memmap
[ 0.024167] Normal zone: 89391104 pages, LIFO batch:63
[ 0.026694] ACPI: PM-Timer IO Port: 0x448 --------------> No delay
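
For reference, the delay above is simply the difference between the
bracketed timestamps of consecutive dmesg lines. A minimal sketch of
that arithmetic follows; the helper names parse_dmesg_timestamp() and
largest_gap() are hypothetical and written only for this illustration,
and it assumes every line of interest starts with a "[ seconds]" prefix
as in the excerpts above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Parse the leading "[ seconds.micros]" prefix of a dmesg line.
 * Returns 1 and stores the value in *seconds on success, 0 otherwise.
 */
static int parse_dmesg_timestamp(const char *line, double *seconds)
{
    assert(line != NULL && seconds != NULL);
    return sscanf(line, " [ %lf]", seconds) == 1;
}

/*
 * Return the largest gap (in seconds) between consecutive timestamped
 * lines. If gap_index is not NULL, it receives the index of the line
 * that follows the largest gap (0 if no gap was found). Lines without
 * a timestamp prefix are skipped.
 */
static double largest_gap(const char *const *lines, size_t count,
                          size_t *gap_index)
{
    double prev = 0.0, cur = 0.0, best = 0.0;
    int have_prev = 0;
    size_t i;

    assert(lines != NULL);
    if (gap_index != NULL)
        *gap_index = 0;

    for (i = 0; i < count; i++) {
        if (!parse_dmesg_timestamp(lines[i], &cur))
            continue;
        if (have_prev && cur - prev > best) {
            best = cur - prev;
            if (gap_index != NULL)
                *gap_index = i;
        }
        prev = cur;
        have_prev = 1;
    }
    return best;
}
```

Fed the four vanilla lines, the largest gap is 2.054646 - 0.024012,
about 2.03 secs, in front of the ACPI line. Fed the four patched lines,
it is 0.026694 - 0.024167, about 2.5 msecs.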

Attached dmesg logs. Let me know if anything is needed from our end.



From: Rahul Gopakumar <gopakumarr@xxxxxxxxxx>
Sent: 24 November 2020 8:33 PM
To: bhe@xxxxxxxxxx <bhe@xxxxxxxxxx>
Cc: linux-mm@xxxxxxxxx <linux-mm@xxxxxxxxx>; linux-kernel@xxxxxxxxxxxxxxx <linux-kernel@xxxxxxxxxxxxxxx>; akpm@xxxxxxxxxxxxxxxxxxxx <akpm@xxxxxxxxxxxxxxxxxxxx>; natechancellor@xxxxxxxxx <natechancellor@xxxxxxxxx>; ndesaulniers@xxxxxxxxxx <ndesaulniers@xxxxxxxxxx>; clang-built-linux@xxxxxxxxxxxxxxxx <clang-built-linux@xxxxxxxxxxxxxxxx>; rostedt@xxxxxxxxxxx <rostedt@xxxxxxxxxxx>; Rajender M <manir@xxxxxxxxxx>; Yiu Cho Lau <lauyiuch@xxxxxxxxxx>; Peter Jonasson <pjonasson@xxxxxxxxxx>; Venkatesh Rajaram <rajaramv@xxxxxxxxxx>
Subject: Re: Performance regressions in "boot_time" tests in Linux 5.8 Kernel
 
Hi Baoquan,

We applied the new patch to 5.10 rc3 and tested it. We are still
observing the same page corruption issue that we saw with the
old patch. This causes a 3-second delay in boot time.

Attached dmesg log from the new patch and also from vanilla
5.10 rc3 kernel.

There are multiple lines like the one below in the dmesg log of the
new patch.

"BUG: Bad page state in process swapper  pfn:ab08001"

________________________________________
From: bhe@xxxxxxxxxx <bhe@xxxxxxxxxx>
Sent: 22 November 2020 6:38 AM
To: Rahul Gopakumar
Cc: linux-mm@xxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx; akpm@xxxxxxxxxxxxxxxxxxxx; natechancellor@xxxxxxxxx; ndesaulniers@xxxxxxxxxx; clang-built-linux@xxxxxxxxxxxxxxxx; rostedt@xxxxxxxxxxx; Rajender M; Yiu Cho Lau; Peter Jonasson; Venkatesh Rajaram
Subject: Re: Performance regressions in "boot_time" tests in Linux 5.8 Kernel

On 11/20/20 at 03:11am, Rahul Gopakumar wrote:
> Hi Baoquan,
>
> To which commit should we apply the draft patch? We tried applying
> the patch to the commit 3e4fb4346c781068610d03c12b16c0cfb0fd24a3
> (the one we used for applying the previous patch) but it fails.

I tested on 5.10-rc3+. You can append the change below to the old patch in
your testing kernel.

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index fa6076e1a840..5e5b74e88d69 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -448,6 +448,8 @@ defer_init(int nid, unsigned long pfn, unsigned long end_pfn)
        if (end_pfn < pgdat_end_pfn(NODE_DATA(nid)))
                return false;

+       if (NODE_DATA(nid)->first_deferred_pfn != ULONG_MAX)
+               return true;
        /*
         * We start only with one section of pages, more pages are added as
         * needed until the rest of deferred pages are initialized.
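
To illustrate where the two added lines sit in the control flow, here
is a rough userspace model. It is a sketch under assumptions, not the
kernel's code: only the end_pfn check and the first_deferred_pfn check
come from the hunk above. The structs node_model and defer_state, the
function model_defer_init(), the per-section counter logic, and the
MODEL_PAGES_PER_SECTION value are all made up for this illustration.

```c
#include <limits.h>
#include <stdbool.h>

/* Assumption: 128 MiB sections with 4 KiB pages, i.e. 32768 pages. */
#define MODEL_PAGES_PER_SECTION 32768UL

/* Hypothetical stand-in for the per-node data the hunk reads. */
struct node_model {
    unsigned long node_end_pfn;       /* models pgdat_end_pfn() */
    unsigned long first_deferred_pfn; /* ULONG_MAX: nothing deferred yet */
};

/* Hypothetical stand-in for state kept across calls. */
struct defer_state {
    unsigned long prev_end_pfn;
    unsigned long nr_initialised;
};

/* Returns true if initialisation of pfn should be deferred. */
static bool model_defer_init(struct node_model *node, struct defer_state *st,
                             unsigned long pfn, unsigned long end_pfn)
{
    /* Assumed: the counter restarts when the range end changes. */
    if (st->prev_end_pfn != end_pfn) {
        st->prev_end_pfn = end_pfn;
        st->nr_initialised = 0;
    }

    /* From the hunk context: ranges ending below the node end are never deferred. */
    if (end_pfn < node->node_end_pfn)
        return false;

    /* The added check: once a deferred pfn is recorded, keep deferring. */
    if (node->first_deferred_pfn != ULONG_MAX)
        return true;

    /* Assumed: initialise about one section, then defer from a section boundary. */
    st->nr_initialised++;
    if (st->nr_initialised > MODEL_PAGES_PER_SECTION &&
        (pfn & (MODEL_PAGES_PER_SECTION - 1)) == 0) {
        node->first_deferred_pfn = pfn;
        return true;
    }
    return false;
}
```

In this model, once first_deferred_pfn has been recorded for a node,
every later call that passes the end_pfn check returns true right away,
without going through the section counter again.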

Attachment: patch-dmesg.log
Description: patch-dmesg.log

Attachment: vanilla-dmesg.log
Description: vanilla-dmesg.log