Re: [kmemleak] b751c52bb5: BUG:kernel_hang_in_boot_stage

From: Rong Chen
Date: Mon Jun 15 2020 - 22:50:35 EST

On 6/10/20 6:56 PM, Catalin Marinas wrote:
On Wed, Jun 10, 2020 at 03:51:56PM +0800, kernel test robot wrote:
FYI, we noticed the following commit (built with gcc-7):

commit: b751c52bb587ae66f773b15204ef7a147467f4c7 ("kmemleak: increase DEBUG_KMEMLEAK_EARLY_LOG_SIZE default to 16K")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

in testcase: boot

on test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 8G
[...]
BUG: kernel hang in boot stage

To reproduce:

# build kernel
cd linux
cp config-5.3.0-11789-gb751c52bb587a .config
make HOSTCC=gcc-7 CC=gcc-7 ARCH=i386 olddefconfig prepare modules_prepare bzImage
I've never tried kmemleak on i386.

Anyway, I'm not sure what caused the hang (or whether it's a hang at
all) but I suspect prior to the above commit, kmemleak probably just
disabled itself (early log buffer exceeded). So the bug may have been
there already, only that kmemleak started working and tripped over it
when the log buffer increased.

Hi,

Sorry for the late reply. I can reproduce the problem with the command "bin/lkp qemu -k <bzImage> job-script",
and the kernel hangs:

[    0.333897] -----------------------------------------------------
[    0.334561]                                   |block | try |context|
[    0.335170] -----------------------------------------------------
[    0.335760] context: ok | ok | ok |
[    0.337995] try: ok | ok | ok |
[    0.340089] block: ok | ok | ok |
[    0.342175] spinlock: ok | ok | ok |
[    0.344481] -------------------------------------------------------
[    0.345068] Good, all 261 testcases passed! |
[    0.345514] ---------------------------------
KVM internal error. Suberror: 3
extra data[0]: 80000b0e
extra data[1]: 31
extra data[2]: 182
extra data[3]: bfff0
EAX=00000000 EBX=00200297 ECX=00000000 EDX=ffffffff
ESI=d2e997c0 EDI=d2e997f0 EBP=d2bbb038 ESP=c00bfff4
EIP=f4dccd57 EFL=00210046 [---Z-P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =007b 00000000 ffffffff 00c0f300 DPL=3 DS   [-WA]
CS =0060 00000000 ffffffff 00c09b00 DPL=0 CS32 [-RA]
SS =0068 00000000 ffffffff 00c09300 DPL=0 DS   [-WA]
DS =007b 00000000 ffffffff 00c0f300 DPL=3 DS   [-WA]
FS =00d8 23331000 ffffffff 00809300 DPL=0 DS16 [-WA]
GS =00e0 f6422900 00000018 00409100 DPL=0 DS   [--A]
LDT=0000 00000000 ffffffff 00c00000
TR =0080 ff403000 0000206b 00008b00 DPL=0 TSS32-busy
GDT=     ff401000 000000ff
IDT=     ff400000 000007ff
CR0=80050033 CR2=00000000 CR3=130fc000 CR4=00000690
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
DR6=00000000fffe0ff0 DR7=0000000000000400
EFER=0000000000000000
Code=00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 <00> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00


Is there a chance that the kernel got much slower with kmemleak enabled
and the test scripts timed out?
No. The boot log of the parent commit is:

[    0.313845] -----------------------------------------------------
[    0.314608]                                   |block | try |context|
[    0.315314] -----------------------------------------------------
[    0.315974] context: ok | ok | ok |
[    0.318261] try: ok | ok | ok |
[    0.320478] block: ok | ok | ok |
[    0.322562] spinlock: ok | ok | ok |
[    0.324825] -------------------------------------------------------
[    0.325403] Good, all 261 testcases passed! |
[    0.325809] ---------------------------------
[    0.326221] kmemleak: Early log buffer exceeded (401), please increase DEBUG_KMEMLEAK_EARLY_LOG_SIZE
[    0.327065] ACPI: Core revision 20190816
[    0.327585] clocksource: hpet: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 19112604467 ns
[    0.328545] APIC: Switch to symmetric I/O mode setup
[    0.329009] Enabling APIC mode: Flat. Using 1 I/O APICs
[    0.329572] masked ExtINT on CPU#0
[    0.330686] ENABLING IO-APIC IRQs
[    0.331001] init IO_APIC IRQs
[    0.331274]  apic 0 pin 0 not connected


Does this problem still exist with the latest mainline?
Yes, it still happens in v5.7.

Best Regards,
Rong Chen