Re: [linus:master] [kmemleak] 39042079a0: kernel-selftests.kvm.memslot_perf_test.fail

From: Oliver Sang
Date: Fri Jan 26 2024 - 03:50:28 EST


hi, Catalin,

On Thu, Jan 25, 2024 at 12:01:56PM +0000, Catalin Marinas wrote:
> On Thu, Jan 25, 2024 at 03:34:37PM +0800, kernel test robot wrote:
> > kernel test robot noticed "kernel-selftests.kvm.memslot_perf_test.fail" on:
> >
> > commit: 39042079a0c241d09fa6fc3bb67c2ddf60011d0f ("kmemleak: avoid RCU stalls when freeing metadata for per-CPU pointers")
> > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
> [...]
> > The kernel config and materials to reproduce are available at:
> > https://download.01.org/0day-ci/archive/20240125/202401251429.d3dea02b-oliver.sang@xxxxxxxxx
> [...]
> > # Testing RW performance with 1 runs, 5 seconds each
> > #
> > not ok 71 selftests: kvm: memslot_perf_test # TIMEOUT 120 seconds
>
> I'm not sure how this relates to kmemleak, especially the commit above.
> It might as well be that the above kmemleak commit increases the system
> load and you trip over the timeout above. Is it still reproducible with
> a larger timeout (unfortunately I don't have the hardware to try to
> reproduce the problem)?
>
> I can see a lockdep warning in the dmesg but that doesn't look related
> to kmemleak.

below things FYI.

(1) we set same timeout for 39042079a0 and its parent, test always can pass
on parent (by 20 runs), but almost always fail on 39042079a0 (18 out of 20
runs fail).

(2) while running tests, we use a kmemleak monitor.
for parent runs, seems no problem, but for 39042079a0 runs, it will capture
below log:

$ cat kmemleak
unreferenced object 0xff110001198a1720 (size 16):
comm "swapper/0", pid 1, jiffies 4294783051
hex dump (first 16 bytes):
41 43 50 49 20 48 4d 41 54 00 6b 6b 6b 6b 6b a5 ACPI HMAT.kkkkk.
backtrace (crc 45396b86):
[<ffffffff81a97e28>] __kmem_cache_alloc_node+0x228/0x310
[<ffffffff81936de5>] __kmalloc_node_track_caller+0x55/0xf0
[<ffffffff819160b4>] kstrdup+0x34/0x60
[<ffffffff81aada3b>] mt_set_default_dram_perf+0x3cb/0x460
[<ffffffff8658a9a6>] hmat_init+0x2b6/0x3a0
[<ffffffff81002c58>] do_one_initcall+0xd8/0x410
[<ffffffff864aaa4a>] do_initcalls+0x1ca/0x390
[<ffffffff864ab53e>] kernel_init_freeable+0x8ce/0xc80
[<ffffffff83beed8f>] kernel_init+0x1f/0x1d0
[<ffffffff810a07c1>] ret_from_fork+0x31/0x70
[<ffffffff81006341>] ret_from_fork_asm+0x11/0x20

(3) we tried to disable the kmemleak monitor for 39042079a0, then
memslot_perf_test can pass now. there is no kmemleak dump.

(4) we also noticed a test can pass on 39042079a0 now, but always fail on
parent, no matter kmemleak monitor enable or disable.
(normally we won't report a commit if it makes some test start to pass, so this
information is not in original report)

f67f8d4a8c1e1ebc 39042079a0c241d09fa6fc3bb67
---------------- ---------------------------
fail:runs %reproduction fail:runs
| | |
:20 150% 30:30 kernel-selftests.kvm.rseq_test.pass

>
> --
> Catalin