Re: [PATCH] mm/damon/core-test: Initialise context before test in damon_test_set_attrs()

From: Feng Tang
Date: Tue Jul 18 2023 - 23:08:24 EST


Hi SeongJae,

Thanks for the review.

On Tue, Jul 18, 2023 at 04:16:56PM +0000, SeongJae Park wrote:
> Hi Feng Tang,
>
> On Tue, 18 Jul 2023 13:28:11 +0800 Feng Tang <feng.tang@xxxxxxxxx> wrote:
>
> > Running kunit test for 6.5-rc1 hits one bug:
> >
> > ok 10 damon_test_update_monitoring_result
> > general protection fault, probably for non-canonical address 0x1bffa5c419cfb81: 0000 [#1] PREEMPT SMP NOPTI
> > CPU: 1 PID: 110 Comm: kunit_try_catch Tainted: G N 6.5.0-rc2 #15
> > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
> > RIP: 0010:damon_set_attrs+0xb9/0x120
> > Code: f8 00 00 00 4c 8d 58 e0 48 39 c3 74 ba 41 ba 59 17 b7 d1 49 8b 43 10 4d
> > 8d 4b 10 48 8d 70 e0 49 39 c1 74 50 49 8b 40 08 31 d2 <69> 4e 18 10 27 00 00
> > 49 f7 30 31 d2 48 89 c5 89 c8 f7 f5 31 d2 89
> > RSP: 0000:ffffc900005bfd40 EFLAGS: 00010246
> > RAX: ffffffff81159fc0 RBX: ffffc900005bfeb8 RCX: 0000000000000000
> > RDX: 0000000000000000 RSI: 01bffa5c419cfb69 RDI: ffffc900005bfd70
> > RBP: ffffc90000013c10 R08: ffffc900005bfdc0 R09: ffffffff81ff10ed
> > R10: 00000000d1b71759 R11: ffffffff81ff10dd R12: ffffc90000013a78
> > R13: ffff88810eb78180 R14: ffffffff818297c0 R15: ffffc90000013c28
> > FS: 0000000000000000(0000) GS:ffff88813bd00000(0000) knlGS:0000000000000000
> > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > CR2: 0000000000000000 CR3: 0000000002a1c001 CR4: 0000000000370ee0
> > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > Call Trace:
> > <TASK>
> > damon_test_set_attrs+0x63/0x1f0
> > kunit_generic_run_threadfn_adapter+0x17/0x30
> > kthread+0xfd/0x130
>
> Great. But it would be even better if you could post this kind of output after
> decoding the addresses using 'scripts/decode_stacktrace.sh' or
> 'scripts/faddr2line' next time, if possible.

I did run the decode script, but didn't paste the output because I was
afraid the whole text would be too large. But yes, the decoded info is
very helpful for developers analyzing the issue. I will include it in
future patches that deal with panic/hang issues.


Here is the decoded version (rerun on 6.5-rc2):

[ 1.123316] general protection fault, probably for non-canonical address 0x1bffa5c419cfb81: 0000 [#1] PREEMPT SMP NOPTI
[ 1.125356] CPU: 0 PID: 111 Comm: kunit_try_catch Tainted: G N 6.5.0-rc2 #15
[ 1.126299] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
[ 1.127173] RIP: 0010:damon_set_attrs (mm/damon/core.c:471 mm/damon/core.c:508 mm/damon/core.c:534 mm/damon/core.c:555)
[ 1.127676] Code: f8 00 00 00 4c 8d 58 e0 48 39 c3 74 ba 41 ba 59 17 b7 d1 49 8b 43 10 4d 8d 4b 10 48 8d 70 e0 49 39 c1 74 50 49 8b 40 08 31 d2 <69> 4e 18 10 27 00 00 49 f7 30 31 d2 48 89 c5 89 c8 f7 f5 31 d2 89
All code
========
0: f8 clc
1: 00 00 add %al,(%rax)
3: 00 4c 8d 58 add %cl,0x58(%rbp,%rcx,4)
7: e0 48 loopne 0x51
9: 39 c3 cmp %eax,%ebx
b: 74 ba je 0xffffffffffffffc7
d: 41 ba 59 17 b7 d1 mov $0xd1b71759,%r10d
13: 49 8b 43 10 mov 0x10(%r11),%rax
17: 4d 8d 4b 10 lea 0x10(%r11),%r9
1b: 48 8d 70 e0 lea -0x20(%rax),%rsi
1f: 49 39 c1 cmp %rax,%r9
22: 74 50 je 0x74
24: 49 8b 40 08 mov 0x8(%r8),%rax
28: 31 d2 xor %edx,%edx
2a:* 69 4e 18 10 27 00 00 imul $0x2710,0x18(%rsi),%ecx <-- trapping instruction
31: 49 f7 30 divq (%r8)
34: 31 d2 xor %edx,%edx
36: 48 89 c5 mov %rax,%rbp
39: 89 c8 mov %ecx,%eax
3b: f7 f5 div %ebp
3d: 31 d2 xor %edx,%edx
3f: 89 .byte 0x89

Code starting with the faulting instruction
===========================================
0: 69 4e 18 10 27 00 00 imul $0x2710,0x18(%rsi),%ecx
7: 49 f7 30 divq (%r8)
a: 31 d2 xor %edx,%edx
c: 48 89 c5 mov %rax,%rbp
f: 89 c8 mov %ecx,%eax
11: f7 f5 div %ebp
13: 31 d2 xor %edx,%edx
15: 89 .byte 0x89
[ 1.131693] RSP: 0000:ffffc9000059fd40 EFLAGS: 00010246
[ 1.133744] RAX: ffffffff81159fc0 RBX: ffffc9000059feb8 RCX: 0000000000000000
[ 1.136273] RDX: 0000000000000000 RSI: 01bffa5c419cfb69 RDI: ffffc9000059fd70
[ 1.138528] RBP: ffffc90000013c10 R08: ffffc9000059fdc0 R09: ffffffff81ff10ed
[ 1.140778] R10: 00000000d1b71759 R11: ffffffff81ff10dd R12: ffffc90000013a78
[ 1.142997] R13: ffff88810ea9c980 R14: ffffffff818297c0 R15: ffffc90000013c28
[ 1.145235] FS: 0000000000000000(0000) GS:ffff88813bc00000(0000) knlGS:0000000000000000
[ 1.147463] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1.148629] CR2: ffff88813ffff000 CR3: 0000000002a1c001 CR4: 0000000000370ef0
[ 1.150046] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1.151450] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1.152888] Call Trace:
[ 1.153480] <TASK>
[ 1.160117] damon_test_set_attrs+0x63/0x1f0
[ 1.167458] kunit_generic_run_threadfn_adapter (lib/kunit/try-catch.c:28)
[ 1.168050] kthread (kernel/kthread.c:379)
[ 1.168881] ret_from_fork (arch/x86/entry/entry_64.S:314)
[ 1.169756] ret_from_fork_asm+0x1b/0x30
[ 1.170211] RIP: 0000:0x0

> >
> > The problem seems to be that the damon_ctx was used without being
> > initialized. Fix it by adding the initialization.
>
> Somehow the test always passed on my test machine, but maybe that's due to some
> different behavior of my compiler. I agree that could be the root cause,
> because 'damon_set_attrs()' calls 'damon_update_monitoring_results()', which
> accesses the context's fields including the targets list. Since the list is
> not initialized in this test code, it would cause such an error.

Yes, I dumped more info: the damon_ctx is not initialized and is filled
with random data, notably the list_head 'adaptive_targets'. In
damon_update_monitoring_results(), iterating over that list triggered
the general protection fault.

>
> >
> > Fixes: aa13779be6b7 ("mm/damon/core-test: add a test for damon_set_attrs()")
> > Signed-off-by: Feng Tang <feng.tang@xxxxxxxxx>
>
> Reviewed-by: SeongJae Park <sj@xxxxxxxxxx>

Thank you!


- Feng

>
> Thanks,
> SJ
>
> > ---
> > mm/damon/core-test.h | 10 +++++-----
> > 1 file changed, 5 insertions(+), 5 deletions(-)
> >
> > diff --git a/mm/damon/core-test.h b/mm/damon/core-test.h
> > index c11210124344..bb07721909e1 100644
> > --- a/mm/damon/core-test.h
> > +++ b/mm/damon/core-test.h
> > @@ -320,25 +320,25 @@ static void damon_test_update_monitoring_result(struct kunit *test)
> >
> > static void damon_test_set_attrs(struct kunit *test)
> > {
> > - struct damon_ctx ctx;
> > + struct damon_ctx *c = damon_new_ctx();
> > struct damon_attrs valid_attrs = {
> > .min_nr_regions = 10, .max_nr_regions = 1000,
> > .sample_interval = 5000, .aggr_interval = 100000,};
> > struct damon_attrs invalid_attrs;
> >
> > - KUNIT_EXPECT_EQ(test, damon_set_attrs(&ctx, &valid_attrs), 0);
> > + KUNIT_EXPECT_EQ(test, damon_set_attrs(c, &valid_attrs), 0);
> >
> > invalid_attrs = valid_attrs;
> > invalid_attrs.min_nr_regions = 1;
> > - KUNIT_EXPECT_EQ(test, damon_set_attrs(&ctx, &invalid_attrs), -EINVAL);
> > + KUNIT_EXPECT_EQ(test, damon_set_attrs(c, &invalid_attrs), -EINVAL);
> >
> > invalid_attrs = valid_attrs;
> > invalid_attrs.max_nr_regions = 9;
> > - KUNIT_EXPECT_EQ(test, damon_set_attrs(&ctx, &invalid_attrs), -EINVAL);
> > + KUNIT_EXPECT_EQ(test, damon_set_attrs(c, &invalid_attrs), -EINVAL);
> >
> > invalid_attrs = valid_attrs;
> > invalid_attrs.aggr_interval = 4999;
> > - KUNIT_EXPECT_EQ(test, damon_set_attrs(&ctx, &invalid_attrs), -EINVAL);
> > + KUNIT_EXPECT_EQ(test, damon_set_attrs(c, &invalid_attrs), -EINVAL);
> > }
> >
> > static struct kunit_case damon_test_cases[] = {
> > --
> > 2.34.1
> >
> >
> >