Re: general protection fault, probably for non-canonical address in pick_next_task_fair()

From: Chen Yu
Date: Fri Mar 01 2024 - 02:15:15 EST


On 2024-03-01 at 11:47:05 +0800, Abel Wu wrote:
> (+ Chen Yu, Oliver Sang)
>
> On 2/29/24 11:55 PM, Breno Leitao wrote:
> > I've been running some stress tests using stress-ng on a kernel with some
> > debug options enabled, such as KASAN and friends (see the config below).
> >
> > I saw it in rc4, and the decoded instructions were a bit off (as they are
> > here too: search for movabs in the dmesg below and you will find `(bad)`),
> > so I thought it was a machine issue. But now I see it again, so I am sharing
> > it for awareness.
> >
> > This is happening on the upstream kernel at the following commit:
> > d206a76d7d2726 ("Linux 6.8-rc6")
> >
> > This is the excerpt that shows up before the crash:
> >
> > general protection fault, probably for non-canonical address 0xdffffc0000000014: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC KASAN
> > KASAN: null-ptr-deref in range [0x00000000000000a0-0x00000000000000a7]
> >
> > This is the stack trace at the fault:
> >
> > ? __die_body (arch/x86/kernel/dumpstack.c:421)
> > ? die_addr (arch/x86/kernel/dumpstack.c:460)
> > ? exc_general_protection (arch/x86/kernel/traps.c:? arch/x86/kernel/traps.c:643)
> > ? asm_exc_general_protection (arch/x86/include/asm/idtentry.h:564)
> > ? pick_next_task_fair (kernel/sched/sched.h:1453 kernel/sched/fair.c:8435)

This seems to have the same cause: pick_eevdf() returns NULL, and it panics here:
cfs_rq = group_cfs_rq(se);

As far as I remember, lkp runs stress-ng regularly for regression testing,
but it has not detected this yet.

thanks,
Chenyu