Re: [linus:master] [sched/eevdf] 2227a957e1: BUG:kernel_NULL_pointer_dereference,address

From: Tiwei Bie
Date: Wed Jan 31 2024 - 08:15:35 EST


On 1/31/24 8:28 PM, Abel Wu wrote:
> On 1/31/24 8:10 PM, Tiwei Bie Wrote:
>> On 1/30/24 6:13 PM, Abel Wu wrote:
>>> On 1/30/24 3:24 PM, kernel test robot Wrote:
>>>>
>>>> [  512.079810][ T8305] BUG: kernel NULL pointer dereference, address: 0000002c
>>>> [  512.080897][ T8305] #PF: supervisor read access in kernel mode
>>>> [  512.081636][ T8305] #PF: error_code(0x0000) - not-present page
>>>> [  512.082337][ T8305] *pde = 00000000
>>>> [  512.082829][ T8305] Oops: 0000 [#1] PREEMPT SMP
>>>> [  512.083407][ T8305] CPU: 1 PID: 8305 Comm: watchdog Tainted: G        W        N 6.7.0-rc1-00006-g2227a957e1d5 #1 819e6d1a8b887f5f97adb4aed77d98b15504c836
>>>> [  512.084986][ T8305] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
>>>> [  512.086203][ T8305] EIP: set_next_entity (fair.c:?)
>>>
>>> There was actually a NULL-test in pick_eevdf() before this commit,
>>> but I removed it by intent as I found it impossible to be NULL after
>>> examining 'all' the cases.
>>>
>>> Also cc Tiwei who once proposed to add this check back.
>>> https://lore.kernel.org/all/20231208112100.18141-1-tiwei.btw@xxxxxxxxxxxx/
>>
>> Thanks for cc'ing me. That's the case I worried about and why I thought
>> it might be worthwhile to add the sanity check back. I just sent out a
>> new version of the above patch with updated commit log and error message.
>
> I assume the real problem is why it *can* be NULL in the first place.
> IMHO the NULL check with a fallback selection doesn't solve this, but
> it indeed avoids kernel panic which is absolutely important.

I totally agree. The scheduling failure is unexpected and should be
addressed. And the sanity check is just to log the failures and avoid
unnecessary crashes in such situations.
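
To illustrate the idea, here is a minimal user-space sketch of a pick
function with a logged fallback. It is not the kernel code: the toy_*
structures and functions are hypothetical stand-ins, the run queue is
assumed to be an array whose first element plays the role of the
"leftmost" entity, and the message text is made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Toy stand-ins for scheduler structures; every name here is hypothetical. */
struct toy_entity {
	int id;
	int eligible;	/* nonzero if the entity may be picked now */
	long deadline;	/* virtual deadline; smaller is picked first */
};

struct toy_rq {
	struct toy_entity *entities;	/* assumed: index 0 is the "leftmost" */
	size_t nr;
};

/* Primary pick: earliest deadline among eligible entities, NULL if none. */
static struct toy_entity *toy_pick_primary(struct toy_rq *rq)
{
	struct toy_entity *best = NULL;

	for (size_t i = 0; i < rq->nr; i++) {
		struct toy_entity *se = &rq->entities[i];

		if (!se->eligible)
			continue;
		if (!best || se->deadline < best->deadline)
			best = se;
	}
	return best;
}

/*
 * Sanity-checked pick: if the primary pick unexpectedly returns NULL on a
 * non-empty queue, log the failure and fall back to the leftmost entity
 * instead of handing NULL to the caller.
 */
static struct toy_entity *toy_pick(struct toy_rq *rq)
{
	struct toy_entity *se;

	assert(rq != NULL);
	se = toy_pick_primary(rq);
	if (!se && rq->nr > 0) {
		fprintf(stderr, "toy_pick: selection failed, picking leftmost\n");
		se = &rq->entities[0];
	}
	return se;
}
```

The fallback only hides the symptom: the log line is there so that the
unexpected NULL still gets noticed and the root cause can be tracked down.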

Regards,
Tiwei