Re: Re: [linus:master] [sched/eevdf] 2227a957e1: BUG:kernel_NULL_pointer_dereference,address

From: Oliver Sang
Date: Tue Feb 20 2024 - 21:17:26 EST


hi, Chen Yu,

On Mon, Feb 19, 2024 at 08:35:21PM +0800, Chen, Yu C wrote:
> On 2024-01-31 at 20:28:19 +0800, Abel Wu wrote:
> > On 1/31/24 8:10 PM, Tiwei Bie Wrote:
> > > On 1/30/24 6:13 PM, Abel Wu wrote:
> > > > On 1/30/24 3:24 PM, kernel test robot Wrote:
> > > > >
> > > > > [  512.079810][ T8305] BUG: kernel NULL pointer dereference, address: 0000002c
> > > > > [  512.080897][ T8305] #PF: supervisor read access in kernel mode
> > > > > [  512.081636][ T8305] #PF: error_code(0x0000) - not-present page
> > > > > [  512.082337][ T8305] *pde = 00000000
> > > > > [  512.082829][ T8305] Oops: 0000 [#1] PREEMPT SMP
> > > > > [  512.083407][ T8305] CPU: 1 PID: 8305 Comm: watchdog Tainted: G        W        N 6.7.0-rc1-00006-g2227a957e1d5 #1 819e6d1a8b887f5f97adb4aed77d98b15504c836
> > > > > [  512.084986][ T8305] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
> > > > > [ 512.086203][ T8305] EIP: set_next_entity (fair.c:?)
> > > >
> > > > There was actually a NULL-test in pick_eevdf() before this commit,
> > > > but I removed it by intent as I found it impossible to be NULL after
> > > > examining 'all' the cases.
> > > >
> > > > Also cc Tiwei who once proposed to add this check back.
> > > > https://lore.kernel.org/all/20231208112100.18141-1-tiwei.btw@xxxxxxxxxxxx/
> > >
> > > Thanks for cc'ing me. That's the case I worried about and why I thought
> > > it might be worthwhile to add the sanity check back. I just sent out a
> > > new version of the above patch with updated commit log and error message.
> >
> > I assuming the real problem is why it *can* be NULL at first place.
> > IMHO the NULL check with a fallback selection doesn't solve this, but
> > it indeed avoids kernel panic which is absolutely important.
>
> I tried to reproduce this issue locally but with no luck. It might
> be related to lkp's environment so I'm thinking of adding the following
> change into lkp's test repo to help narrow down: when the pick_eevdf() finds
> an NULL candidate, print the whole rb-tree, including each entity's vruntime,
> min_vruntime, deadline, etc. So we can figure out why we failed to find a proper
> entity, thoughts?
>
> Example output:
>
> [ 325.425017] cfs_rq avg_vruntime:-798130176 avg_load:8192 min_vruntime:29296269335
> [ 325.425247] current se 0xff11000105858080 is on_rq, deadline:29299081075 min_vruntime:29296537185 vruntime:29297356572, load:1024
> [ 325.425514] Traverse rb-tree from left to right
> [ 325.425635] se 0xff11000101a28080 deadline:29293835648 min_vruntime:29290835648 vruntime:29290835648 eligible
> [ 325.425876] se 0xff11000113113000 deadline:29298566205 min_vruntime:29290835648 vruntime:29297726206 non-eligible
> [ 325.426121] se 0xff110001036cb000 deadline:29298613981 min_vruntime:29290835648 vruntime:29297094477 non-eligible
> [ 325.426358] se 0xff11000118148080 deadline:29298616409 min_vruntime:29297255015 vruntime:29297255015 non-eligible
> [ 325.426597] se 0xff11000111523000 deadline:29299066492 min_vruntime:29290835648 vruntime:29296949653 non-eligible
> [ 325.426836] se 0xff11000106173000 deadline:29299152426 min_vruntime:29296537185 vruntime:29296537185 non-eligible
> [ 325.427085] se 0xff11000105f03000 deadline:29299269335 min_vruntime:29296269335 vruntime:29296269335 eligible
> [ 325.427322] se 0xff110001043c0080 deadline:29299707737 min_vruntime:29296707737 vruntime:29296707737 non-eligible
> [ 325.427558] Traverse rb-tree from topdown
> [ 325.427648] left se 0xff110001036cb000 deadline:29298613981 min_vruntime:29290835648 vruntime:29297094477 left half eligible
> [ 325.427917] left se 0xff11000113113000 deadline:29298566205 min_vruntime:29290835648 vruntime:29297726206 left half eligible
> [ 325.428171] left se 0xff11000101a28080 deadline:29293835648 min_vruntime:29290835648 vruntime:29290835648 left half eligible
> [ 325.428440] middle se 0xff11000101a28080 deadline:29293835648 min_vruntime:29290835648 vruntime:29290835648 eligible
> [ 325.428679] Found best:0xff11000101a28080

I applied below patch upon 2227a957e1, captured such like:

[ 514.461242][ T8390] cfs_rq avg_vruntime:386638640128 avg_load:2048 min_vruntime:763383370431
[ 514.461431][ T8390]

..

[ 514.535935][ T8390] current on_rq se 0xc5851400, deadline:-10892060147320170 min_vruntime:-9622957955883918 vruntime:-10892060147608212, load:629
[ 514.536772][ T8390] Traverse rb-tree from left to right
[ 514.537138][ T8390] se 0xec1234e0 deadline:763384870431 min_vruntime:763383370431 vruntime:763383370431 non-eligible
[ 514.537835][ T8390] se 0xec4fcf20 deadline:763762447228 min_vruntime:763760947228 vruntime:763760947228 non-eligible
[ 514.538539][ T8390] Traverse rb-tree from topdown
[ 514.538877][ T8390] middle se 0xec1234e0 deadline:763384870431 min_vruntime:763383370431 vruntime:763383370431 non-eligible
[ 514.539605][ T8390] middle se 0xec4fcf20 deadline:763762447228 min_vruntime:763760947228 vruntime:763760947228 non-eligible
[ 514.540340][ T8390] Found best:0x0
[ 514.540613][ T8390] BUG: kernel NULL pointer dereference, address: 00000074


more details are in attached dmesg-debug.xz


>
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 533547e3c90a..ddacb3dc91b9 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -856,6 +856,69 @@ struct sched_entity *__pick_first_entity(struct cfs_rq *cfs_rq)
> return __node_2_se(left);
> }
>
> +static void debug_print_eevdf(struct cfs_rq *cfs_rq)
> +{
> + struct sched_entity *se, *curr = cfs_rq->curr;
> + struct sched_entity *best = NULL;
> + struct rb_node *node = cfs_rq->tasks_timeline.rb_root.rb_node;
> + struct rb_node *n;
> +
> + printk("cfs_rq avg_vruntime:%lld avg_load:%lld min_vruntime:%lld\n",
> + cfs_rq->avg_vruntime, cfs_rq->avg_load, cfs_rq->min_vruntime);
> +
> + if (curr && curr->on_rq)
> + printk("current on_rq se 0x%lx, deadline:%lld min_vruntime:%lld vruntime:%lld, load:%ld\n",
> + (unsigned long)curr, curr->deadline, curr->min_vruntime,
> + curr->vruntime, scale_load_down(curr->load.weight));
> +
> + printk("Traverse rb-tree from left to right\n");
> + for (n = rb_first(&cfs_rq->tasks_timeline.rb_root); n != NULL; n = rb_next(n)) {
> + bool eli;
> +
> + se = __node_2_se(n);
> + eli = entity_eligible(cfs_rq, se);
> + printk(" se 0x%lx deadline:%lld min_vruntime:%lld vruntime:%lld %s\n",
> + (unsigned long)se, se->deadline, se->min_vruntime, se->vruntime,
> + eli ? "eligible" : "non-eligible");
> + }
> + printk("Traverse rb-tree from topdown\n");
> + while (node) {
> + struct rb_node *left = node->rb_left;
> + struct sched_entity *se_left = __node_2_se(left);
> + bool eli = false;
> +
> + if (left) {
> + eli = vruntime_eligible(cfs_rq,
> + __node_2_se(left)->min_vruntime);
> +
> + printk(" left se 0x%lx deadline:%lld min_vruntime:%lld vruntime:%lld %s\n",
> + (unsigned long)se_left, se_left->deadline, se_left->min_vruntime,
> + se_left->vruntime,
> + eli ? "left half eligible" : "left half non-eligible");
> + }
> +
> + if (left && eli) {
> + node = left;
> + continue;
> + }
> +
> + se = __node_2_se(node);
> +
> + eli = entity_eligible(cfs_rq, se);
> + printk(" middle se 0x%lx deadline:%lld min_vruntime:%lld vruntime:%lld %s\n",
> + (unsigned long)se, se->deadline, se->min_vruntime, se->vruntime,
> + eli ? "eligible" : "non-eligible");
> +
> + if (eli) {
> + best = se;
> + break;
> + }
> +
> + node = node->rb_right;
> + }
> + printk("Found best:0x%lx\n", (unsigned long)best);
> +}
> +
> /*
> * Earliest Eligible Virtual Deadline First
> *
> @@ -937,6 +1000,9 @@ static struct sched_entity *pick_eevdf(struct cfs_rq *cfs_rq)
> if (!best || (curr && entity_before(curr, best)))
> best = curr;
>
> + if (unlikely(!best))
> + debug_print_eevdf(cfs_rq);
> +
> return best;
> }
>
> --
> 2.34.1
>
> thanks,
> Chenyu

Attachment: dmesg-debug.xz
Description: application/xz