Re: [linus:master] [sched/eevdf] 2227a957e1: BUG:kernel_NULL_pointer_dereference,address

From: Chen Yu
Date: Mon Feb 19 2024 - 07:50:19 EST


On 2024-01-30 at 18:13:32 +0800, Abel Wu wrote:
> On 1/30/24 3:24 PM, kernel test robot Wrote:
> >
> >
> > Hello,
> >
> > (besides a previous performance report),
> > kernel test robot noticed "BUG:kernel_NULL_pointer_dereference,address" on:
> >
> > commit: 2227a957e1d5b1941be4e4207879ec74f4bb37f8 ("sched/eevdf: Sort the rbtree by virtual deadline")
> > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
> >
> > [test failed on linus/master 3a5879d495b226d0404098e3564462d5f1daa33b]
> > [test failed on linux-next/master 01af33cc9894b4489fb68fa35c40e9fe85df63dc]
> >
> > in testcase: trinity
> > version: trinity-i386-abe9de86-1_20230429
> > with following parameters:
> >
> > runtime: 300s
> > group: group-03
> > nr_groups: 5
> >
> > test-description: Trinity is a linux system call fuzz tester.
> > test-url: http://codemonkey.org.uk/projects/trinity/
> >
> >
> > compiler: clang-17
> > test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G
> >
> > (please refer to attached dmesg/kmsg for entire log/backtrace)
> >
> >
> >
> > we found this issue happens in very random way (23 out of 999 runs).
> > but keeps clean on parent.
>
> Thanks for reporting, I will try to reproduce the issue. Does the 'parent'
> mean the same code branch without this commit?
>
> >
> > 84db47ca7146d7bd 2227a957e1d5b1941be4e420787
> > ---------------- ---------------------------
> > fail:runs %reproduction fail:runs
> > | | |
> > :999 2% 23:999 dmesg.BUG:kernel_NULL_pointer_dereference,address
> > :999 2% 23:999 dmesg.Kernel_panic-not_syncing:Fatal_exception
> > :999 2% 23:999 dmesg.Oops:#[##]
> >
> >
> >
> >
> > If you fix the issue in a separate patch/commit (i.e. not just a new version of
> > the same patch/commit), kindly add following tags
> > | Reported-by: kernel test robot <oliver.sang@xxxxxxxxx>
> > | Closes: https://lore.kernel.org/oe-lkp/202401301012.2ed95df0-oliver.sang@xxxxxxxxx
> >
> >
> > sorry for below parse failure which caused no real line numbers.
> > we will follow further. the orgial dmesg could be fetch from below link.
> >
> >
> > [ 512.079810][ T8305] BUG: kernel NULL pointer dereference, address: 0000002c
> > [ 512.080897][ T8305] #PF: supervisor read access in kernel mode
> > [ 512.081636][ T8305] #PF: error_code(0x0000) - not-present page
> > [ 512.082337][ T8305] *pde = 00000000
> > [ 512.082829][ T8305] Oops: 0000 [#1] PREEMPT SMP
> > [ 512.083407][ T8305] CPU: 1 PID: 8305 Comm: watchdog Tainted: G W N 6.7.0-rc1-00006-g2227a957e1d5 #1 819e6d1a8b887f5f97adb4aed77d98b15504c836
> > [ 512.084986][ T8305] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
> > [ 512.086203][ T8305] EIP: set_next_entity (fair.c:?)
>
> There was actually a NULL-test in pick_eevdf() before this commit,
> but I removed it by intent as I found it impossible to be NULL after
> examining 'all' the cases.
>
> Also cc Tiwei who once proposed to add this check back.
> https://lore.kernel.org/all/20231208112100.18141-1-tiwei.btw@xxxxxxxxxxxx/
>
> Thanks,
> Abel
>

While looking at pick_eevdf(), I have a thought.
Currently the sched entity is sorted by their deadline. During task
pickup, the pick_eevdf() scans for an candidate sched entity with the
smallest deadline. Meanwhile this candidate sched entity must also be
eligible.

The scan is O(lgn) on average, and O(1) at best case. How about making the
average scan even faster by sorting the sched entity not only by deadline,
but also the eligibility? The idea is that, the eligible sched entity with
smaller deadline is sorted at the front the tree. Otherwise, if the entity
is not eligible, even if it has a smaller deadline, it should be sorted
at the end of the tree.

After the change, pick_eevdf() get the leftmost sched entity at O(1) on
average. Besides, it is guaranteed to return non-NULL sched entity in
pick_eevdf(), which prevents suspicious NULL pointer exception in pick_eevdf().

For example, suppose there are two sched entities to be queued, se_a and se_b.
Consider their eligibility and deadline, there are 6 combination:

1. se_a is eligible, se_b is eligible, se_a.deadline < se_b.deadline
2. se_a is eligible, se_b is eligible, se_a.deadline >= se_b.deadline
3. se_a is eligible, se_b is not eligible
4. se_a is not eligible, se_b is eligible
5. se_a is not eligible, se_b is not eligible, se_a.deadline < se_b.deadline
6. se_a is not eligible, se_b is not eligible, se_a.deadline >= se_b.deadline

In scenario 1, 3, 5, sched_entity se_a should be sorted before se_b,
so pick_eevdf() would pick se_a first.

When enqueuing a new sched entity, it is regarded as eligible if its
vlag is positive. In theory later in pick_eevdf(), the eligibility
of this sched entity should be re-checked via entity_eligible(). But
consider if the sched entity is eliglble when enqueued, it is very
likely the same sched entity remains eligible when pick_eevdf(), because
the V keeps moving forward but the vruntime of this sched entity remain
unchanged - the vlag could get larger.

Something like this untested:

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 533547e3c90a..831043cc1432 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -551,11 +551,19 @@ static inline u64 min_vruntime(u64 min_vruntime, u64 vruntime)
static inline bool entity_before(const struct sched_entity *a,
const struct sched_entity *b)
{
- /*
- * Tiebreak on vruntime seems unnecessary since it can
- * hardly happen.
- */
- return (s64)(a->deadline - b->deadline) < 0;
+ bool eli_a, eli_b;
+
+ eli_a = (a->vlag >= 0) ? true : false;
+ eli_b = (b->vlag >= 0) ? true : false;
+
+ if ((eli_a && eli_b) || (!eli_a && !eli_b))
+ /*
+ * Tiebreak on vruntime seems unnecessary since it can
+ * hardly happen.
+ */
+ return (s64)(a->deadline - b->deadline) < 0;
+
+ return eli_a ? 1 : 0;
}

static inline s64 entity_key(struct cfs_rq *cfs_rq, struct sched_entity *se)
@@ -877,10 +885,8 @@ struct sched_entity *__pick_first_entity(struct cfs_rq *cfs_rq)
*/
static struct sched_entity *pick_eevdf(struct cfs_rq *cfs_rq)
{
- struct rb_node *node = cfs_rq->tasks_timeline.rb_root.rb_node;
struct sched_entity *se = __pick_first_entity(cfs_rq);
struct sched_entity *curr = cfs_rq->curr;
- struct sched_entity *best = NULL;

/*
* We can safely skip eligibility check if there is only one entity
@@ -899,45 +905,8 @@ static struct sched_entity *pick_eevdf(struct cfs_rq *cfs_rq)
if (sched_feat(RUN_TO_PARITY) && curr && curr->vlag == curr->deadline)
return curr;

- /* Pick the leftmost entity if it's eligible */
- if (se && entity_eligible(cfs_rq, se)) {
- best = se;
- goto found;
- }
-
- /* Heap search for the EEVD entity */
- while (node) {
- struct rb_node *left = node->rb_left;
-
- /*
- * Eligible entities in left subtree are always better
- * choices, since they have earlier deadlines.
- */
- if (left && vruntime_eligible(cfs_rq,
- __node_2_se(left)->min_vruntime)) {
- node = left;
- continue;
- }
-
- se = __node_2_se(node);
-
- /*
- * The left subtree either is empty or has no eligible
- * entity, so check the current node since it is the one
- * with earliest deadline that might be eligible.
- */
- if (entity_eligible(cfs_rq, se)) {
- best = se;
- break;
- }
-
- node = node->rb_right;
- }
-found:
- if (!best || (curr && entity_before(curr, best)))
- best = curr;
-
- return best;
+ /* Pick the leftmost entity */
+ return se;
}

#ifdef CONFIG_SCHED_DEBUG
--
2.25.1