Re: [PATCH 14/31] sched_ext: Implement BPF extensible scheduler class

From: Barret Rhoden
Date: Fri Dec 02 2022 - 12:08:44 EST


hi -

On 11/30/22 03:22, Tejun Heo wrote:
[...]
+static bool consume_dispatch_q(struct rq *rq, struct rq_flags *rf,
+			       struct scx_dispatch_q *dsq)
+{
+	struct scx_rq *scx_rq = &rq->scx;
+	struct task_struct *p;
+	struct rq *task_rq;
+	bool moved = false;
+retry:
+	if (list_empty(&dsq->fifo))
+		return false;
+
+	raw_spin_lock(&dsq->lock);
+	list_for_each_entry(p, &dsq->fifo, scx.dsq_node) {
+		task_rq = task_rq(p);
+		if (rq == task_rq)
+			goto this_rq;
+		if (likely(rq->online) && !is_migration_disabled(p) &&
+		    cpumask_test_cpu(cpu_of(rq), p->cpus_ptr))
+			goto remote_rq;
+	}
+	raw_spin_unlock(&dsq->lock);
+	return false;
+
+this_rq:
+	/* @dsq is locked and @p is on this rq */
+	WARN_ON_ONCE(p->scx.holding_cpu >= 0);
+	list_move_tail(&p->scx.dsq_node, &scx_rq->local_dsq.fifo);
+	dsq->nr--;
+	scx_rq->local_dsq.nr++;
+	p->scx.dsq = &scx_rq->local_dsq;
+	raw_spin_unlock(&dsq->lock);
+	return true;
+
+remote_rq:
+#ifdef CONFIG_SMP
+	/*
+	 * @dsq is locked and @p is on a remote rq. @p is currently protected by
+	 * @dsq->lock. We want to pull @p to @rq but may deadlock if we grab
+	 * @task_rq while holding @dsq and @rq locks. As dequeue can't drop the
+	 * rq lock or fail, do a little dancing from our side. See
+	 * move_task_to_local_dsq().
+	 */
+	WARN_ON_ONCE(p->scx.holding_cpu >= 0);
+	list_del_init(&p->scx.dsq_node);
+	dsq->nr--;
+	p->scx.holding_cpu = raw_smp_processor_id();
+	raw_spin_unlock(&dsq->lock);
+
+	rq_unpin_lock(rq, rf);
+	double_lock_balance(rq, task_rq);
+	rq_repin_lock(rq, rf);
+
+	moved = move_task_to_local_dsq(rq, p);

you might be able to avoid the double_lock_balance() by using move_queued_task(), which internally hands off the old rq lock and returns with the new rq lock.

the pattern for consume_dispatch_q() would be something like:

- kfunc from bpf, with this_rq lock held
- notice p isn't on this_rq, goto remote_rq:
- do the sched_ext accounting, like the dsq->nr--
- unlock this_rq
- p_rq = task_rq_lock(p)
- double-check that task_rq(p) didn't change to this_rq during that unlock
- new_rq = move_queued_task(p_rq, rf, p, new_cpu)
- do the sched_ext accounting, like new_rq's local_dsq.nr++
- unlock new_rq
- relock the original this_rq
- return to bpf

you still end up grabbing both rq locks, just not at the same time.

plus, task_rq_lock() takes the guesswork out of whether you're actually getting p's rq lock. it looks like you're using holding_cpu to handle the race where p moves cpus after you read task_rq(p) but before you lock that task_rq; since task_rq_lock() closes that race for you, maybe you can drop the whole holding_cpu concept?
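
to make that concrete, here's a very rough, untested sketch of what the
remote_rq: path could look like with that shape. it reuses the names from
your patch, hand-waves the rest of the scx accounting (local_dsq.nr++,
p->scx.dsq, the retry/return logic), and assumes move_queued_task() gets
exposed outside core.c; i also haven't thought through what the scx
enqueue/dequeue callbacks end up doing when move_queued_task() calls
deactivate_task()/activate_task() on an scx task:

	struct rq_flags prf;
	struct rq *p_rq;
	int cpu = cpu_of(rq);

	/* still under dsq->lock: take @p off the dsq and account for it */
	list_del_init(&p->scx.dsq_node);
	dsq->nr--;
	raw_spin_unlock(&dsq->lock);

	/* drop this rq's lock before grabbing @p's */
	rq_unlock(rq, rf);

	p_rq = task_rq_lock(p, &prf);	/* p->pi_lock + task_rq(p)->lock */
	if (p_rq != rq && task_on_rq_queued(p) &&
	    cpumask_test_cpu(cpu, p->cpus_ptr)) {
		/* hands off p_rq's lock, returns with this rq's lock held */
		p_rq = move_queued_task(p_rq, &prf, p, cpu);
	}
	task_rq_unlock(p_rq, p, &prf);

	/* relock this rq before returning to the dispatch path */
	rq_lock(rq, rf);

the idea being that once task_rq_lock() returns, task_rq(p) can't change
under you, so the "did p move while we were unlocked" check collapses into
the p_rq != rq test above - which is what i meant about dropping holding_cpu.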

thanks,
barret


+
+	double_unlock_balance(rq, task_rq);
+#endif	/* CONFIG_SMP */
+	if (likely(moved))
+		return true;
+	goto retry;
+}