Re: [PATCH v2] sched/rt: Plug rt_mutex_setprio() vs push_rt_task() race

From: Dietmar Eggemann
Date: Tue Feb 01 2022 - 10:28:36 EST


On 27/01/2022 16:40, Valentin Schneider wrote:
> John reported that push_rt_task() can end up invoking
> find_lowest_rq(rq->curr) when curr is not an RT task (in this case a CFS
> one), which causes mayhem down convert_prio().
>
> This can happen when current gets demoted to e.g. CFS when releasing an
> rt_mutex, and the local CPU gets hit with an rto_push_work irqwork before
> getting the chance to reschedule. Exactly who triggers this work isn't
> entirely clear to me - switched_from_rt() only invokes rt_queue_pull_task()
> if there are no RT tasks on the local RQ, which means the local CPU can't
> be in the rto_mask.
>
> My current suspected sequence is something along the lines of the below,
> with the demoted task being current.
>
> mark_wakeup_next_waiter()
> rt_mutex_adjust_prio()
> rt_mutex_setprio() // deboost originally-CFS task
> check_class_changed()
> switched_from_rt() // Only rt_queue_pull_task() if !rq->rt.rt_nr_running
> switched_to_fair() // Sets need_resched
> __balance_callbacks() // if pull_rt_task(), tell_cpu_to_push() can't select local CPU per the above
> raw_spin_rq_unlock(rq)
>
> // need_resched is set, so task_woken_rt() can't
> // invoke push_rt_tasks(). Best I can come up with is
> // local CPU has rt_nr_migratory >= 2 after the demotion, so stays
> // in the rto_mask, and then:
>
> <some other CPU running rto_push_irq_work_func() queues rto_push_work on this CPU>
> push_rt_task()
> // breakage follows here as rq->curr is CFS
>
> Move an existing check to check rq->curr vs the next pushable task's
> priority before getting anywhere near find_lowest_rq(). While at it, add an
> explicit sched_class of rq->curr check prior to invoking
> find_lowest_rq(rq->curr). Align the DL logic to also reschedule regardless
> of next_task's migratability.
>
> Link: http://lore.kernel.org/r/Yb3vXx3DcqVOi+EA@donbot
> Fixes: a7c81556ec4d ("sched: Fix migrate_disable() vs rt/dl balancing")
> Reported-by: John Keeping <john@xxxxxxxxxxxx>
> Signed-off-by: Valentin Schneider <valentin.schneider@xxxxxxx>
> Tested-by: John Keeping <john@xxxxxxxxxxxx>
> ---
> v1 -> v2: Reworded comments, added DL part (Dietmar)
> ---

LGTM.

Rescheduling in case rq->curr has lower prio (including CFS tasks) than
next_task and bailing out in case rq->curr is DL or stop-task prevents
the bug from happening.

The only small issue is the fact that, unlike in push_rt_task(), the DL
logic only compares DL tasks (if (dl_task(rq->curr) ...), so you miss
rescheduling when rq->curr is a lower priority non-DL task.

Reviewed-by: Dietmar Eggemann <dietmar.eggemann@xxxxxxx>

[...]