Re: [PATCH] sched/rt: Fix possible warn when push_rt_task

From: Peter Zijlstra
Date: Mon Jul 03 2023 - 08:39:25 EST


On Sat, Jun 24, 2023 at 05:21:30PM +0800, Hui Tang wrote:
> A warning may be triggered during reboot, as follows:
>
>   reboot
>     -> kernel_restart
>       -> machine_restart
>         -> smp_send_stop --- ipi handler set_cpu_online(cpu, false)
>
>   balance_callback
>     -> __balance_callback
>       -> push_rt_task
>         -> find_lock_lowest_rq --- offline cpu in vec->mask is not cleared
>           -> find_lowest_rq
>             -> cpupri_find
>               -> cpupri_find_fitness
>                 -> __cpupri_find [cpumask_and(..., vec->mask)]
>         -> set_task_cpu(next_task, lowest_rq->cpu) --- WARN_ON(!cpu_online(cpu))
>
> So add a !cpu_online(lowest_rq->cpu) check before set_task_cpu().
> This does not completely fix the problem, since the CPU's bit in
> cpu_online_mask may be cleared after the check.

This is tinkering at best. I'm sure there's a score of other issues,
not least the very same issue in deadline.c. But since this
doesn't actually fix anything, this clearly isn't the right way.
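
A minimal userspace sketch of the objection above, under stated
assumptions: every name in it (model_push, model_find_lowest, the two
bitmasks) is hypothetical and only models the flow described in the
changelog; none of it is kernel code. It is meant to illustrate that a
stale vec->mask can still hand back an offlined CPU, and that an online
check placed before the migration narrows the window but does not close
it.

```c
#include <stdbool.h>

/* Hypothetical model only: these are NOT kernel APIs. */
#define NR_MODEL_CPUS 4

static unsigned int online_mask;  /* stands in for cpu_online_mask */
static unsigned int vec_mask;     /* stands in for cpupri's vec->mask */

enum stop_point { STOP_NEVER, STOP_BEFORE_FIND, STOP_AFTER_CHECK };
enum push_result { PUSH_OK, PUSH_SKIPPED, PUSH_WARN };

static bool model_cpu_online(int cpu)
{
	return (online_mask >> cpu) & 1u;
}

/* Models the stop IPI: clears the online bit, leaves vec_mask stale. */
static void model_stop_cpu(int cpu)
{
	online_mask &= ~(1u << cpu);
}

/* Models the cpumask_and(..., vec->mask) step: lowest CPU in both masks. */
static int model_find_lowest(unsigned int allowed)
{
	unsigned int m = allowed & vec_mask;

	for (int cpu = 0; cpu < NR_MODEL_CPUS; cpu++)
		if (m & (1u << cpu))
			return cpu;
	return -1;
}

/*
 * Models one push attempt towards CPUs 1-3, with CPU 1 going offline
 * at the point selected by 'when'. PUSH_WARN marks the case where the
 * task would be migrated to an offline CPU.
 */
static enum push_result model_push(bool with_check, enum stop_point when)
{
	int cpu;

	online_mask = vec_mask = 0xf;

	if (when == STOP_BEFORE_FIND)
		model_stop_cpu(1);

	cpu = model_find_lowest(0xe);	/* still returns 1: vec_mask is stale */
	if (cpu < 0)
		return PUSH_SKIPPED;

	if (with_check && !model_cpu_online(cpu))
		return PUSH_SKIPPED;	/* the check proposed in the patch */

	if (when == STOP_AFTER_CHECK)
		model_stop_cpu(cpu);	/* the window the changelog admits to */

	return model_cpu_online(cpu) ? PUSH_OK : PUSH_WARN;
}
```

In this model, model_push(true, STOP_BEFORE_FIND) is skipped by the
check, while model_push(true, STOP_AFTER_CHECK) still ends in PUSH_WARN,
which is the case the added check cannot cover.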

> Fixes: 4ff9083b8a9a8 ("sched/core: WARN() when migrating to an offline CPU")
> Signed-off-by: Hui Tang <tanghui20@xxxxxxxxxx>
> ---
> kernel/sched/rt.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
> index 00e0e5074115..852ef18b6a50 100644
> --- a/kernel/sched/rt.c
> +++ b/kernel/sched/rt.c
> @@ -2159,6 +2159,9 @@ static int push_rt_task(struct rq *rq, bool pull)
>  		goto retry;
>  	}
>
> +	if (unlikely(!cpu_online(lowest_rq->cpu)))
> +		goto out;
> +
>  	deactivate_task(rq, next_task, 0);
>  	set_task_cpu(next_task, lowest_rq->cpu);
>  	activate_task(lowest_rq, next_task, 0);
> --
> 2.17.1
>