Re: [PATCH] sched/rt: Use cpu_active_mask to prevent rto_push_irq_work's dead loop

From: Steven Rostedt
Date: Thu Nov 17 2022 - 17:02:26 EST


On Mon, 14 Nov 2022 20:04:53 +0800
Xuewen Yan <xuewen.yan@xxxxxxxxxx> wrote:

> +++ b/kernel/sched/rt.c
> @@ -2219,6 +2219,7 @@ static int rto_next_cpu(struct root_domain *rd)
> {
> 	int next;
> 	int cpu;
> +	struct cpumask tmp_cpumask;

struct cpumask is sized by NR_CPUS, so on a machine with thousands of
CPUs, putting one on the stack like this will likely overflow it.
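
To put a rough number on that: a cpumask holds one bit per possible CPU,
rounded up to whole longs. The userspace sketch below estimates the
footprint for a given NR_CPUS value. It is an illustration only, and the
helper name cpumask_bytes() is hypothetical, not a kernel function.

```c
#include <limits.h>
#include <stddef.h>

/* Bits in one unsigned long on the build target. */
#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/*
 * Hypothetical helper: bytes needed for a bitmap with one bit per
 * possible CPU, rounded up to a whole number of unsigned longs.
 */
static size_t cpumask_bytes(size_t nr_cpus)
{
	size_t longs = (nr_cpus + BITS_PER_LONG - 1) / BITS_PER_LONG;

	return longs * sizeof(unsigned long);
}
```

For NR_CPUS=8192 this comes to 1024 bytes for a single local variable,
which is a large share of a kernel stack that is usually only a few
pages. The kernel's usual way around this is cpumask_var_t with
alloc_cpumask_var(); whether an allocation is acceptable in this path is
a separate question.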

>
> 	/*
> 	 * When starting the IPI RT pushing, the rto_cpu is set to -1,
> @@ -2238,6 +2239,11 @@ static int rto_next_cpu(struct root_domain *rd)
> 		/* When rto_cpu is -1 this acts like cpumask_first() */
> 		cpu = cpumask_next(rd->rto_cpu, rd->rto_mask);
>
> +		cpumask_and(&tmp_cpumask, rd->rto_mask, cpu_active_mask);
> +		if (rd->rto_cpu == -1 && cpumask_weight(&tmp_cpumask) == 1 &&
> +		    cpumask_test_cpu(smp_processor_id(), &tmp_cpumask))
> +			break;
> +

Kill the above.

> 		rd->rto_cpu = cpu;
>
> 		if (cpu < nr_cpu_ids) {

Why not just add here:

			if (!cpumask_test_cpu(cpu, cpu_active_mask))
				continue;
			return cpu;
		}

?
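
The effect of that check can be tried out in a small userspace model.
This is a sketch under stated assumptions, not the kernel function: the
masks are plain 64-bit words, struct rd_model and mask_next() are
hypothetical stand-ins for struct root_domain and cpumask_next(), and
the tail of the loop (the rto_loop / rto_loop_next handling) is modelled
on the upstream function from memory, since the quoted hunks do not
show it.

```c
#include <stdint.h>

#define NR_CPU_IDS 64	/* assumption: one uint64_t covers every CPU */

/* Hypothetical stand-in for the root_domain fields used here. */
struct rd_model {
	uint64_t rto_mask;	/* CPUs with overloaded RT runqueues */
	int rto_cpu;		/* last CPU handed out, -1 when starting */
	int rto_loop;		/* loop generation last processed */
	int rto_loop_next;	/* bumped when a new pass is requested */
};

/* Like cpumask_next(): first set bit after prev, or NR_CPU_IDS if none. */
static int mask_next(int prev, uint64_t mask)
{
	for (int cpu = prev + 1; cpu < NR_CPU_IDS; cpu++)
		if (mask & (UINT64_C(1) << cpu))
			return cpu;
	return NR_CPU_IDS;
}

static int rto_next_cpu_model(struct rd_model *rd, uint64_t active_mask)
{
	for (;;) {
		/* When rto_cpu is -1 this acts like a "first set bit" search. */
		int cpu = mask_next(rd->rto_cpu, rd->rto_mask);

		rd->rto_cpu = cpu;

		if (cpu < NR_CPU_IDS) {
			/* The suggested check: skip CPUs that are not active. */
			if (!(active_mask & (UINT64_C(1) << cpu)))
				continue;
			return cpu;
		}

		rd->rto_cpu = -1;

		/* Stop unless another pass was requested in the meantime. */
		if (rd->rto_loop == rd->rto_loop_next)
			break;
		rd->rto_loop = rd->rto_loop_next;
	}

	return -1;
}
```

Because rd->rto_cpu has already been advanced when the check runs, the
continue simply moves on to the next CPU in rto_mask. An inactive CPU is
skipped, and the walk still ends with -1 once the mask is exhausted, so
no temporary cpumask is needed.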

-- Steve