Re: [PATCH] sched/rt: Use cpu_active_mask to prevent rto_push_irq_work's dead loop

From: Xuewen Yan
Date: Fri Nov 18 2022 - 07:09:10 EST


On Fri, Nov 18, 2022 at 6:16 AM Steven Rostedt <rostedt@xxxxxxxxxxx> wrote:
>
> On Mon, 14 Nov 2022 20:04:53 +0800
> Xuewen Yan <xuewen.yan@xxxxxxxxxx> wrote:
>
> > +++ b/kernel/sched/rt.c
> > @@ -2219,6 +2219,7 @@ static int rto_next_cpu(struct root_domain *rd)
> >  {
> >  	int next;
> >  	int cpu;
> > +	struct cpumask tmp_cpumask;
>
> If you have a machine with thousands of CPUs, this will likely kill the
> stack.
Ha, I did not take it into account. Thanks!
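To put a rough number on it: struct cpumask is a bitmap with one bit per possible CPU, so an on-stack copy costs about NR_CPUS / 8 bytes. The userspace sketch below is not kernel code. NR_CPUS = 8192 is an assumed large configuration, and model_cpumask / cpumask_bytes() are hypothetical names that only mimic the kernel's DECLARE_BITMAP layout. It works that out to 1 KiB per call.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Assumption: a large build configured with NR_CPUS = 8192. */
#define NR_CPUS 8192
#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define BITS_TO_LONGS(nr) (((nr) + BITS_PER_LONG - 1) / BITS_PER_LONG)

/* Hypothetical stand-in for struct cpumask: one bit per possible CPU. */
struct model_cpumask {
	unsigned long bits[BITS_TO_LONGS(NR_CPUS)];
};

/* Bytes needed for a cpumask sized for @nr_cpus possible CPUs. */
static size_t cpumask_bytes(size_t nr_cpus)
{
	return BITS_TO_LONGS(nr_cpus) * sizeof(unsigned long);
}
```

Both 32-bit and 64-bit longs give the same 1024 bytes for 8192 CPUs. That cost is paid on every call of rto_next_cpu(), which is why dropping the temporary mask is the better fix.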

>
> >
> > /*
> > * When starting the IPI RT pushing, the rto_cpu is set to -1,
> > @@ -2238,6 +2239,11 @@ static int rto_next_cpu(struct root_domain *rd)
> > /* When rto_cpu is -1 this acts like cpumask_first() */
> > cpu = cpumask_next(rd->rto_cpu, rd->rto_mask);
> >
> > + cpumask_and(&tmp_cpumask, rd->rto_mask, cpu_active_mask);
> > + if (rd->rto_cpu == -1 && cpumask_weight(&tmp_cpumask) == 1 &&
> > + cpumask_test_cpu(smp_processor_id(), &tmp_cpumask))
> > + break;
> > +
>
> Kill the above.
>
> >  		rd->rto_cpu = cpu;
> >
> >  		if (cpu < nr_cpu_ids) {
>
> Why not just add here:
>
> 			if (!cpumask_test_cpu(cpu, cpu_active_mask))
> 				continue;
> 			return cpu;
> 		}
>
> ?
Let's consider this scenario:
the online cpumask is 0x03 (cpu0/1), the active cpumask is 0x01 (cpu0),
the rto_mask is 0x01 (only cpu0 is RT-overloaded), and the irq_work runs
on cpu0. In the first loop the rto_cpu becomes -1, but because rto_loop
has not caught up with rto_loop_next, rto_next_cpu() loops again. Since
rto_cpu is -1, the next rto cpu is cpu0 again. cpu0 is active, so the
cpu_active_mask check does not filter it out, and cpu0 keeps pushing rt
tasks to cpu1 (an inactive cpu) while running in the irq_work.

So we should check whether the current cpu (the only active cpu) would
be the next loop's cpu.
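The scenario can be replayed with a small userspace model of rto_next_cpu(). This is not kernel code. It assumes two CPUs, uses plain unsigned long bitmaps, and the model_* helpers are hypothetical. One variant applies only the cpu_active_mask filter suggested above. The other adds the self-check from this patch on top of it. As in the scenario, the model assumes that every push towards the inactive cpu1 bumps rto_loop_next again.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model, NOT kernel code: 2 CPUs, masks are unsigned long bitmaps. */
#define MODEL_NR_CPU_IDS 2

struct rd_model {
	unsigned long rto_mask; /* RT-overloaded CPUs */
	int rto_cpu;            /* -1 when a pass (re)starts */
	int rto_loop;
	int rto_loop_next;
};

/* Like cpumask_next(): first set bit after @prev, else MODEL_NR_CPU_IDS. */
static int model_mask_next(int prev, unsigned long mask)
{
	for (int cpu = prev + 1; cpu < MODEL_NR_CPU_IDS; cpu++)
		if (mask & (1UL << cpu))
			return cpu;
	return MODEL_NR_CPU_IDS;
}

/*
 * self_check == false: only skip inactive CPUs (the suggested filter).
 * self_check == true:  also stop when, at the start of a pass, the
 *                      current CPU is the only active rto CPU.
 */
static int model_rto_next_cpu(struct rd_model *rd, unsigned long active,
			      int this_cpu, bool self_check)
{
	for (;;) {
		int cpu = model_mask_next(rd->rto_cpu, rd->rto_mask);

		if (self_check && rd->rto_cpu == -1 &&
		    (rd->rto_mask & active) == (1UL << this_cpu))
			break;

		rd->rto_cpu = cpu;
		if (cpu < MODEL_NR_CPU_IDS) {
			if (!(active & (1UL << cpu)))
				continue;
			return cpu;
		}

		rd->rto_cpu = -1;
		if (rd->rto_loop == rd->rto_loop_next)
			break;
		rd->rto_loop = rd->rto_loop_next;
	}
	return -1;
}

/* Count irq_work re-queues on cpu0, capped at @max_rounds. */
static int model_requeues(bool self_check, int max_rounds)
{
	const int this_cpu = 0;
	const unsigned long active = 0x01; /* cpu1 online but inactive */
	struct rd_model rd = {
		.rto_mask = 0x01,      /* only cpu0 is RT-overloaded */
		.rto_cpu = 0,          /* assumed start: pass was sent to cpu0 */
		.rto_loop = 0,
		.rto_loop_next = 1,
	};
	int requeues = 0;

	while (requeues < max_rounds) {
		int next = model_rto_next_cpu(&rd, active, this_cpu, self_check);

		if (next < 0)
			break;                 /* pass finished */
		assert(next == this_cpu);      /* only cpu0 can be picked */
		requeues++;
		rd.rto_loop_next++;            /* assumed: push to cpu1 restarts the pass */
	}
	return requeues;
}
```

With the filter alone, the irq_work keeps being re-queued on cpu0 until the round cap is hit. With the self-check, the pass ends with -1 as soon as it restarts.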

Thanks!

>
> -- Steve