Re: [PATCH RFC v4 2/3] sched: Avoid placing RT threads on cores handling long softirqs

From: John Stultz
Date: Wed Oct 19 2022 - 18:09:37 EST


On Wed, Oct 19, 2022 at 2:11 AM Alexander Gordeev
<agordeev@xxxxxxxxxxxxx> wrote:
>
> On Mon, Oct 17, 2022 at 08:42:53PM -0700, John Stultz wrote:
> > Hrm. Suggestions? As select_task_rq_rt() is only one of the callers.
> > Trying to pass curr into cpu_busy_with_softirqs() would mean
> > cpupri_find_fitness() would need to read the cpu_rq(cpu)->curr for the
> > specified cpu and pass that in.
>
> May be you could have a lightweight checker that accepts rq and curr
> and gets called from select_task_rq_rt(). Then you could call that
> same checker from cpu_busy_with_softirqs().

Fair enough. Though your other questions are making me wonder if this
is necessary.

> > Just to expand what it should be in detail:
> > 1: (softirqs & LONG_SOFTIRQ_MASK) &&
> > 2: (curr == cpu_ksoftirqd ||
> > 3: task_thread_info(curr)->preempt_count & SOFTIRQ_MASK)
> >
> > Where we're checking
> > 1) that the active_softirqs and __cpu_softirq_pending() values on the
> > target cpu are running a long softirq.
> > AND (
> > 2) The current task on the target cpu is ksoftirqd
> > OR
> > 3) The preempt_count of the current task on the target cpu has SOFTIRQ entries
> > )
>
> 2) When the target CPU is handling or about to handle long softirqs
> already what is the difference if it is also running ksoftirqd or not?

Again, a good question! From my understanding, the original patch was
basically checking just #2 and #3 above, then additional logic was
added to narrow it to only the LONG_SOFTIRQ_MASK values, so that may
make the older part of the check redundant.

I worry there are some edge cases where softirqs are pending on the
target cpu but ksoftirqd isn't running yet (maybe due to a lowish-prio
rt task), such that the cpu could still be considered a good target.
But this seems a bit of a stretch.

> 3) What is the point of this check when 1) is true already?

Yeah, the more I think about this, the more duplicative it seems.
Again, there's some edge details about the preempt_count being set
before the active_softirq accounting is set, but the whole decision
here about the target cpus is a bit racy to begin with, so I'm not
sure if that is significant.

So I'll go ahead and simplify the check to just the LONG_SOFTIRQ_MASK
& (active | pending softirqs) check. This should avoid the need to
pull the cpu_rq(cpu)->curr value and simplify things.
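
To make the difference concrete, here is a rough userspace sketch of the
two variants, not the actual patch code. The struct, the helper names and
the exact set of vectors in LONG_SOFTIRQ_MASK are assumptions made up for
illustration; only the shape of the two conditions comes from the
discussion above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Softirq vector numbers, in the order of the kernel's softirq enum. */
enum {
	HI_SOFTIRQ, TIMER_SOFTIRQ, NET_TX_SOFTIRQ, NET_RX_SOFTIRQ,
	BLOCK_SOFTIRQ, IRQ_POLL_SOFTIRQ, TASKLET_SOFTIRQ, SCHED_SOFTIRQ,
	HRTIMER_SOFTIRQ, RCU_SOFTIRQ, NR_SOFTIRQS
};

/* ASSUMPTION: which vectors count as "long"; illustrative only. */
#define LONG_SOFTIRQ_MASK \
	((1u << NET_TX_SOFTIRQ) | (1u << NET_RX_SOFTIRQ) | (1u << BLOCK_SOFTIRQ))

/* Softirq bits of preempt_count (bits 8-15), modeled on the kernel layout. */
#define SOFTIRQ_MASK 0x0000ff00u

/* Hypothetical snapshot of the state read from the target cpu. */
struct cpu_snapshot {
	uint32_t active_softirqs;    /* softirqs currently being handled */
	uint32_t pending_softirqs;   /* softirqs raised but not yet handled */
	bool curr_is_ksoftirqd;      /* is ksoftirqd the current task? */
	uint32_t curr_preempt_count; /* preempt_count of the current task */
};

/* The three-part check as spelled out earlier in the thread. */
static bool busy_full_check(const struct cpu_snapshot *s)
{
	uint32_t softirqs = s->active_softirqs | s->pending_softirqs;

	return (softirqs & LONG_SOFTIRQ_MASK) &&
	       (s->curr_is_ksoftirqd ||
		(s->curr_preempt_count & SOFTIRQ_MASK));
}

/* The simplified check: no need to look at the target cpu's curr. */
static bool busy_simplified_check(const struct cpu_snapshot *s)
{
	uint32_t softirqs = s->active_softirqs | s->pending_softirqs;

	return (softirqs & LONG_SOFTIRQ_MASK) != 0;
}
```

In this model the two only disagree when a long softirq is pending but
the current task is neither ksoftirqd nor in softirq context, which is
the edge case mentioned above: the simplified check treats that cpu as
busy, the full check does not.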

Will send out a new version once I've been able to validate that the
simplification doesn't introduce a regression.

Thanks so much for the feedback and suggestions!
-john