Re: [PATCH v3] rcu: Only boost rcu reader tasks with lower priority than boost kthreads

From: Paul E. McKenney
Date: Fri Mar 18 2022 - 10:57:50 EST


On Fri, Mar 18, 2022 at 05:50:35AM +0000, Zhang, Qiang1 wrote:
> On Sat, Mar 12, 2022 at 03:11:04AM +0000, Zhang, Qiang1 wrote:
> > On 2022-03-11 10:22:26 [+0800], Zqiang wrote:
> > > When RCU_BOOST is enabled, the boost kthreads will boosting readers
> > > who are blocking a given grace period, if the current reader tasks
> > ^ Period.
> >
> > > have a higher priority than boost kthreads(the boost kthreads priority
> > > not always 1, if the kthread_prio is set),
> >
> > >>This confuses me:
> > >>- Why does this matter
> >
> > In a preempt-rt system, if kthread_prio is not set, its prio is 1.
> > The boost kthreads can preempt almost rt task, and this will affect
> > the real-time performance of some user rt tasks. In preempt-rt systems,
> > in most scenarios, kthread_prio will be configured.
> >
> >Just following up... These questions might have been answered, but
> >I am not seeing those answers right off-hand.
> >
> >Is the grace-period latency effect of choosing not to boost high-priority
> >tasks visible at the system level in any actual workload?
> >
> >Suppose that a SCHED_DEADLINE task has exhausted its time quantum,
> >and has thus been preempted within an RCU read-side critical section.
> >Can priority boosting from a SCHED_FIFO prio-1 task cause it to start
> >running?
> >
> >Do delays in RCU priority boosting cause excessive grace-period
> >latencies on real workloads, even when all the to-be-boosted
> >tasks are SCHED_OTHER?
> >
> >Thoughts?
>
> I have tested this modification over the past few days. I originally
> planned to add a Kconfig option to control whether to skip tasks with
> higher priority than the boost kthreads, but it doesn't seem necessary
> because the optimization is not particularly noticeable in practice.
> I find that tasks with higher priority than the boost kthreads quickly
> exit the RCU read-side critical section, even if they are preempted
> within it. Sorry for the noise.

Thank you for getting back with this information, and no need to
apologize. We all get excited about a potential change from time to time.
Part of us maintainers' jobs is to ask hard questions when that appears
to be happening. ;-)

If you have continued interest in this area, it would be good to keep
looking. After all, neither RCU expedited grace periods nor RCU priority
boosting were designed with these new use cases in mind, so it is quite
likely that there is a useful change to be made in there somewhere.

You see, RCU expedited grace periods were designed for throughput rather
than latency. The original use case was an old networking API that
needed to wait for a grace period on each and every one of a series of
some tens of thousands of system calls. If one or two of those system
calls took a few hundred milliseconds, but the rest completed in less than
a millisecond, no harm done. (Yes, there are now newer APIs that allow
many changes to be made with only the one grace-period wait. But the
kernel must continue to support the old API: Never Break Userspace.)
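
To put rough numbers on that throughput argument, here is a minimal
sketch in plain userspace C. The helper total_wait_ms() and every
value fed to it are illustrative assumptions, not measurements from
that workload.

```c
/*
 * Hypothetical helper: total wall-clock wait, in milliseconds, for a
 * series of calls that each wait for one grace period.  "nslow" of the
 * "ncalls" calls take "slow_ms" each, the rest take "fast_ms" each.
 */
long total_wait_ms(long ncalls, long nslow, long slow_ms, long fast_ms)
{
	return nslow * slow_ms + (ncalls - nslow) * fast_ms;
}
```

With 20,000 calls, two of them at 300 ms and the rest at an assumed
1 ms, this gives 20598 ms, against 20000 ms with no slow calls at all.
That is about 3% more total time, even though the slowest call took
300 times as long as a typical one.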

For its part, RCU priority boosting was originally designed for
debugging. The point was to avoid OOMing the system when someone
misconfigured their application's real-time priorities. As you know,
such misconfiguration can easily prevent low-priority RCU readers from
ever completing.
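
For anyone following along, the check that the patch in this thread
was built around can be modeled in plain userspace C. This is only a
sketch under stated assumptions, not the kernel code: struct reader,
clamp_kthread_prio(), and boost_would_help() are hypothetical names,
and priorities are user-visible SCHED_FIFO values (larger means more
urgent), not the kernel's internal ->prio numbering.

```c
#include <stdbool.h>

/* Hypothetical model of a preempted RCU reader; not a kernel struct. */
struct reader {
	const char *name;
	int rt_prio;	/* 0: SCHED_OTHER, 1..99: SCHED_FIFO priority */
};

/*
 * Assumed clamp: keep the boost kthread's priority within 1..99, in
 * line with the "always >= 1" remark quoted in this thread.
 */
int clamp_kthread_prio(int requested)
{
	if (requested < 1)
		return 1;
	if (requested > 99)
		return 99;
	return requested;
}

/*
 * Boosting can raise a reader at most to the boost kthread's own
 * priority, so it changes nothing for a reader already at or above it.
 */
bool boost_would_help(const struct reader *r, int kthread_prio)
{
	return r->rt_prio < kthread_prio;
}
```

In this model a SCHED_OTHER reader is helped by a prio-1 boost kthread,
while a SCHED_FIFO prio-50 reader is not. The model ignores the case
raised in the quoted discussion where a reader's priority is only
temporarily high because of a PI chain.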

So it is reasonably likely that some change or another is needed. After
all, new use cases require new functionality and new fixes. The trick
is figuring out which change makes sense amongst the huge group of other
possible changes that each add much more complexity than improvement.
But part of the process of finding that change that makes sense is trying
out quite a few changes that don't help all that much. ;-)

Thanx, Paul

> Thanks,
> Zqiang
>
> >
> > Thanx, Paul
> >
> > Thanks
> > Zqiang
> >
> > >>- If it is not RT prio, what is then? Higher or lower? Afaik it is
> > >> always >= 1.
> >
> > >>>If it is not RT prio, the sanitize_kthread_prio() will limit RT prio
> >
> > > boosting is useless, skip
> > > current task and select next task to boosting, reduce the time for a
> > > given grace period.
> >
> > >>So if the task, that is stuck in a rcu_read() section, has a higher
> > >>priority than the boosting thread then boosting is futile. Understood.
> > >>
> > >>Please correct me if I'm wrong but this is intended for !SCHED_OTHER
> > > >>tasks since there shouldn't be a PI chain on boost_mtx so that its
> > >>default RT priority is boosted above what has been configured.
> >
> > > >>>Yes, you are right. If the task to be boosted is itself already boosted due to a PI chain,
> > > >>>its priority may only be temporarily higher than that of the boost kthreads. Once that
> > > >>>PI boost is lifted, the task may still be in an RCU section, but if we have skipped it,
> > > >>>the task has missed the opportunity to be boosted.
> >
> > >>
> > > >>You skip boosting tasks which are themselves already boosted due to a PI
> > > >>chain. Once that PI boost is lifted the task may still be in an RCU
> > > >>section. But if I understand you right, your intention is to skip boosting
> > > >>tasks with a higher priority and concentrate on those which are in
> > >>need. This shouldn't make a difference unless the scheduler is able to
> > >>move the rcu-boosted task to another CPU.
> > >>
> >
> > > >>>Yes, it makes sense when the RCU boost kthread and the task to be boosted
> > > >>>run on different CPUs.
> >
> > >>Am I right so far? If so this should be part of the commit message (the
> > >>intention and the result). Also, please add that part with
> > >>boost_exp_tasks. The comment above boost_mtx is now above
> > >>boost_exp_tasks with a space so it looks (at least to me) like these two
> > >>don't belong together.
> >
> > >>>Yes, I will add your description to the commit information.
> >
> >
> > > Suggested-by: Uladzislau Rezki (Sony) <urezki@xxxxxxxxx>
> > > Signed-off-by: Zqiang <qiang1.zhang@xxxxxxxxx>
> >
> > >Sebastian