Re: Question on __torture_rt_boost() else clause

From: Paul E. McKenney
Date: Tue Aug 22 2023 - 13:46:49 EST


On Tue, Aug 22, 2023 at 04:18:50PM +0000, Joel Fernandes wrote:
> Hi Paul,
>
> On Mon, Aug 21, 2023 at 08:12:50PM -0700, Paul E. McKenney wrote:
> > Hello, Joel!
> >
> > A quick question for you...
> >
> > I am doing catch-up additions of locktorture module parameters
> > to kernel-parameters.txt, and came across rt_boost_factor. The
> > multiplication by cxt.nrealwriters_stress in the !rt_task(current)
> > then-clause makes sense: No matter how many writers you have, the
> > number of boost operations per unit time remains roughly constant.
>
> > But I am having some difficulty rationalizing a similar multiplication
> > in the else-clause. That would seem to leave boosting in effect for
> > longer times the more writers there were.
>
> But the number of de-boost operations per-unit time should also remain a
> constant? I think you (or the original authors) wanted it to boost at every
> 50k ops at deboost at 500k ops originally.

The else-clause controls the boost duration. So if I am understanding
the code correctly, the more writers there are, the longer each writer
stays boosted. Which might be a good thing, but seemed strange.

> > Is that the intent?
>
> The original change before my patch to make boosting possible for non-rtmutex
> types already had that multiplication, see below for diff from my patch. My
> patch just kept the same thing to make the logic consistent (i.e. deboost
> less often).

Ah, you are right, I should have told "git blame" to dig deeper.

But hey, you did touch the code at one point! ;-)

> > Also, I am rationalizing the choice of 2 as default for rt_boost by
> > noting that "mutex" and "ww_mutex_lock" don't do boosting and that
> > preemption-disabling makes non-RT spinlocks immune from priority
> > inversion. Is this what you had in mind, or am I off in the weeds here?
>
> The 2 was just to make sure that we don't deboost as often as we boost, which
> is also what the old logic was trying to do.

This is a different "2". The rt_boost=0 says never boost, rt_boost=1
says boost only if the lock in question supports priority boosting, and
rt_boost=2 (the default) says boost unconditionally, aside from lock
types that don't define cur_ops->task_boost. Except that they all
define cur_ops->task_boost.

I am not seeing failures in my torture.sh testing, so maybe OK, but it
does seem a bit strange.

(And this probably predates your involvement as well, but so it goes!)

> What is the drawback of keeping the boost active for longer than not? It will
> trigger the PI-boosting (and in the future proxy exec) more often.

My concern is someone running this on a 1,000-CPU system. Though locking
being what it is, there is a non-negligible possibility that something
else breaks first.

> Also by making the factor configurable, I allow it to control how often we
> boost and deboost. IIRC, it was boosting much less often before I did that.

No argument with the frequency of boosting, just curiosity about the
duration increasing with increasing numbers of CPUs. I can rationalize
it, but then again, I can rationalize pretty much anything. ;-)

> > I am putting my best guess in the patch, and am including you on CC.
>
> Ok, thanks,

On the other hand, it looks like I can now reproduce a qspinlock hang
that happens maybe five to ten times a week across the entire fleet
in a few tens of minutes. On my laptop. ;-)

Now to start adding debug. Which will affect the reproduction times,
but life is like that sometimes...

Thanx, Paul

> - Joel
>
>
> -static void torture_rtmutex_boost(struct torture_random_state *trsp)
> -{
> - const unsigned int factor = 50000; /* yes, quite arbitrary */
> -
> - if (!rt_task(current)) {
> - /*
> - * Boost priority once every ~50k operations. When the
> - * task tries to take the lock, the rtmutex it will account
> - * for the new priority, and do any corresponding pi-dance.
> - */
> - if (trsp && !(torture_random(trsp) %
> - (cxt.nrealwriters_stress * factor))) {
> - sched_set_fifo(current);
> - } else /* common case, do nothing */
> - return;
> - } else {
> - /*
> - * The task will remain boosted for another ~500k operations,
> - * then restored back to its original prio, and so forth.
> - *
> - * When @trsp is nil, we want to force-reset the task for
> - * stopping the kthread.
> - */
> - if (!trsp || !(torture_random(trsp) %
> - (cxt.nrealwriters_stress * factor * 2))) {
> - sched_set_normal(current, 0);
> - } else /* common case, do nothing */
> - return;
> - }
> -}
> -