Re: [PATCH 00/37] softirq: Per vector masking v3

From: Frederic Weisbecker
Date: Thu Feb 28 2019 - 22:45:46 EST


On Thu, Feb 28, 2019 at 09:33:15AM -0800, Linus Torvalds wrote:
> On Thu, Feb 28, 2019 at 9:12 AM Frederic Weisbecker <frederic@xxxxxxxxxx> wrote:
> >
> > So this set should hopefully address all reviews from the v2, and
> > fix all reports from the extremely useful (as always) Kbuild testing
> > bot. It also completes support for all archs.
>
> The one thing I'd still like to see is some actual performance
> (latency?) numbers.
>
> Maybe they were hiding somewhere in the pile and my quick scan missed
> them. But the main argument for this was that we've had the occasional
> latency issues with softirqs blocking (eg the USB v4l frame dropping
> etc), and I did that SOFTIRQ_NOW_MASK because it helped one particular
> case.
>
> And you don't seem to have removed that hack, and I'd really like to
> see that that thing isn't needed any more.
>
> Because otherwise the whole series seems a bit pointless, don't you
> think? If it doesn't fix that fundamental issue, then what's the point
> of all this churn..

Numbers are indeed missing. In fact this patchset mostly just brings the
infrastructure. We have yet to pinpoint the most latency-inducing
softirq-disabled sites and make them disable only the vectors that
are involved in a given lock.

And last but not least, this patchset allows us to soft-interrupt
code that disabled other vectors but it doesn't yet allow us to
soft-interrupt a vector itself. Not much is needed to allow that
from the softirq core code. But we can't do that blindly. For example
TIMER_SOFTIRQ, HRTIMER_SOFTIRQ, TASKLET_SOFTIRQ, NET_RX_SOFTIRQ
can't interrupt each other because some locks can be taken from all
of them (the socket lock for example). Having so many vectors
involved in a single lock is probably rare, but still...

The only solution I see to make vectors interruptible is to proceed
the same way as we do for softirq-disabled sections: case by case,
on a per-handler basis. Hopefully we can operate per subsystem
and we don't need to start from drivers.

So the idea is the following: if the lock A can be taken from both TIMER_SOFTIRQ
and BLOCK_SOFTIRQ, we do this from the timer handler for example:

    __do_softirq() {
        // all vectors disabled
        run_timers {
            random_timer_callback() {
                bh = local_bh_enable_mask(~(TIMER_SOFTIRQ | BLOCK_SOFTIRQ));
                spin_lock(&A);
                do_some_work();
                spin_unlock(&A);
                local_bh_disable_mask(bh);
            }
        }
    }

Sounds tedious but that's the only way I can imagine to make that correct.
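
To make the save/restore pattern in the pseudocode above concrete, here is a
rough userspace model. It is not kernel code: the model_* helpers, the *_VEC
bit values and the enabled_vecs variable are all made up for illustration, and
the real softirq core tracks vector state differently. It only shows that a
handler entered with everything masked can re-enable the vectors unrelated to
lock A and then put the previous state back.

```c
#include <assert.h>

/* Hypothetical vector bits, for this model only. */
#define TIMER_VEC   (1u << 0)
#define BLOCK_VEC   (1u << 1)
#define NET_RX_VEC  (1u << 2)
#define ALL_VECS    (TIMER_VEC | BLOCK_VEC | NET_RX_VEC)

/* Set of vectors currently allowed to run (model state, not a kernel variable). */
static unsigned int enabled_vecs = ALL_VECS;

/* Models entry into __do_softirq(): every vector is masked. */
static void model_enter_softirq(void)
{
	enabled_vecs = 0;
}

/* Enable the vectors in @mask and return the previous set for a later restore. */
static unsigned int model_bh_enable_mask(unsigned int mask)
{
	unsigned int old = enabled_vecs;

	enabled_vecs |= mask & ALL_VECS;
	return old;
}

/* Put back the set saved by model_bh_enable_mask(). */
static void model_bh_restore(unsigned int old)
{
	enabled_vecs = old;
}

/*
 * Models a timer callback taking a lock A that is shared by the TIMER and
 * BLOCK vectors. Returns the set of vectors that could have interrupted
 * the critical section.
 */
static unsigned int model_timer_callback(void)
{
	unsigned int bh = model_bh_enable_mask(~(TIMER_VEC | BLOCK_VEC));
	unsigned int interruptible_by = enabled_vecs; /* lock A would be held here */

	model_bh_restore(bh);
	return interruptible_by;
}

/* Small self-check of the model. */
static void model_selfcheck(void)
{
	model_enter_softirq();
	/* Only the vector unrelated to lock A may interrupt the callback. */
	assert(model_timer_callback() == NET_RX_VEC);
	/* After the callback, all vectors are masked again. */
	assert(enabled_vecs == 0);
}
```

In this model only NET_RX_VEC can run while the lock is held, since TIMER_VEC
and BLOCK_VEC stay masked.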

Another way could be for locks to piggyback the vectors they are involved in
on initialization:

DEFINE_SPINLOCK_SOFTIRQ(A, TIMER_SOFTIRQ | BLOCK_SOFTIRQ);

Then callsites can just use:

    bh = spin_lock_softirq(&A);
    ....
    spin_unlock_softirq(&A, bh);

Then the lock function always arranges to disable only TIMER_SOFTIRQ | BLOCK_SOFTIRQ
if not nesting, whether we are in a vector or not. The only drawback is that the
relevant spinlock_t has to carry those init flags.
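
As a rough illustration of a lock carrying its vector mask, here is a small
userspace sketch. Everything in it is hypothetical (struct model_softirq_lock,
MODEL_DEFINE_LOCK_SOFTIRQ, the model_* functions and the *_VEC bits); it is
not the proposed kernel API and it takes no real lock. It also models only the
simple process-context case, where locking adds the lock's vectors to the
disabled set and unlocking restores the previous set. The case of being
called from inside a vector is not modelled.

```c
#include <assert.h>

/* Hypothetical vector bits, for this model only. */
#define TIMER_VEC   (1u << 0)
#define BLOCK_VEC   (1u << 1)
#define NET_RX_VEC  (1u << 2)

/* A lock that remembers which vectors may take it (the "init flags"). */
struct model_softirq_lock {
	unsigned int vecs;   /* vectors to mask while the lock is held */
	int locked;          /* stand-in for the real lock state */
};

#define MODEL_DEFINE_LOCK_SOFTIRQ(name, mask) \
	struct model_softirq_lock name = { .vecs = (mask), .locked = 0 }

/* Set of vectors currently masked (model state, not a kernel variable). */
static unsigned int model_disabled_vecs;

/* Mask the lock's vectors, "take" the lock, return the previous masked set. */
static unsigned int model_lock_softirq(struct model_softirq_lock *l)
{
	unsigned int old = model_disabled_vecs;

	model_disabled_vecs |= l->vecs;
	assert(!l->locked);
	l->locked = 1;
	return old;
}

/* "Release" the lock and restore the masked set saved at lock time. */
static void model_unlock_softirq(struct model_softirq_lock *l, unsigned int old)
{
	l->locked = 0;
	model_disabled_vecs = old;
}

/* Lock A is shared by the TIMER and BLOCK vectors. */
static MODEL_DEFINE_LOCK_SOFTIRQ(model_lock_A, TIMER_VEC | BLOCK_VEC);
```

The callsite does not need to know which vectors are involved; it just keeps
the returned value around and hands it back on unlock, at the cost of one
extra field per lock.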

>
> See commit 3c53776e29f8 ("Mark HI and TASKLET softirq synchronous"),
> which also has a couple of people listed who could hopefully re-test
> the v4l latency thing with whatever USB capture dongle it was that
> showed the issue.

So in this case for example, I'll need to check the callbacks involved
and make them disable only the vectors that need to be disabled.

I should try to reproduce the issue myself.

Thanks.