Re: Splat in kernel RT while processing incoming network packets

From: Sebastian Andrzej Siewior
Date: Tue Jul 04 2023 - 06:05:36 EST


On 2023-07-03 18:15:58 [-0300], Wander Lairson Costa wrote:
> > Not sure how to proceed. One thing you could do is a hack similar like
> > net-Avoid-the-IPI-to-free-the.patch which does it for defer_csd.
>
> At first sight it seems straightforward to implement.
>
> > On the other hand we could drop net-Avoid-the-IPI-to-free-the.patch and
> > remove the warning because we have now commit
> > d15121be74856 ("Revert "softirq: Let ksoftirqd do its job"")
>
> But I am more in favor of a solution that removes code than one that
> adds more :)

Raising the softirq from anonymous context (i.e. from hardirq) is not
ideal for the reasons I stated below.

> > Prior to that, raising a softirq from hardirq would wake ksoftirqd,
> > which in turn would collect all pending softirqs. As a consequence all
> > following softirqs (networking, …) would run as SCHED_OTHER and compete
> > with SCHED_OTHER tasks for resources. Not good, because the networking
> > work is no longer processed within the networking interrupt thread. It
> > is also not a DDoS kind of situation where one might want to delay
> > processing.
> >
> > With that change, this isn't the case anymore. Only an "unrelated" IRQ
> > thread could pick up the networking work, which is less than ideal.
> > That is because the pending bit is set in the global softirq state,
> > ksoftirqd is marked for a wakeup and could be delayed because other
> > tasks are busy. Then the disk interrupt (for instance) could pick it up
> > as part of its threaded interrupt.
> >
> > Now that I think about it, we could make the backlog pseudo device a
> > thread. NAPI threading enables one thread, but here we would need one
> > thread per CPU. So it would remain kind of special. But we would avoid
> > clobbering the global state and delaying everything to ksoftirqd.
> > Processing it in ksoftirqd might not be ideal from a performance point
> > of view.
>
> Before sending this to the ML, I talked to Paolo about using a NAPI
> thread. He explained that it is implemented per interface. For example,
> in this specific case it happened on the loopback interface, which
> doesn't implement NAPI. I am cc'ing him, so he can correct me if I am
> saying something wrong.

It is per NAPI queue/instance, and you could have multiple instances per
interface. However, loopback has one, and you need per-CPU threads if you
want to RPS your skbs to any CPU.

We could just remove the warning, but then RPS would process the skbs in
SCHED_OTHER. This might not be what you want. Maybe Paolo has a better
idea.

> > > Cheers,
> > > Wander

Sebastian