Re: [RFC PATCH net-next 1/2] net: Use SMP threads for backlog NAPI.

From: Ferenc Fejes
Date: Thu Sep 21 2023 - 15:30:09 EST


Hi!

On Wed, 2023-09-20 at 17:57 +0200, Sebastian Andrzej Siewior wrote:
> On 2023-08-23 15:35:41 [+0200], Paolo Abeni wrote:
> > On Mon, 2023-08-14 at 11:35 +0200, Sebastian Andrzej Siewior wrote:
> > > @@ -4781,7 +4733,7 @@ static int enqueue_to_backlog(struct sk_buff *skb, int cpu,
> > >   * We can use non atomic operation since we own the queue lock
> > >   */
> > >   if (!__test_and_set_bit(NAPI_STATE_SCHED, &sd->backlog.state))
> > > - napi_schedule_rps(sd);
> > > + __napi_schedule_irqoff(&sd->backlog);
> > >   goto enqueue;
> > >   }
> > >   reason = SKB_DROP_REASON_CPU_BACKLOG;
> >
> > I *think* that the above could be quite dangerous when cpu ==
> > smp_processor_id() - that is, with plain veth usage.
> >
> > Currently, each packet runs into the rx path just after
> > enqueue_to_backlog()/tx completes.
> >
> > With this patch there will be a burst effect: the backlog thread
> > will run only after several packets have been enqueued, whenever the
> > process scheduler decides to run it. Note that the current CPU is
> > already hosting a running process, the tx thread.
> >
> > This can cause packet drops (due to limited buffering) or very high
> > latency (due to long bursts), even in non-overload situations, and
> > it would be quite hard to debug.
> >
> > I think this needs to be opt-in, but I guess that even RT
> > deployments doing some packet forwarding will not be happy with
> > it enabled.
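To make the burst effect concrete, here is a minimal userspace model (not kernel code; all names are hypothetical). It assumes a per-CPU backlog with a fixed limit standing in for netdev_max_backlog and counts how many packets are dropped when the consumer runs after every packet versus only after every `burst` packets:

```c
#include <assert.h>

/* Userspace model of a per-CPU backlog queue with a fixed limit,
 * loosely mirroring the drop check in enqueue_to_backlog().
 * All names here are hypothetical; this is not kernel code. */
struct backlog_model {
	unsigned int qlen;   /* packets currently queued */
	unsigned int limit;  /* stand-in for netdev_max_backlog */
	unsigned long drops; /* packets rejected because the queue was full */
};

static void model_enqueue(struct backlog_model *b)
{
	if (b->qlen < b->limit)
		b->qlen++;
	else
		b->drops++;
}

/* The consumer (softirq or backlog thread) drains everything queued. */
static void model_drain(struct backlog_model *b)
{
	b->qlen = 0;
}

/* Send npkts packets; the consumer runs only after every `burst`
 * enqueues. burst == 1 approximates the current behaviour, where the
 * rx path runs right after each enqueue; larger values approximate a
 * backlog thread that the scheduler runs "later". */
static unsigned long simulate(unsigned int limit, unsigned int npkts,
			      unsigned int burst)
{
	struct backlog_model b = { .qlen = 0, .limit = limit, .drops = 0 };

	assert(burst > 0);
	for (unsigned int i = 1; i <= npkts; i++) {
		model_enqueue(&b);
		if (i % burst == 0)
			model_drain(&b);
	}
	return b.drops;
}
```

With limit = 1000 and 10000 packets, burst = 1 and burst = 1000 drop nothing, while burst = 1500 drops 3000 packets. The model ignores the flow limit and per-packet processing cost, so it only illustrates the shape of the problem.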
>
> I've been looking at this again and thinking about what you said
> here. I think part of the problem is that we lack a policy/mechanism
> for detecting when a DoS is happening and deciding what to do about it.
>
> Before commit d15121be74856 ("Revert "softirq: Let ksoftirqd do its
> job""), when a lot of network packets were being processed, the
> processing was moved to ksoftirqd and continued based on how the
> scheduler scheduled the SCHED_OTHER ksoftirqd task. This avoided
> lock-ups of the system, and the CPU could do something else in
> between. An interrupt would not continue processing the outstanding
> softirq backlog but would wait for ksoftirqd. So it basically avoided
> networking overload by throttling the throughput when needed.
>
> This isn't the case after that commit. Now the CPU can be stuck
> processing network packets if they come in fast enough. Even if
> ksoftirqd is woken up, the next interrupt (say, the timer) will
> continue with at least one round.
> By using NAPI threads it is possible to give control back to the
> scheduler, which can throttle NAPI processing in favour of other
> threads that ask for the CPU. As you pointed out, waking the thread
> does not guarantee that it will do the NAPI work immediately. It can
> be delayed based on the current load on the system.
>
> This could be influenced by assigning the NAPI thread a SCHED_FIFO
> priority. Depending on the priority, it could be ensured that the
> thread starts right away, or "later" if something else is more
> important. However, this opens the DoS window again: the scheduler
> will keep the NAPI thread on the CPU for as long as it asks for it,
> with no throttling.
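For reference, this is roughly how a SCHED_FIFO priority is applied to such a thread from userspace today, similar to `chrt -f -p <prio> <tid>`. A minimal sketch; the thread ID is assumed to be looked up by the caller, and changing the policy requires CAP_SYS_NICE:

```c
#include <assert.h>
#include <errno.h>
#include <sched.h>
#include <sys/types.h>

/* Fill *param with a SCHED_FIFO priority, rejecting values outside the
 * range the kernel reports for SCHED_FIFO (1..99 on Linux). */
static int fifo_param(int prio, struct sched_param *param)
{
	int min = sched_get_priority_min(SCHED_FIFO);
	int max = sched_get_priority_max(SCHED_FIFO);

	if (min < 0 || max < 0)
		return -errno;
	if (prio < min || prio > max)
		return -EINVAL;
	param->sched_priority = prio;
	return 0;
}

/* Move the thread identified by tid (for example a NAPI kthread) to
 * SCHED_FIFO. On Linux a TID selects a single thread. Returns 0 or a
 * negative errno; EPERM is expected without CAP_SYS_NICE. */
int set_thread_fifo(pid_t tid, int prio)
{
	struct sched_param param;
	int ret = fifo_param(prio, &param);

	if (ret)
		return ret;
	if (sched_setscheduler(tid, SCHED_FIFO, &param) == -1)
		return -errno;
	return 0;
}
```

Nothing here limits how long the thread then stays on the CPU, which is exactly the DoS window described above.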
>
> If we could somehow define a DoS condition for when we are
> overwhelmed with packets, then we could act on it and throttle the
> processing. This in turn would allow a SCHED_FIFO priority without
> the fear of a lockup if the system is flooded with packets.
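One possible (hypothetical) definition of such a DoS condition reuses the NAPI convention that a poll returning work_done == budget still has packets pending. If that happens for `threshold` consecutive rounds, the thread is treated as flooded and should yield. The struct, function, and threshold below are illustrative assumptions, not an existing kernel interface:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical overload detector for a NAPI-style poll loop. */
struct overload_detector {
	unsigned int full_rounds; /* consecutive polls that used the whole budget */
	unsigned int threshold;   /* full rounds before declaring overload */
};

/* Record one poll round; returns true when the caller should throttle
 * (e.g. sleep or drop out of SCHED_FIFO) instead of polling again. */
static bool overload_update(struct overload_detector *d, int work_done,
			    int budget)
{
	assert(budget > 0);
	if (work_done >= budget)
		d->full_rounds++;   /* budget exhausted: more work pending */
	else
		d->full_rounds = 0; /* queue drained: not a flood */

	if (d->full_rounds >= d->threshold) {
		d->full_rounds = 0; /* count afresh after throttling */
		return true;
	}
	return false;
}
```

With budget = 64 (NAPI's usual default weight) and threshold = 3, three consecutive full polls trigger throttling, and any poll that drains the queue resets the count.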

Could this be avoided by reusing gro_flush_timeout as the maximum time
the NAPI thread is allowed to run once it has been scheduled?
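One possible reading of this idea, as a userspace model with hypothetical names: the poll loop stops once either the packet budget or `max_ns` is exhausted, where `max_ns` stands in for the per-device gro_flush_timeout value (in nanoseconds; the model treats 0 as no limit). The clock is simulated; in the kernel it would presumably come from something like ktime_get_ns():

```c
#include <assert.h>
#include <stdint.h>

/* Userspace model of a backlog poll; names are hypothetical. */
struct poll_ctx {
	unsigned int pending; /* packets waiting in the backlog */
	uint64_t now_ns;      /* simulated clock */
	uint64_t cost_ns;     /* simulated cost of processing one packet */
};

/* Process packets until the backlog is empty, `budget` packets have
 * been handled, or max_ns has elapsed since the poll started.
 * max_ns == 0 disables the time limit. Returns the number processed;
 * if packets remain, the caller would yield and reschedule itself. */
static unsigned int bounded_poll(struct poll_ctx *c, unsigned int budget,
				 uint64_t max_ns)
{
	uint64_t start = c->now_ns;
	unsigned int done = 0;

	assert(budget > 0);
	while (c->pending > 0 && done < budget) {
		if (max_ns && c->now_ns - start >= max_ns)
			break; /* time slice used up: give the CPU back */
		c->pending--;
		c->now_ns += c->cost_ns; /* "process" one packet */
		done++;
	}
	return done;
}
```

For example, with 100 packets pending at 1000 ns each, a budget of 64 and max_ns = 10000, the loop handles 10 packets and leaves 90 queued for the next run; with max_ns = 0 it handles the full budget of 64.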

>
> > Cheers,
> >
> > Paolo
>
> Sebastian

Ferenc