Re: [RFC PATCH 2/2] softirq: Drop the warning from do_softirq_post_smp_call_flush().

From: Yan Zhai
Date: Tue Aug 15 2023 - 18:33:01 EST


On Tue, Aug 15, 2023 at 7:08 AM Jesper Dangaard Brouer <hawk@xxxxxxxxxx> wrote:
>
> On 14/08/2023 11.35, Sebastian Andrzej Siewior wrote:
> > This is an undesired situation, and attempts have been made to avoid
> > the situation in which ksoftirqd becomes scheduled. This changed with
> > commit d15121be74856 ("Revert "softirq: Let ksoftirqd do its job""),
> > and now a threaded interrupt handler will handle soft interrupts at its
> > end even if ksoftirqd is pending. That means that they will be processed
> > in the context in which they were raised.
>
> $ git describe --contains d15121be74856
> v6.5-rc1~232^2~4
>
> That revert basically removes the "overload" protection that was added
> to cope with DDoS situations in Aug 2016 (Cc. Cloudflare). As described
> in https://git.kernel.org/torvalds/c/4cd13c21b207 ("softirq: Let
> ksoftirqd do its job"), in UDP overload situations, when the UDP socket
> receiver runs on the same CPU as ksoftirqd, it "falls off an edge" and
> processes almost no packets (because softirq processing steals CPU/sched
> time from the UDP receiver process). I am warning Cloudflare (Cc) as this
> might affect their production use-cases, and I recommend getting
> involved to evaluate the effect of these changes.
>
> I do realize/acknowledge that the reverted patch caused other latency
> issues, given it was a "big-hammer" approach affecting other softirq
> processing (as can be seen from, e.g., the watchdog fix patches).
> Thus, the revert makes sense, but how do we regain the "overload"
> protection such that RX networking cannot starve processes reading from
> the socket? (Is this what Sebastian's patchset does?)
>
Thanks for notifying us. We will need to evaluate whether this is going
to change the picture under serious floods.

Yan

> --Jesper
>
> Thread link for people Cc'ed:
> https://lore.kernel.org/all/20230814093528.117342-1-bigeasy@xxxxxxxxxxxxx/#r