Re: [RFC PATCH 2/2] softirq: Drop the warning from do_softirq_post_smp_call_flush().

From: Jesper Dangaard Brouer
Date: Tue Aug 15 2023 - 08:09:33 EST




On 14/08/2023 11.35, Sebastian Andrzej Siewior wrote:
This is an undesired situation and it has been attempted to avoid the
situation in which ksoftirqd becomes scheduled. This changed since
commit d15121be74856 ("Revert "softirq: Let ksoftirqd do its job"")
and now a threaded interrupt handler will handle soft interrupts at its
end even if ksoftirqd is pending. That means that they will be processed
in the context in which they were raised.

$ git describe --contains d15121be74856
v6.5-rc1~232^2~4

That revert basically removes the "overload" protection that was added
to cope with DDoS situations in Aug 2016 (Cc. Cloudflare). As described
in https://git.kernel.org/torvalds/c/4cd13c21b207 ("softirq: Let
ksoftirqd do its job") in UDP overload situations when UDP socket
receiver runs on same CPU as ksoftirqd it "falls-off-an-edge" and almost
doesn't process packets (because softirq steals CPU/sched time from UDP
pid). Warning Cloudflare (Cc) as this might affect their production
use-cases, and I recommend getting involved to evaluate the effect of
these changes.

I do realize/acknowledge that the reverted patch caused other latency
issues, given it was a "big-hammer" approach affecting other softirq
processing (as can be seen by e.g. the watchdog fixes patches).
Thus, the revert makes sense, but how to regain the "overload"
protection such that RX networking cannot starve processes reading from
the socket? (is this what Sebastian's patchset does?)

--Jesper

Thread link for people Cc'ed: https://lore.kernel.org/all/20230814093528.117342-1-bigeasy@xxxxxxxxxxxxx/#r