to sum things up, we have three main problem areas that are connected to
hardirq and softirq processing:
- a little utility written by Simon Kirby proved that no matter how much
softirq throttling is applied, it's easy to lock up a pretty powerful
Linux box via a high rate of network interrupts, even from relatively
low-powered clients. 2.4.6, 2.4.7 and 2.4.10 all lock up. Alexey has
also said that it's still easy to lock up low-powered Linux routers via
more or less normal traffic.
- prior to 2.4.7 we used to 'leak' softirq handling => we ended up
missing softirqs in a number of circumstances. Stock 2.4.10 still has a
number of places that do this.
- a number of people have reported gigabit performance problems (some
people reported a 10-20% drop in performance under load) since
ksoftirqd was added - which was itself added to fix some of the
softirq-handling latency problems of 2.4.6.
we also have another problem that often pops up when the BIOS goes bad or
a device driver makes a mistake:
- Linux often 'locks up' if it gets into an 'interrupt storm' - ie. when
an interrupt source sends a very high rate of interrupts. This can be
seen as boot-time hangs and module-insert-time hangs as well.
the attached patch, while a bit radical, is, i believe, a robust solution to
all four problems. It gives gigabit performance back, avoids the lockups
and attempts to reach as short softirq-processing latency as possible.
the new mechanism:
- the irq handling code has been extended to support 'soft mitigation',
ie. to mitigate the rate of hardware interrupts, without support from
the actual hardware. There is a reasonable default, but the value can
also be decreased/increased on a per-irq basis via /proc/irq/NR/max_rate.
the method is the following. We count the number of interrupts serviced,
and if within a jiffy there are more than max_rate interrupts, the code
disables the IRQ source and marks it as IRQ_MITIGATED. On the next timer
interrupt the irq_rate_check() function is called, which makes sure that
'blocked' irqs are restarted & handled properly. The interrupt is disabled
in the interrupt controller, which has the nice side-effect of fixing and
blocking interrupt storms. (The support code for 'soft mitigation' is
designed to be very lightweight, it's a decrement and a test in the IRQ
handling hot path.)
(note that in case of shared interrupts, another 'innocent' device might
stay disabled for some short amount of time as well - but this is not an
issue because this mitigation does not make that device inoperable, it
just delays its interrupt by up to 10 msecs. Plus, modern systems have
properly distributed interrupts.)
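
to illustrate the method described above, here is a minimal userspace
sketch in C. It is not code from the patch: only the names IRQ_MITIGATED,
max_rate and irq_rate_check() come from the text, while the struct, the
helper names, the flag value and the replay of one blocked interrupt on
restart are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>

#define IRQ_MITIGATED 0x1u  /* flag name from the text; the value is assumed */

/* Hypothetical model of one interrupt line, not the kernel's irq_desc. */
struct irq_model {
    unsigned int status;   /* IRQ_MITIGATED while the line is masked */
    int max_rate;          /* interrupts allowed per jiffy */
    int budget;            /* interrupts still allowed in this jiffy */
    bool pending;          /* an interrupt arrived while masked */
    unsigned long handled; /* how many times the handler actually ran */
};

static void irq_model_init(struct irq_model *d, int max_rate)
{
    assert(max_rate > 0);
    d->status = 0;
    d->max_rate = max_rate;
    d->budget = max_rate;
    d->pending = false;
    d->handled = 0;
}

/* Hot path: returns true if the handler ran. */
static bool irq_model_handle(struct irq_model *d)
{
    if (d->status & IRQ_MITIGATED) {
        d->pending = true;           /* line is masked; remember the event */
        return false;
    }
    if (d->budget-- <= 0) {          /* the decrement and the test */
        d->status |= IRQ_MITIGATED;  /* stand-in for masking in the controller */
        d->pending = true;
        return false;
    }
    d->handled++;                    /* stand-in for the device handler */
    return true;
}

/* Timer tick: stand-in for what irq_rate_check() does for one line. */
static void irq_model_rate_check(struct irq_model *d)
{
    d->budget = d->max_rate;
    if (d->status & IRQ_MITIGATED) {
        d->status &= ~IRQ_MITIGATED; /* unmask the line */
        if (d->pending) {
            d->pending = false;
            irq_model_handle(d);     /* replay the blocked interrupt */
        }
    }
}
```

in this model, with max_rate set to 10, a burst of several hundred
interrupts within one jiffy runs the handler 10 times, after which the
line stays masked until the next call to irq_model_rate_check().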
- softirq code got simplified significantly. The concept is to 'handle all
pending softirqs' - just as the hardware IRQ code 'handles all hardware
interrupts that were passed to it'. Since most of the time there is a
direct relationship between softirq work and hardirq work, the
mitigation of hardirqs mitigates softirq load as well.
- ksoftirqd is gone, there is never any softirq pending while
softirq-unaware code is executing.
- the tasklet code needed some cleanup along the way, and it also won some
restart-on-enable and restart-on-unlock properties that it lacked
before. (and which are desirable.)
due to these changes, the linecount in softirq.c got smaller by 25%.
[i dropped the unwakeup change - but that one could be useful in the VM,
to eg. unwakeup bdflush or kswapd.]
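
the 'handle all pending softirqs' concept can be sketched in userspace C
as a loop that keeps running handlers until nothing is pending. This is
an illustration only, not the code in softirq.c: the names pending_mask,
softirq_vec, raise_softirq_model(), do_softirq_model() and the demo
handler are all invented here.

```c
#include <assert.h>
#include <stdint.h>

#define NR_SOFTIRQS_MODEL 32

typedef void (*softirq_fn)(void);

static uint32_t pending_mask;                      /* one bit per softirq */
static softirq_fn softirq_vec[NR_SOFTIRQS_MODEL];  /* registered handlers */

/* Mark softirq 'nr' as pending. */
static void raise_softirq_model(unsigned int nr)
{
    assert(nr < NR_SOFTIRQS_MODEL);
    pending_mask |= UINT32_C(1) << nr;
}

/* Run handlers until no softirq is pending; returns the handler run count. */
static unsigned int do_softirq_model(void)
{
    unsigned int runs = 0;

    while (pending_mask) {
        uint32_t batch = pending_mask;  /* snapshot, then clear */
        pending_mask = 0;

        for (unsigned int nr = 0; batch; nr++, batch >>= 1) {
            if ((batch & 1) && softirq_vec[nr]) {
                softirq_vec[nr]();      /* may raise further softirqs */
                runs++;
            }
        }
    }
    return runs;
}

/* Demo handler: re-raises itself until its work counter reaches zero. */
static int demo_rx_left = 3;

static void demo_rx_action(void)
{
    if (--demo_rx_left > 0)
        raise_softirq_model(0);
}
```

the loop has no iteration limit of its own. In the design described
above, what bounds it is the mitigation of the hardirqs that generate the
softirq work in the first place.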
- drivers can optionally use the set_irq_rate(irq, new_rate) call to
change the current IRQ rate. Drivers are the ones who know best what
kind of loads to expect from the hardware, so they might want to
influence this value. Also, drivers that implement IRQ mitigation
themselves in hardware, can effectively disable the soft-mitigation code
by using a very high rate value.
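
as a rough sketch of how a driver might use this call: the text only
gives the form set_irq_rate(irq, new_rate), so the argument types, the
void return, the table behind it and both example drivers below are
assumptions, and the rate values are made up.

```c
#include <assert.h>
#include <limits.h>

#define NR_IRQS_MODEL 32   /* invented table size */

static unsigned int irq_max_rate[NR_IRQS_MODEL];

/* Stand-in for the set_irq_rate(irq, new_rate) call; types are assumed. */
static void set_irq_rate(unsigned int irq, unsigned int new_rate)
{
    assert(irq < NR_IRQS_MODEL);
    irq_max_rate[irq] = new_rate;
}

/* Hypothetical driver whose hardware already mitigates interrupts:
 * a very high rate effectively switches soft mitigation off. */
static void hw_mitigating_nic_init(unsigned int irq)
{
    set_irq_rate(irq, UINT_MAX);
}

/* Hypothetical driver that expects only a modest interrupt load. */
static void slow_device_init(unsigned int irq)
{
    set_irq_rate(irq, 200);
}
```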
what is the concept behind all this? Simplicity, and a clean concept. We were
clearly heading in the wrong direction: putting more complexity into the
core softirq code to handle some really extreme and unusual cases. Also,
softirqs were slowly morphing into something process-ish - but in Linux we
already have a concept of processes, so we'd have two duelling concepts.
(We still have tasklets, which are not really processes - they are
single-threaded paths of execution.)
with this patch, softirqs can again be what they should be: lightweight
'interrupt code' that processes hard-IRQ events but still does this with
interrupts enabled, to allow for low hard-IRQ latencies. Anything that is
conceptually heavyweight IMO does not belong in softirqs, it should be
moved into process contexts. That will take care of CPU-time usage
accounting and CPU-time-limiting and priority issues as well.
(the patch also imports the latency and softirq-restart fixes from my
previous softirq patches.)
i've tested the patch on UP and SMP, and on XT-PIC and APIC systems, it
correctly limits network interrupt rates (and other device interrupt
rates) to the given limit. I've done stress-testing as well. The patch is
against 2.4.11-pre1, but it applies just fine to the -ac tree as well.
with a high irq-rate limit set, ping flooding has this effect on the
test-system:
[root@mars /root]# vmstat 1
   procs                      memory    swap          io
 r  b  w   swpd   free   buff  cache  si  so    bi    bo     in
 0  0  0      0 877024   1140  11364   0   0    12     0  30960
 0  0  0      0 877024   1140  11364   0   0     0     0  30950
 0  0  0      0 877024   1140  11364   0   0     0     0  30520
ie. 30k interrupts/sec. With the max_rate set to 1000 interrupts/sec:
[root@mars /root]# echo 1000 > /proc/irq/21/max_rate
[root@mars /root]# vmstat 1
   procs                      memory    swap          io
 r  b  w   swpd   free   buff  cache  si  so    bi    bo     in
 0  0  0      0 877004   1144  11372   0   0     0     0   1112
 0  0  0      0 877004   1144  11372   0   0     0     0   1111
 0  0  0      0 877004   1144  11372   0   0     0     0   1111
so it works just fine here. Interactive tasks are still snappy over the
same interface.
Comments, reports, suggestions and testing feedback are more than welcome,
Ingo