[patch] preemptive kernel, preemptive-2.3.52-A7

From: Ingo Molnar (mingo@chiara.csoma.elte.hu)
Date: Mon Mar 13 2000 - 20:59:15 EST


On Mon, 13 Mar 2000, Linus Torvalds wrote:

> We want to avoid having long latencies, and we can easily get that by
> just allowing timer interrupts to schedule which we're in a big
> "memcpy_to_user()" and we don't hold any kernel lock etc. No need to
> try to be clever at lock release time - if we get a pending
> reschedule, we might as well leave it pending, it's going to be
> serviced soon enough anyway.

i've just implemented this (free preemption of pre-specified kernel areas)
and it's really simple, and appears to work nicely both in the UP and SMP
case.

There is a single current->may_preempt counter which is typically zero. If
the counter is non-zero, then the IRQ return path may preempt the process.
The counter is recursive (unused feature currently but same cost), and is
used in uaccess.h when the memcpy is non-constant (probably big enough.
It's cheaper to unconditionally do the incl+decl than checking for memcpy
size). Note that 'current' is already cached in 99% of these cases because
we already did an access_ok() check for the user-space copy, which
accessed current->addr_limit. The counter is persistent and present even
if the process is non-running - the 'may preempt' property of the task
does not depend on wether it's running. (This method differs from Andrea's
similar 'all-inclusive' method by having a clean per-process counter.)

i've also introduced a new light-weight 'IRQ-atomic' increment and
decrement method in atomic.h, because ->may_preempt has to be
local-IRQ-safe.

the IRQ return path in entry.S now checks wether:

        1) ->need_resched or ->pending is set

    and 2) wether we return to non-IRQ context

    and 3) ->allow_preempt is set.

the entry.S IRQ return path got one more conditional instruction, which
checks for need_resched and moves into the 'we might preempt even
kernel-space' path. This branch comes _after_ checking all the common
cases: return-to-userspace and no-event-pending. I believe this is pretty
much the cheapest way we can get away with this. (The patch does not allow
processing signal handlers though, only reschedules - that would be a
security hole in certain cases, correct?)

the uaccess.h changes are safe because we can reschedule even with the
kernel lock held, but it's illegal to call any of the user-copy routines
with spinlocks held. (There are some other places that we want to make
preemptible as well, but i first wanted to have this simple prototype.)

the patch gets some non-obvious cases right as well: if a cross-IPI marks
the process need_resched==1 then we might reschedule immediately if it
hits a big memcpy.

i've tested the patch (against pre-2.3.52-1) and it compiles both UP and
SMP, and i've tested it under load on SMP - appears to be stable. Have i
missed anything, and is this what we want?

> If you really want hard realtime, go to work with Victor and RTLinux..

(yep.)

        Ingo



-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Wed Mar 15 2000 - 21:00:26 EST