Re: new IRQ scalability changes in 2.3.48

From: Ingo Molnar (mingo@chiara.csoma.elte.hu)
Date: Mon Mar 13 2000 - 09:25:16 EST

Next message: Chris Mason: "Re: (reiserfs) Re: patch: reiserfs for 2.3.49"
Previous message: Geert Uytterhoeven: "Re: [linux-fbdev] fbdev 2.4.0 cleanups"
In reply to: Andrea Arcangeli: "Re: new IRQ scalability changes in 2.3.48"
Next in thread: Andrea Arcangeli: "Re: new IRQ scalability changes in 2.3.48"
Reply: Andrea Arcangeli: "Re: new IRQ scalability changes in 2.3.48"
Reply: Jeff Garzik: "Re: new IRQ scalability changes in 2.3.48"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On Thu, 9 Mar 2000, Andrea Arcangeli wrote:

> >no. First, this is definitely not going to happen before 2.5. And even in
> >that case we do not care about copy_user latencies. Adding 'may preempt'
>
> So why the lowlatency patch is doing this stuff if you think
> current->need_reshed is going to be zero during the copy_user business?

i think you might be misunderstanding the point i (and others) tried to
make. Of course uaccess.h generates latencies in a non-preemptible kernel.
But a preemptible kernel is not about moving uaccess.h and similar simple
latency sources into a freely preemptible space, a preemptible kernel is a
solution for a problem category much bigger than that.

> Unless you want to add also a slowww conditional schedule within the
> spin_unlock() (that will trigger at the last lock released if need_resched
> is 1), then the preemtible kernel won't preemt anything if the timeslice
> expired while any spinlock was held. And you also for sure payed the cost
> to avoid preemption during the spin_lock/unlock.

of course there is some overhead here, but it's not at all as slow as you
imagine. It's these two additional (nonlocked) instructions on x86:

        decl %0
        jz 1f
.section offline.preempt
        call do_reschedule

where %0 is current->spinlock_depth - a local cached variable in most
cases, and an untaken forward conditional jump - about 1 cycle overhead
and about 10 bytes icache footprint. A single (incl) instruction in
spin_lock()&friends. Plus possibly one instruction calculating
current->spinlock_depth.

Additionally, as a side-effect we get a 'free' debugging check for
schedule():

if (current->spinlock_depth)
BUG();

this temporary debugging aid catches illegal spinlock uses. (which does
happen quite often)

(this above mechanizm also implements it also for the UP case. It would of
course be a config option initially.)

> IMHO most of the code needs at least a basic kind of serialization lock
> held so the preemtable kernel is not going to have much changes to
> preempt.

? see above.

> For example in the free_inodes the preemtable kernel won't buy anything.
> The same for shrink_mmap and most other places where you correctly added
> the conditional schedule in the lowlatency patch. You'll have to keep the
> conditional schedule even with the preemtable kernel there. And you'll
> also pay the double cost the lock fast path.

no, this is a different problem category: the amount of time spinlocks are
held. Obviously there are latency problems there. But with a preemptive
kernel to get bound latencies we have reduced the problem to fix spinlock
latencies alone - which is a much easier task.

To repeat: we reduce the task to a much smaller amount of code. Additional
this amount of code is not only important to get good latencies, but also
important to get good SMP performance - so with SMP performance
optimization we will also get good latencies, basically for free. This is
the point and it's also elegant and mainainable design to couple a feature
(that ordinary users do not care about) transparently to another feature
(that people care about).

> I currently don't believe preemptible kernel is the way to go.
> Allowing some CPU bound part that runs with no locks to be preemtable
> looks a nice latency optimization with no downside instead.

no, it's an unnecessery hack comparable to conditional_schedule(). I'd
like to do it right.

Ingo

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/

Next message: Chris Mason: "Re: (reiserfs) Re: patch: reiserfs for 2.3.49"
Previous message: Geert Uytterhoeven: "Re: [linux-fbdev] fbdev 2.4.0 cleanups"
In reply to: Andrea Arcangeli: "Re: new IRQ scalability changes in 2.3.48"
Next in thread: Andrea Arcangeli: "Re: new IRQ scalability changes in 2.3.48"
Reply: Andrea Arcangeli: "Re: new IRQ scalability changes in 2.3.48"
Reply: Jeff Garzik: "Re: new IRQ scalability changes in 2.3.48"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

This archive was generated by hypermail 2b29 : Wed Mar 15 2000 - 21:00:24 EST