Re: new IRQ scalability changes in 2.3.48

From: Ingo Molnar (mingo@chiara.csoma.elte.hu)
Date: Tue Mar 14 2000 - 15:54:14 EST


On Tue, 14 Mar 2000 yodaiken@fsmlabs.com wrote:

> > people will have the nicely preemptable UP kernel. The latency-quality of
> > the SMP kernel will be much worse than the UP kernel's latency.
>
> I don't see why. Can you explain?

yep: the SMP kernel will not be preemptible like the UP kernel.

> Why ? The length of time between scheduler calls under SMP should be
> smaller than the length of time between scheduler calls on UP - no?

no, they would be bigger, because with Linus' suggestion the UP kernel uses
spinlocks to make the kernel fully preemptible - we will be able to
reschedule the kernel in a preemptive way at any point where no spinlock
(or other lock) is held. The SMP kernel would still reschedule in a
'cooperative' way, because SMP spinlocks would not have this 'collect total
current spinlock count' property (and thus preempting kernel space would
not be allowed).

IMPORTANT: to avoid any misunderstanding, this does not make the SMP
kernel worse in any way than it is right now. But this would be a
(dramatic wrt. latencies!) 'UP-only' improvement, which _feels_ wrong, to
me at least.

meanwhile i've measured the cost of doing per-process counters on SMP, and
it's not as bad as i thought. The following code sequence:

void dummy (void)
{
        atomic_inc_local(&current->may_preempt);
        atomic_inc_local(&current->may_preempt);
        atomic_dec_local(&current->may_preempt);
        atomic_dec_local(&current->may_preempt);
}

simulates a typical scenario: two spinlocks taken and released in one
function, with good 'synergy' effects. This translates into:

 dd4:   b8 00 e0 ff ff      movl   $0xffffe000,%eax
 dd9:   21 e0               andl   %esp,%eax
 ddb:   ff 40 18            incl   0x18(%eax)
 dde:   ff 40 18            incl   0x18(%eax)
 de1:   ff 48 18            decl   0x18(%eax)
 de4:   ff 48 18            decl   0x18(%eax)

so there is a fixed offset cost of 2 instructions (7 bytes) to load current
(which is already loaded in i guess 30-40% of the cases), and 1
instruction (3 bytes) per increment/decrement. If register pressure is
higher and 'current' is not used near where the spinlocks are used then the
cost is higher. But this has less icache footprint than doing a dumb
global memory counter for example, which has a 6 bytes cost per inc/dec:

 de8:   ff 05 00 00 00 00   incl   0x0
 dee:   ff 05 00 00 00 00   incl   0x0
 df4:   ff 0d 00 00 00 00   decl   0x0
 dfa:   ff 0d 00 00 00 00   decl   0x0

of course it has more footprint than 0 instructions [ == the current SMP
kernel].
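
The byte counts above can be turned into a small worked comparison. The
sketch below only does the arithmetic from the two listings (7 bytes to
load current, 3 bytes per per-task inc/dec, 6 bytes per global inc/dec);
the function and constant names are made up for illustration, and it
ignores register pressure and everything other than code size.

```c
/* Instruction sizes taken from the disassembly quoted in this message. */
enum {
        LOAD_CURRENT_BYTES = 7, /* movl $0xffffe000,%eax + andl %esp,%eax */
        LOCAL_OP_BYTES     = 3, /* incl/decl 0x18(%eax) */
        GLOBAL_OP_BYTES    = 6  /* incl/decl on an absolute address */
};

/* Code bytes for 'ops' inc/dec operations on the per-task counter. */
static unsigned per_task_bytes(unsigned ops, int current_loaded)
{
        return (current_loaded ? 0 : LOAD_CURRENT_BYTES)
                + ops * LOCAL_OP_BYTES;
}

/* Code bytes for 'ops' inc/dec operations on a global counter. */
static unsigned global_bytes(unsigned ops)
{
        return ops * GLOBAL_OP_BYTES;
}

/* Smallest op count at which the per-task form is strictly smaller,
 * assuming current has to be loaded first. */
static unsigned break_even_ops(void)
{
        unsigned ops = 1;

        while (per_task_bytes(ops, 0) >= global_bytes(ops))
                ops++;
        return ops;
}
```

For the four operations in dummy() this gives 19 bytes against 24. With a
single lock/unlock pair (two operations) and current not yet loaded, the
global counter is one byte smaller (12 against 13), so under these
assumptions the per-task form wins on size from three operations onward,
or whenever current is already in a register.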

        Ingo
