Re: [patch, 2.6.10-rc2] sched: fix ->nr_uninterruptible handling bugs

From: Ingo Molnar
Date: Thu Nov 18 2004 - 10:23:36 EST



* Linus Torvalds <torvalds@xxxxxxxx> wrote:

> And it's not just P4's either. It shows up on at least P-M's too, even
> though that's a PPro-based core like a PIII. It's at least 22 cycles
> according to my tests (certainly not 10), but it seems to have a
> bigger impact than that, likely because it also ends up serializing
> the pipeline around it (ie it makes the instructions _around_ it
> slower too, something you don't end up seeing if you just do rdtsc
> which _also_ serializes).
>
> Try lmbench some time. Just pick a UP machine, compile a UP kernel on
> it, run lmbench, then compile a SMP kernel, and run it again. There's
> literally a 10-20% performance hit on a lot of things. Just from a
> _couple_ of locks on the paths.
>
> Sure, some of it is actual different paths, like the VM teardown,
> which on SMP is just fundamentally more expensive because we have to
> be more careful. So mmap latency (delayed page frees) and file delete
> (delayed RCU delete of dentry) ends up having differences of 30-40%.
> But the 10-20% differences are things where the _only_ difference
> really is just the totally uncontended locking.
>
> That's on a P-M. On a P4 it is _worse_, I think.

oh well. The Athlon64 really shines doing atomic ops though, and this
fooled me into thinking that the LOCK prefix has been properly taken
care of forever. What the hell is Intel doing ... atomic ops are so
fundamental to spinlocks. Thank God we can do the movb $1,%0 trick for
spin-unlock.

Ingo
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/