Re: "movb" for spin-unlock

From: Jeff V. Merkey (jmerkey@timpanogas.com)
Date: Thu Apr 27 2000 - 17:53:46 EST


This is how NetWare handles unlock (movb), and it works just fine. I
coded it this way in 1994/1995 after spending about three weeks looking
at bus analyzer traces to make certain that none of the issues Linus
was referencing would show up. The performance increase from this is
VERY significant. Asserting LOCK# twice on the same region of memory
from multiple CPUs (hotlocking) is guaranteed to cost you about 300
clocks' worth of performance as the pipelines and affected cache lines
refill for each lock/unlock() pair.
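To make the comparison concrete, here is a minimal sketch of the two
unlock sequences in GCC inline assembly for ix86 -- my illustration,
not the actual NetWare or Linux source. The spinlock_t layout and the
0-means-free convention are assumptions for the example; the point is
that xchg asserts LOCK# implicitly even without a lock prefix, while
movb is a plain store.

typedef struct { volatile unsigned char lock; } spinlock_t;

/* Old-style unlock: xchg with a memory operand asserts LOCK# on the
 * bus implicitly.  This is the second LOCK# assertion in every
 * lock/unlock() pair. */
static inline void spin_unlock_xchg(spinlock_t *lp)
{
        unsigned char free = 0;         /* 0 == lock is free */
        __asm__ __volatile__(
                "xchgb %b0,%1"
                : "=q" (free), "+m" (lp->lock)
                : "0" (free)
                : "memory");
}

/* movb-style unlock: a plain byte store, no LOCK# at all.  Safe on
 * Pentium-class and later Intel SMP parts, where one CPU's writes
 * become visible to the others in program order; not guaranteed on
 * every 386/486-era SMP chipset. */
static inline void spin_unlock_movb(spinlock_t *lp)
{
        __asm__ __volatile__(
                "movb $0,%0"
                : "=m" (lp->lock)
                :
                : "memory");
}

The movb version is the one that avoids the second LOCK# assertion --
and the pipeline and cache line refills that come with it -- on every
lock/unlock() pair.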

I don't blame Linus for erring on the side of caution here -- he has
to factor in the impact on platforms besides Intel. Also, with this
optimization, you may see fairness problems on systems that "mix and
match" memory bus chipsets (like Tricord used to do). If you used movb
on some of Tricord's earlier SMP systems, processors were not
guaranteed to be granted the spinlock in any reasonable order, leaving
processes on one or more processors spinning endlessly while other
processors were always granted the lock first. With this optimization,
if a spinlock is heavily contended, we may need to implement a
"ticketing" algorithm to ensure fairness in spinlock ordering; on some
systems I have seen the movb unlock case cause fairness problems when
several processors all go after the same lock at the same time.
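For what it's worth, here is a minimal sketch of one such ticketing
algorithm for ix86 -- my illustration, not NetWare code, and the names
are made up. Each CPU takes a ticket with a single locked xadd (486
and later), then spins read-only in its own cache until the lock's
now-serving counter reaches its ticket, so waiters are granted the
lock in strict FIFO order no matter how the chipset arbitrates LOCK#.

typedef struct {
        volatile unsigned int next_ticket;      /* ticket dispenser */
        volatile unsigned int now_serving;      /* whose turn it is */
} ticketlock_t;                                 /* both start at 0  */

static inline void ticket_lock(ticketlock_t *tl)
{
        unsigned int my_ticket = 1;

        /* lock; xadd leaves the old dispenser value in my_ticket and
         * bumps the dispenser -- one LOCK# assertion per acquisition,
         * no matter how many CPUs are contending. */
        __asm__ __volatile__(
                "lock; xaddl %0,%1"
                : "+r" (my_ticket), "+m" (tl->next_ticket)
                :
                : "memory");

        while (tl->now_serving != my_ticket)
                ;       /* read-only spin: stays in the local cache */
}

static inline void ticket_unlock(ticketlock_t *tl)
{
        /* Only the lock holder writes now_serving, so a plain store
         * is enough here too -- the same reasoning as the movb
         * unlock above. */
        tl->now_serving++;
}

The trade-off is an unconditional locked increment on every
acquisition, even when the lock is free, which is why you would only
reach for this on locks that are actually being fought over.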

:-)

Jeff

Jamie Lokier wrote:
>
> Gérard Roudier wrote:
> > I was not trying to defend Linus, but had questions in mind given that
> > this topic had been discussed in _full_ detail about a year ago (or
> > more) and Linus had explained _clearly_ the reasons that led him to
> > his decision at the time.
>
> Linus has changed his mind.
>
> It didn't happen until we restarted this thread, got some new tests
> done, and heard from the right person at Intel.
>
> A small piece of code that occurs extremely often in the kernel just got
> more than 20 times faster. Apparently it shows up in application
> benchmarks.
>
> But even more importantly:
>
> We understand what's going on now!
>
> The older thread petered out with some loose ends. There were
> conflicting conclusions and misunderstandings.
>
> Now, we understand that the faster code works with all Intel-style SMP
> systems from Pentiums up, but may fail for some 386 or 486 SMP systems.
> And we understand why.
>
> This is complex stuff, and the folks writing the most basic code really
> need to understand it. Why, only today, someone came to my office and
> asked for an explanation of memory ordering problems between threads on
> an SMP system. And I was able to refer them to this thread. :-)
>
> have a nice day,
> -- Jamie
