Re: "movb" for spin-unlock

From: Gérard Roudier (groudier@club-internet.fr)
Date: Thu Apr 27 2000 - 16:53:16 EST


On Thu, 27 Apr 2000, Jamie Lokier wrote:

> Gérard Roudier wrote:
> > I was not trying to defend Linus, but had questions in mind, given that
> > this topic had been discussed in _full_ detail about a year ago (or
> > more) and Linus had explained _clearly_ the reasons that led him to his
> > decision at the time.
>
> Linus has changed his mind.

No problem seen at all here.

> It didn't happen until we restarted this thread, got some new tests done
> and input from the right person at Intel.
>
> A small piece of code that occurs extremely often in the kernel just got
> more than 20 times faster. Apparently it shows up in application
> benchmarks.
>
> But even more importantly:
>
> We understand what's going on now!

My understanding didn't change. There is no publicly available erratum
description that, on paper, rules out the memory-ordering-based spin
unlock failing on Pentium and PPro generation processors. This was
already true in the previous thread. Even errata #66 and #92 for the
early PPro do not break it, since they may only let a processor read an
early value; the lock-prefixed spin lock will then be re-entered by the
losing CPU, which fixes the problem.
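
For illustration, here is a minimal sketch of the two unlock flavours
under discussion (my own sketch, not the actual <asm/spinlock.h> code;
the names and the one-byte lock layout are assumptions made for the
example). The acquire side uses an implicitly locked xchg in both cases;
only the release differs, either a locked operation or the plain movb
store that relies on x86 store ordering:

/* 0 = free, 1 = held.  Illustrative only. */
typedef struct { volatile unsigned char lock; } demo_spinlock_t;

static inline unsigned char demo_xchgb(volatile unsigned char *p,
                                       unsigned char val)
{
        /* xchg with a memory operand is implicitly LOCKed on x86. */
        __asm__ __volatile__("xchgb %0,%1"
                             : "+q" (val), "+m" (*p)
                             :
                             : "memory");
        return val;
}

static inline void demo_spin_lock(demo_spinlock_t *lp)
{
        while (demo_xchgb(&lp->lock, 1) != 0) {
                /* Spin read-only until the lock looks free, then retry. */
                while (lp->lock)
                        ;
        }
}

/* Old-style release: a locked atomic exchange clears the lock. */
static inline void demo_spin_unlock_locked(demo_spinlock_t *lp)
{
        (void) demo_xchgb(&lp->lock, 0);
}

/* New-style release: a plain byte store, relying on x86 store ordering. */
static inline void demo_spin_unlock_movb(demo_spinlock_t *lp)
{
        __asm__ __volatile__("movb $0,%0"
                             : "=m" (lp->lock)
                             :
                             : "memory");
}

If a losing CPU momentarily sees a stale non-zero value, it simply keeps
spinning and re-enters the locked xchg, which is why the errata above do
not break the scheme.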

Now, if you have a look at the Intel errata that address cache/memory
inconsistency, you generally read something like this:

Blah blah blah ... a small window exists ... blah blah blah ...
Applications that rely on explicit locked instructions instead of memory
ordering are not affected by this erratum.

At this point, we must decide whether we want to run the risk of jamming
our fingers in all the "small windows" that haven't been discovered yet
(or are just not documented), or not. :-)

> The older thread petered out with some loose ends. There were
> conflicting conclusions and misunderstandings.
>
> Now, we understand that the faster code works with all Intel-style SMP
> systems from Pentiums up, but may fail for some 386 or 486 SMP systems.
> And we understand why.

In my opinion, given the bloating mania that occurs nowadays, bloated
applications will eat up this performance gain with less than half the
changes of a single update.

> This is complex stuff and the folks writing the most basic code really
> need to understand it. Why, only today, someone came to my office and
> asked for an explanation of memory ordering problems between threads on an
> SMP system. And I was able to refer to this thread. :-)

Indeed, this is complex. It happens to be at least as complex, and
sometimes more so, when a driver has to deal with a PCI device through
all the variations of bridges that have been invented, and for that
exercise we do not have a usable LOCK protocol.

Gérard.



