Re: [PATCH] 2.1.88 Hanging Processes (Uninterruptible Sleep)

Gerard Roudier (
Tue, 3 Mar 1998 22:39:39 +0100 (MET)

On Mon, 2 Mar 1998, Linus Torvalds wrote:

> On Tue, 3 Mar 1998, MOLNAR Ingo wrote:
> >
> > solution could be to put the state setting within add_waitqueue and
> > remove_waitqueue operations? Maybe as a third parameter?
> No, that's why you have "sleep()".
> The real problem is that you need to set yourself to sleep a lot more
> often than you need to do an "add_wait_queue()" - so usually this loop is
> run completely without any add_wait_queue() in the loop itself because it
> has been factored to outside the loop.
> But because it has been factored outside the loop, the loop now has fewer
> "synchronization points" - ie atomic isntructions (or cpuid). So now the
> PPro/PentiumII starts to re-order instructions inside the queue a lot more

I would think that the problem is not specific to PPro/PII and that the
good explanation is likely a wrong memory ordering assumption in the
code rather that intruction reordering. However, I agree that instruction
re-ordering increases the probability of memory read/write being reordered.

The 486, Pentium, PPro and PII use write buffers (These buffers are
flushed in order to guarantee some strong ordering when using I/O
The PPro/PentiumII memory ordering is probably so optimized that the problem
may occur more often, but in my opinion, the 486 and Pentium are not
immune of a memory read going ahead buffered memory writes.

So, my thought is that the initial construct that did not force any memory
ordering was only safe for 386 processors.

Anyway, bravo for the fix!

> aggressively, so it might re-order the read of a status variable to be
> before the "current->state = TASK_UNINTERRUPTIBLE" - so now we have the
> _old_ knowledge of the thing being locked, and we're going to sleep on
> that.
> So you have code that looks like
> repeat:
> current->state = TASK_UNINTERRUPTIBLE;
> if (empty) {
> schedule();
> goto repeat;
> }
> but is actually executed as:
> CPU #0 CPU #1
> repeat:
> if (empty)
> empty = 0;
> tsk->state = TASK_RUNNING;
> schedule()
> because CPU #0 has done the (locally legal) optimization of doing the
> empty test before setting the task state.
> The CPUID patch inserts a CPUID instruction in between setting the state
> and testing the test-thing, which due to intel rules force the two to be
> done in the correct order, and then everything works ok.
> Linus


To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to