Re: [PATCH] 2.1.88 Hanging Processes (Uninterruptible Sleep)

Linus Torvalds
Tue, 3 Mar 1998 14:47:48 -0800 (PST)

On Wed, 4 Mar 1998, Gadi Oxman wrote:
> On Tue, 3 Mar 1998, MOLNAR Ingo wrote:
> > it was/is safe on any UP system, and it's also safe on 2.0 SMP.
> I'm not sure -- Linus's theory is:

No, Ingo is right.

> > So you have code that looks like
> >
> >     repeat:
> >         current->state = TASK_UNINTERRUPTIBLE;
> >         if (empty) {
> >             schedule();
> >             goto repeat;
> >         }
> >
> > but is actually executed as:

[ Side note: Please don't send me more email about instruction re-ordering
vs memory bus re-ordering. I do know the difference, and I'd like to
point out that I say "executed" above and that it doesn't really matter
whether it is the memory buffer or the instruction re-ordering that
results in the difference in execution order. ]

> >     CPU #0                                  CPU #1
> >
> >     repeat:
> >     if (empty)
> >                                             empty = 0;
> >                                             tsk->state = TASK_RUNNING;
> >
> >     tsk->state = TASK_UNINTERRUPTIBLE;
> >     schedule();
>
> But if the "if (empty)" is indeed re-ordered before the
> "tsk->state = TASK_UNINTERRUPTIBLE", it looks like even on UP we
> can take an interrupt at exactly the point where CPU #1 acts in the
> example above, and the interrupt handler can set
> "tsk->state = TASK_RUNNING" just as CPU #1 did.

No. The internal logic inside the CPU makes this impossible: the
re-ordering is done speculatively, and the CPU will essentially make sure
that any out-of-order execution is never seen on a single CPU (because
that would seriously break every program out there, and would make
re-starting after an interrupt or exception almost impossible).

If the re-ordering is done by having a store buffer, then it means, for
example, that the store buffer will continue to drain despite the
interrupt, so the store would still be done before anything the
interrupt handler stores.
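
To make the store-buffer argument concrete, here is a toy model, assuming a single FIFO store buffer per CPU with store-to-load forwarding (roughly x86-style). Everything in it (`struct cpu`, `cpu_store`, `cpu_load`, `cpu_drain`, `lost_wakeup`, the address names) is hypothetical illustration, not kernel code. It replays the interleaving from the quoted table twice: once with the wakeup coming from an interrupt handler on the same CPU, once from a second CPU that writes memory directly.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model (hypothetical, not kernel code): CPU #0 has a FIFO store
 * buffer in front of shared memory, with store-to-load forwarding. */
enum { TASK_RUNNING = 0, TASK_UNINTERRUPTIBLE = 2 };
enum { ADDR_STATE, ADDR_EMPTY, NADDR };
#define SB_SIZE 8

struct store { int addr; int val; };
struct cpu { struct store sb[SB_SIZE]; int count; };  /* oldest entry first */

static int memory[NADDR];

static void cpu_store(struct cpu *c, int addr, int val)
{
    assert(c->count < SB_SIZE);
    c->sb[c->count++] = (struct store){ addr, val };
}

/* Same-CPU loads see the newest buffered store to that address first. */
static int cpu_load(const struct cpu *c, int addr)
{
    for (int i = c->count - 1; i >= 0; i--)
        if (c->sb[i].addr == addr)
            return c->sb[i].val;
    return memory[addr];
}

/* The buffer drains oldest-first, so stores reach memory in program order. */
static void cpu_drain(struct cpu *c)
{
    for (int i = 0; i < c->count; i++)
        memory[c->sb[i].addr] = c->sb[i].val;
    c->count = 0;
}

/* smp == false: the waker is an interrupt handler on CPU #0 itself.
 * smp == true:  the waker is CPU #1, writing straight to memory.
 * Returns true if schedule() would still find TASK_UNINTERRUPTIBLE
 * after the wakeup has already happened. */
static bool lost_wakeup(bool smp)
{
    struct cpu cpu0 = { .count = 0 };
    memory[ADDR_STATE] = TASK_RUNNING;
    memory[ADDR_EMPTY] = 1;

    cpu_store(&cpu0, ADDR_STATE, TASK_UNINTERRUPTIBLE); /* still buffered */
    int saw_empty = cpu_load(&cpu0, ADDR_EMPTY);         /* reads 1 */

    if (smp) {
        memory[ADDR_EMPTY] = 0;                          /* CPU #1 */
        memory[ADDR_STATE] = TASK_RUNNING;
    } else {
        cpu_store(&cpu0, ADDR_EMPTY, 0);                 /* handler's stores */
        cpu_store(&cpu0, ADDR_STATE, TASK_RUNNING);      /* queue behind ours */
    }
    cpu_drain(&cpu0);

    return saw_empty && memory[ADDR_STATE] == TASK_UNINTERRUPTIBLE;
}
```

In this model `lost_wakeup(false)` is false, because the handler's TASK_RUNNING store drains after the interrupted code's store, while `lost_wakeup(true)` is true: CPU #1 never sees CPU #0's buffer, so the late-draining TASK_UNINTERRUPTIBLE store overwrites the wakeup.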

And if the re-ordering has been done by the instruction re-ordering
hardware, then the instruction retire phase will make sure that the
speculated instruction will be "killed" - this is something that has to be
done under other circumstances too (for example, if the CPU notices that
the speculated instruction happened to use a memory operand that was
written by earlier instructions that hadn't finished executing yet).
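
The "killed" speculation can be modeled the same way. This sketch assumes a hypothetical two-instruction window (the store to the state, then the load of `empty` executed early) and treats an interrupt as squashing every instruction that has not yet retired; `sleeps_forever`, `irq_handler` and the `squash` flag are illustrative names only. Setting `squash` to false models an imaginary buggy CPU that keeps an early load's value across the interrupt.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model (hypothetical, not kernel code) of an out-of-order window:
 *   I0: state = TASK_UNINTERRUPTIBLE   (older, retires first)
 *   I1: r = empty                      (younger, executed early)
 * An interrupt is taken between retirements; anything not yet retired
 * is squashed and executed again after the handler returns. */
enum { TASK_RUNNING = 0, TASK_UNINTERRUPTIBLE = 2 };

struct machine { int state; int empty; };

static void irq_handler(struct machine *m)   /* plays CPU #1's role */
{
    m->empty = 0;
    m->state = TASK_RUNNING;
}

/* irq_at: how many of I0, I1 have retired when the interrupt is taken
 * (0, 1 or 2). squash: whether the CPU discards the early load's value.
 * Returns true if schedule() would sleep with nobody left to wake it. */
static bool sleeps_forever(int irq_at, bool squash)
{
    assert(irq_at >= 0 && irq_at <= 2);
    struct machine m = { TASK_RUNNING, 1 };
    bool interrupted = false;
    int spec_r = m.empty;                  /* I1 runs early: reads 1 */

    if (irq_at == 0) { irq_handler(&m); interrupted = true; }
    m.state = TASK_UNINTERRUPTIBLE;        /* I0 retires (re-run if squashed) */
    if (irq_at == 1) { irq_handler(&m); interrupted = true; }

    /* I1 retires: a correct CPU replays the load if it was interrupted. */
    int r = (squash && interrupted) ? m.empty : spec_r;
    if (irq_at == 2) irq_handler(&m);

    return r && m.state == TASK_UNINTERRUPTIBLE;
}
```

Only `sleeps_forever(0, false)` comes out true: the hang requires a CPU that performs the store after the handler yet keeps a load value read before it. With `squash` set, every interrupt point gives false, which is the "result is correct as if you hadn't re-ordered" property.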

> That interrupt will still set tsk->state = TASK_RUNNING on our single
> CPU before we set the state to TASK_UNINTERRUPTIBLE, and then, when we
> return, we will call schedule() and enter an uninterruptible
> sleep.

As I said, that would be really bad, but it cannot happen: this is partly
what makes out-of-order CPUs so hard to do (not because re-ordering is
all that hard, but because making sure that the _result_ is the same as if
you hadn't re-ordered is fairly hard). In short, it would be easy to make a
buggy out-of-order machine ;)
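
Across CPUs neither mechanism applies, since CPU #1 does not see CPU #0's store buffer or retire logic. One standard way to rule out the table's interleaving, sketched here with C11 atomics rather than kernel primitives (the names `sleeper`, `waker`, `one_round` and `lost_wakeups` are hypothetical, and this is not presented as the fix adopted in this thread), is a full fence between storing the state and loading the condition on the sleeping side, paired with a fence between the two stores on the waking side.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* User-space analogue (hypothetical names) of the quoted wait loop. */
enum { TASK_RUNNING = 0, TASK_UNINTERRUPTIBLE = 2 };

static atomic_int state;
static atomic_int empty;
static int saw_empty;   /* written by sleeper, read after pthread_join */

static void *sleeper(void *arg)
{
    (void)arg;
    atomic_store_explicit(&state, TASK_UNINTERRUPTIBLE, memory_order_relaxed);
    /* Full barrier: order the state store before the condition load. */
    atomic_thread_fence(memory_order_seq_cst);
    saw_empty = atomic_load_explicit(&empty, memory_order_relaxed);
    return NULL;
}

static void *waker(void *arg)
{
    (void)arg;
    atomic_store_explicit(&empty, 0, memory_order_relaxed);
    /* Keeps "empty = 0" ordered before the wakeup store. */
    atomic_thread_fence(memory_order_seq_cst);
    atomic_store_explicit(&state, TASK_RUNNING, memory_order_relaxed);
    return NULL;
}

/* One race; true if the sleeper saw "empty" yet its state store won. */
static bool one_round(void)
{
    pthread_t a, b;

    atomic_store(&state, TASK_RUNNING);
    atomic_store(&empty, 1);
    if (pthread_create(&a, NULL, sleeper, NULL) != 0 ||
        pthread_create(&b, NULL, waker, NULL) != 0)
        abort();
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return saw_empty && atomic_load(&state) == TASK_UNINTERRUPTIBLE;
}

static int lost_wakeups(int rounds)
{
    int lost = 0;
    for (int i = 0; i < rounds; i++)
        lost += one_round();
    return lost;
}
```

With both fences the C11 model forbids the outcome: if the waker's fence comes first in the single total order of seq_cst fences, the sleeper's load must see `empty == 0`; otherwise the waker's TASK_RUNNING store must come later in the modification order of `state`. Removing either fence makes the hang permitted, although a loop this short rarely catches it, since thread start-up largely serializes the two functions.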


To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to