Re: scheduling problem?

Ingo Molnar (mingo@chiara.csoma.elte.hu)
Fri, 17 Dec 1999 16:40:52 +0100 (CET)


On Fri, 17 Dec 1999, Andrea Arcangeli wrote:

> On Thu, 16 Dec 1999, Jamie Lokier wrote:
>
> > asm volatile("sti ; hlt" : : : "memory");
>                 ^^^^^^^^^
>
> The wakeup can still happen between sti and hlt so it doesn't fix the
> irq race condition.

no. Intel guarantees that the _next_ instruction after a 'sti' is still
executed with interrupts turned off, so no IRQ can be delivered between
the 'sti' and the 'hlt'. And now we know why ...

so the correct fix is to check need_resched with IRQs turned off and, if
it is not set, do the 'sti ; hlt' sequence.
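
A minimal sketch of that idle-loop step, for illustration only. It is not
the actual kernel code: idle_once(), irq_off(), irq_on(),
irq_on_and_halt() and the IDLE_SKETCH_RING0 switch are hypothetical names,
and the global need_resched merely stands in for the idle task's flag.
The privileged instructions only make sense in kernel mode, so user-space
stubs are assumed by default to let the control flow be compiled and
traced.

```c
#include <stdbool.h>

/* Stand-in for the idle task's need_resched flag; IRQ handlers set it. */
static volatile int need_resched;

#if defined(IDLE_SKETCH_RING0) && (defined(__i386__) || defined(__x86_64__))
/* Real instructions: privileged, only meaningful in kernel mode. */
static inline void irq_off(void)         { __asm__ __volatile__("cli" : : : "memory"); }
static inline void irq_on(void)          { __asm__ __volatile__("sti" : : : "memory"); }
static inline void irq_on_and_halt(void) { __asm__ __volatile__("sti ; hlt" : : : "memory"); }
#else
/* User-space stubs (assumption) so the logic can be exercised safely. */
static int irqs_enabled = 1;
static int halt_count;
static inline void irq_off(void)         { irqs_enabled = 0; }
static inline void irq_on(void)          { irqs_enabled = 1; }
static inline void irq_on_and_halt(void) { irqs_enabled = 1; halt_count++; }
#endif

/*
 * One pass of the idle loop. Returns true if the CPU halted, false if
 * work was pending and the caller should run the scheduler instead.
 */
static bool idle_once(void)
{
    irq_off();              /* from here on no wakeup IRQ can slip in ... */
    if (need_resched) {
        irq_on();
        return false;       /* reschedule instead of halting */
    }
    irq_on_and_halt();      /* ... until the hlt is already waiting for it */
    return true;
}
```

The point of the ordering is that the flag test and the halt form one
unit as far as interrupts are concerned: a wakeup IRQ arrives either
before the 'cli' (and the test sees the flag) or after the 'hlt' has
started (and it ends the halt).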

William, does this fix your problems?

> The trivial way to fix that is to loop on the need_resched in the idle
> task (ugh, not nice :( ).

extremely nasty and definitely the oldest known Linux scheduler bug. i
noticed more than a year ago that by doing need_resched polling in the
idle loop some of the scheduling irregularities go away, but somehow never
got around to noticing the real bug ...

polling on need_resched will also increase performance on SMP systems
btw., because the cacheline snoop event goes to the other CPU much faster
than the APIC message - and we can skip sending the APIC message
altogether if the other CPU is running an idle thread. OTOH CPUs will run
much hotter, since a polling idle loop never halts ...
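
A rough sketch of the polling variant, under stated assumptions: struct
cpu_state, idle_polling, wake_cpu(), idle_poll() and send_resched_ipi()
are all hypothetical names, and the APIC message is only simulated by a
counter. It uses C11 atomics to model the flag that the waking CPU writes
and the idle CPU spins on.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-CPU state; field names are illustrative only. */
struct cpu_state {
    atomic_int  need_resched;  /* set by a waker, read by the idle loop */
    atomic_bool idle_polling;  /* true while the idle loop spins on the flag */
    int         ipis_sent;     /* counts simulated APIC messages */
};

/* Stand-in for sending a reschedule APIC message to the target CPU. */
static void send_resched_ipi(struct cpu_state *cpu)
{
    cpu->ipis_sent++;
}

/*
 * Waker side: always set the flag, but send the APIC message only if
 * the target is not polling. Returns true if a message was sent.
 */
static bool wake_cpu(struct cpu_state *cpu)
{
    atomic_store(&cpu->need_resched, 1);
    if (atomic_load(&cpu->idle_polling))
        return false;          /* the poller sees the cacheline change */
    send_resched_ipi(cpu);
    return true;
}

/*
 * Idle side: spin until the flag is set. max_spins is an artificial
 * bound so the sketch always terminates. Returns the iterations spent
 * spinning. A real loop would have to re-check need_resched after
 * clearing idle_polling, before it halts or sleeps.
 */
static long idle_poll(struct cpu_state *cpu, long max_spins)
{
    long spins = 0;

    atomic_store(&cpu->idle_polling, true);
    while (!atomic_load(&cpu->need_resched) && spins < max_spins)
        spins++;
    atomic_store(&cpu->idle_polling, false);
    return spins;
}
```

The waker writes the flag before it reads idle_polling, and the idle side
sets idle_polling before it reads the flag, so at least one of the two
always notices the other.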

-- mingo
