Re: [patch] smp-2.3.43-B5, x86 SMP fixes

From: Ingo Molnar (mingo@chiara.csoma.elte.hu)
Date: Tue Feb 08 2000 - 10:57:52 EST


hi Manfred,

On Tue, 8 Feb 2000, Manfred Spraul wrote:

> I have one question:
> http://www.redhat.com/~mingo/smp-2.3.43-B5:
> + send_IPI_mask(cpumask, INVALIDATE_TLB_VECTOR);
> +
> + while (flush_cpumask)
> + /* nothing. lockup detection does not belong here */;
> +
>
> Why?
> * we are running with enabled interrupts, NMI detection won't find the
> lock-up.
> * "stuck on TLB IPI wait" messages are quite common - perhaps as common
> as bad hardware ;). I don't think that we can ignore them.

such lockups can be detected either via pressing RightALT-ScrollLock [in
fact i've 'detected' such a lockup during doing the above fixes]; or via
using the IKD patch's soft lockup detection. (which i originally
implemented to keep exactly such code out of the main kernel)

> * it will be easy to add lockup detection: there are 4 function on
> i386 that wait for inter processor synchronization (wait_on_irq,
> wait_on_bh, smp_call_function, and flush_tlb_others), and I'm
> preparing a helper function that prints a meaningfull back trace
> (better than show()).

no doubt about that, but it just doesnt belong into the main kernel i'm
quite sure. In IKD you can get much more on a lockup than just the
backtrace (you can get an event trace and an 'exact' backtrace).

btw., there still appears to be something weird going on in TLB-flush
land, so not all bugs are fixed yet. Is it absolutely stable on your box
if you untar/tar big filetrees? I got one sporadic failed compression
(which just shuoldnt happen), and a sporadic NULL crash in leave_mm()'s
->active_mm deref. Reverting the pre2-2.3.43 TLB changes solves the
problem. Any ideas?

-- mingo

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Tue Feb 15 2000 - 21:00:12 EST