tlb flush timeout on SMP alphas

From: Mike (
Date: Tue Dec 10 2002 - 10:06:42 EST

I am trying to figure out if the timeouts I am getting are software or
hardware related. We've got 2 way alphas with 4 gig of ram and the only
other different hardware in them is a Myrinet card with the GM driver.
And I can frequently (not 100% reliably) get tlb flush timeouts while
the system has a load of around 2 and one of the running processes is
using the mryinet card.

When this message occurs, one of the 2 CPUs is "dead", meaning that any
processes running on it are copletly stalled and the rtc for the "dead"
processor in /proc/interupts does not increase.

An interesting datapoint is that the air conditioner failed one weekend
and some machines died because of the heat and these machines failed
with the same tlb flush timeouts.

Currently we are running 2.4.9 (redhat modified) kernel, and have tried
others up to and including 2.4.18 with the same results.

Any insight would be greatly appreciated.

I am not a subscriber on this list, so CCing me would be greatly

To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to
More majordomo info at
Please read the FAQ at

This archive was generated by hypermail 2b29 : Sun Dec 15 2002 - 22:00:17 EST