Re: long stalls

From: Brian Tinsley (btinsley@emageon.com)
Date: Tue Jan 07 2003 - 21:16:57 EST


Out of curiosity, which RH kernel are you using? I moved on to 2.4.19
and 2.4.20 primarily because the RH 2.4.18 series of kernels apparently
has a scheduler bug (at least one) that causes the heartbeat software
from Linux-HA to lose heartbeat signals and fail over. Not a good
scenario when you are trying to provide HA systems to hospitals!

Russell Leighton wrote:

>
> I can't help, but I can echo a "me too".
>
> We only see it when I have 2 file I/O intensive processes...they both
> will just stop for a few seconds, the system seems idle...then
> they just start again. RH7.3 SMP, Dual PIII, 4GB RAM, 3com RAID
> Controller.
>
> Brian Tinsley wrote:
>
>> We have been having terrible problems with long stalls, meaning from
>> a couple of minutes to an hour, happening when filesystem I/O load
>> gets high. The system time as reported by vmstat or sar will increase
>> up to 99% and as it spreads to each processor, the system becomes
>> completely unresponsive (except that it responds to pings just fine -
>> interesting!). When the system finally returns to the world of the
>> living, the only evidence that something bad has happened is the
>> runtime for kswapd is abnormally high. I have seen this happen with
>> the stock 2.4.17, 2.4.19, and 2.4.20 kernels on SMP PIII and PIV
>> machines (either 4GB or 8GB RAM, all SCSI disks, dual GigE NICs).
>> I've searched the LKML archives and Google and have found several
>> similar postings, but there is never an explanation or resolution.
>> Any help would be *very* much appreciated! If any info from the
>> system in question is desired, I will be glad to provide it.
>>
>>
>>
>
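
For anyone wanting to log the system-time climb described in the quoted
report, here is a minimal sketch that computes the same percentage vmstat
shows from two samples of the aggregate "cpu" line in /proc/stat. It is an
illustration only, not something from the original posts: the function
names are made up for this example, and it assumes the usual field order
(user, nice, system, idle, then any later fields on newer kernels).

```python
import time


def parse_cpu_line(line):
    """Return the jiffy counters from the aggregate 'cpu' line of /proc/stat."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    # Keep at most the first 8 counters (user, nice, system, idle, iowait,
    # irq, softirq, steal). A 2.4 kernel only reports the first four.
    return [int(x) for x in fields[1:9]]


def system_percent(before, after):
    """Percentage of elapsed jiffies spent in system mode between two samples."""
    deltas = [b - a for a, b in zip(before, after)]
    total = sum(deltas)
    if total == 0:
        return 0.0
    return 100.0 * deltas[2] / total  # index 2 is the 'system' counter


def read_cpu_line(path="/proc/stat"):
    """Read the first line of /proc/stat, which is the aggregate 'cpu' line."""
    with open(path) as f:
        return f.readline()


def sample(interval=5.0):
    """Take two samples 'interval' seconds apart and return the system-time %."""
    before = parse_cpu_line(read_cpu_line())
    time.sleep(interval)
    after = parse_cpu_line(read_cpu_line())
    return system_percent(before, after)
```

Calling sample() in a loop and appending the result to a file with a
timestamp would leave a record to line up against the kswapd runtime after
a stall, assuming the logging process still gets scheduled while the
machine is unresponsive.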

-- 

-[========================]- -[ Brian Tinsley ]- -[ Chief Systems Engineer ]- -[ Emageon ]- -[========================]-



