Re: PROBLEM: System Freeze on Particular workload with kernel 2.6.22.6

From: Yucheng Low
Date: Thu Sep 20 2007 - 21:04:25 EST


Hi all,

Thanks all. After lots of testing, I isolated the problem to one of the
memory modules.

Thought it might have been a kernel problem as I thought memtest should
be exhaustive enough considering I ran it for so long, but apparently not...
Even now, the bad module still does not show any errors in memtest...

Thanks,
Yucheng

Ray Lee wrote:
> On 9/19/07, Low Yucheng <ylow@xxxxxxxxxxxxxx> wrote:
>
>> [1.] Summary
>> System Freeze on Particular workload with kernel 2.6.22.6
>>
>> [2.] Description
>> System freezes on repeated application of the following command
>> for f in *png ; do convert -quality 100 $f `basename $f png`jpg; done
>>
>> Problem is consistent and repeatable.
>> Problem persists when running on a different drive, and also in pure console (no X).
>>
>> One time, the following error logged in syslog:
>> Sep 19 04:22:11 mossnew kernel: [ 301.883919] VM: killing process convert
>> Sep 19 04:22:11 mossnew kernel: [ 301.884382] swap_free: Unused swap offset entry 0000ff00
>> Sep 19 04:22:11 mossnew kernel: [ 301.884421] swap_free: Unused swap offset entry 00000300
>> Sep 19 04:22:11 mossnew kernel: [ 301.884456] swap_free: Unused swap offset entry 00000200
>> Sep 19 04:22:11 mossnew kernel: [ 301.884491] swap_free: Unused swap offset entry 0000ff00
>> Sep 19 04:22:11 mossnew kernel: [ 301.884527] swap_free: Unused swap offset entry 0000ff00
>> Sep 19 04:22:11 mossnew kernel: [ 301.884562] swap_free: Unused swap offset entry 00000100
>>
>> Should not be a RAM problem. RAM has survived 12 hrs of Memtest with no errors.
>> Should not be a CPU problem either. I have been running CPU intensive tasks for days.
>>
>
> The "Unused swap offset entry" is almost always a sign of bad memory,
> if google can be trusted. Your workload is *extremely* CPU and memory
> intensive (and even hits the disk!), so this looks like bad RAM, bad
> cooling, or a marginal power supply that is failing under load.
>
> memtest86+ doesn't stress the CPU nearly as much, so it often doesn't
> show all the problems.
>
> Take your RAM down to one stick and try again (looks like you have 2G
> installed?). If that still fails, try different RAM. If that still
> fails, then swap out the power supply for another if you can, and try
> again.
>
> Ray
>
>

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/