Re: HDD problem, software bug, bios bug, or hardware ?

From: Borislav Petkov
Date: Mon Aug 27 2012 - 17:59:53 EST


On Mon, Aug 27, 2012 at 10:01:12AM -0700, Adko Branil wrote:
> >Stupid question: have you tried replacing your DIMMs to see whether this
> >can be caused by a faulty DRAM?
>
> I just tried it - I have 2 banks of memory, 1 Mb each - I replaced

You mean 1 GB each, right?

> the first, then the second, then replaced both with memory from
> another computer, and tried other slots as well - same picture, the
> crashes continue. There is no visible sign of broken capacitors on the
> motherboard - all looks good. When I pass the "nosmp" option to the
> kernel at boot time, it crashes much faster - maybe 100 times faster -
> and the panic messages are almost identical to each other. Is it
> definitely broken hardware, and which part of the hardware would it
> be? Is it possible that it is a bad BIOS, or even a virus in the BIOS?
> Would upgrading the BIOS help in this case?
>
> I have found 2 interesting lines in syslog:
>
> could not find module by name='rtc_cmos'
> microcode: AMD CPU family 0xf not supported

Not relevant.

> The kernel is 3.5.2 on slackware_current with config
> http://pastebin.com/aGqH3tTR , it crashes with older kernels as well.
>

Ok, judging by the oopses this time, most of them are in handle_irq,
and "nosmp" disables the IO APIC, so things start to point at something
related to IRQ handling, if I had to guess.

Hm, ok, can you rebuild that 3.5.2 kernel with the following options
enabled:

CONFIG_DEBUG_KERNEL
CONFIG_DEBUG_SHIRQ
CONFIG_DEBUG_OBJECTS
CONFIG_DEBUG_PREEMPT
CONFIG_PROVE_LOCKING
CONFIG_DEBUG_BUGVERBOSE
CONFIG_DEBUG_INFO
CONFIG_DEBUG_VM
CONFIG_DEBUG_MEMORY_INIT
CONFIG_DEBUG_LIST
CONFIG_DEBUG_RODATA

Those are all under "Kernel Hacking". Then boot this new kernel,
capture the whole dmesg up to and including a couple of oopses, and
send it to me.
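
To double-check that the options actually ended up in the build before
booting, something like the following could be run against the resulting
.config. This is only a minimal sketch: the helper name missing_options
is made up for illustration, and it assumes the usual .config syntax
where an enabled bool option appears as "CONFIG_FOO=y".

```python
import re

# The options requested above, exactly as listed in this mail.
REQUIRED = [
    "CONFIG_DEBUG_KERNEL",
    "CONFIG_DEBUG_SHIRQ",
    "CONFIG_DEBUG_OBJECTS",
    "CONFIG_DEBUG_PREEMPT",
    "CONFIG_PROVE_LOCKING",
    "CONFIG_DEBUG_BUGVERBOSE",
    "CONFIG_DEBUG_INFO",
    "CONFIG_DEBUG_VM",
    "CONFIG_DEBUG_MEMORY_INIT",
    "CONFIG_DEBUG_LIST",
    "CONFIG_DEBUG_RODATA",
]


def missing_options(config_text, required=REQUIRED):
    """Return the required options that are not set to 'y' in config_text.

    Hypothetical helper, not part of the kernel tree.
    """
    enabled = set()
    for line in config_text.splitlines():
        # Enabled bool options look like "CONFIG_FOO=y"; disabled ones
        # show up as "# CONFIG_FOO is not set" and do not match here.
        m = re.match(r"^(CONFIG_[A-Za-z0-9_]+)=y$", line.strip())
        if m:
            enabled.add(m.group(1))
    # Keep the order of the requested list so the report is easy to read.
    return [opt for opt in required if opt not in enabled]


# Example use (assumes .config is in the current directory):
#     with open(".config") as f:
#         print(missing_options(f.read()))
```

An option reported as missing is not necessarily a mistake: it can be
hidden because its dependencies are not met (DEBUG_PREEMPT, for
instance, needs a preemptible kernel).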

But do not boot with "nosmp" - we want to see whether the default SMP
kernel still triggers the crashes.

That should be it for now.

Thanks.

--
Regards/Gruss,
Boris.